Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference.-Eugenia Stoimenova, Journal of Applied Statistics, June 2012… one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students.-Biometrics, 67, September 2011This excellently presente
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based ProceduresNonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm.The book first gives an overview of the R language and basic statistical c
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition"This book should be an essential part of the personal library of every practicing statistician."-Technometrics Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
CURRENT STATUS OF NONPARAMETRIC STATISTICS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-02-01
Full Text Available Nonparametric statistics is one of the five points of growth of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into regions based on the existing practice of scientific activity determination of nonparametric statistics and classify investigations on nonparametric statistical methods. Nonparametric statistics allows to make statistical inference, in particular, to estimate the characteristics of the distribution and testing statistical hypotheses without, as a rule, weakly proven assumptions about the distribution function of samples included in a particular parametric family. For example, the widespread belief that the statistical data are often have the normal distribution. Meanwhile, analysis of results of observations, in particular, measurement errors, always leads to the same conclusion - in most cases the actual distribution significantly different from normal. Uncritical use of the hypothesis of normality often leads to significant errors, in areas such as rejection of outlying observation results (emissions, the statistical quality control, and in other cases. Therefore, it is advisable to use nonparametric methods, in which the distribution functions of the results of observations are imposed only weak requirements. It is usually assumed only their continuity. On the basis of generalization of numerous studies it can be stated that to date, using nonparametric methods can solve almost the same number of tasks that previously used parametric methods. Certain statements in the literature are incorrect that nonparametric methods have less power, or require larger sample sizes than parametric methods. Note that in the nonparametric statistics, as in mathematical statistics in general, there remain a number of unresolved problems
Introduction to nonparametric statistics for the biological sciences using R
MacFarland, Thomas W
2016-01-01
This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: To introduce when nonparametric approaches to data analysis are appropriate To introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test To introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...
Nonparametric statistics for social and behavioral sciences
Kraska-MIller, M
2013-01-01
Introduction to Research in Social and Behavioral SciencesBasic Principles of ResearchPlanning for ResearchTypes of Research Designs Sampling ProceduresValidity and Reliability of Measurement InstrumentsSteps of the Research Process Introduction to Nonparametric StatisticsData AnalysisOverview of Nonparametric Statistics and Parametric Statistics Overview of Parametric Statistics Overview of Nonparametric StatisticsImportance of Nonparametric MethodsMeasurement InstrumentsAnalysis of Data to Determine Association and Agreement Pearson Chi-Square Test of Association and IndependenceContingency
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Nonparametric statistical structuring of knowledge systems using binary feature matches
DEFF Research Database (Denmark)
Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue
2014-01-01
statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary......Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have...
Recent Advances and Trends in Nonparametric Statistics
Akritas, MG
2003-01-01
The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o
Methodology in robust and nonparametric statistics
Jurecková, Jana; Picek, Jan
2012-01-01
Introduction and SynopsisIntroductionSynopsisPreliminariesIntroductionInference in Linear ModelsRobustness ConceptsRobust and Minimax Estimation of LocationClippings from Probability and Asymptotic TheoryProblemsRobust Estimation of Location and RegressionIntroductionM-EstimatorsL-EstimatorsR-EstimatorsMinimum Distance and Pitman EstimatorsDifferentiable Statistical FunctionsProblemsAsymptotic Representations for L-Estimators
Nonparametric statistics a step-by-step approach
Corder, Gregory W
2014-01-01
"…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory. It also deserves a place in libraries of all institutions where introductory statistics courses are taught."" -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: New coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical powerSPSS® (Version 21) software and updated screen ca
Using Mathematica to build Non-parametric Statistical Tables
Directory of Open Access Journals (Sweden)
Gloria Perez Sainz de Rozas
2003-01-01
Full Text Available In this paper, I present computational procedures to obtian statistical tables. The tables of the asymptotic distribution and the exact distribution of Kolmogorov-Smirnov statistic Dn for one population, the table of the distribution of the runs R, the table of the distribution of Wilcoxon signed-rank statistic W+ and the table of the distribution of Mann-Whitney statistic Ux using Mathematica, Version 3.9 under Window98. I think that it is an interesting cuestion because many statistical packages give the asymptotic significance level in the statistical tests and with these porcedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities with non-parametric distributions. I have used mathematica to make these calculations because one can use symbolic language to solve recursion relations. It's very easy to generate the format of the tables, and it's possible to obtain any table of the mentioned non-parametric distributions with any precision, not only with the standard parameters more used in Statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, x2 Chi-Square, T-Student, F-Snedecor, Geometric, Gamma and Beta.
Categorical and nonparametric data analysis choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
1st Conference of the International Society for Nonparametric Statistics
Lahiri, S; Politis, Dimitris
2014-01-01
This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R.Beran, P.Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. SubbaRao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world. The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...
Biological parametric mapping with robust and non-parametric statistics.
Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A
2011-07-15
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrices. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes
Zhou, Wei-Xing; Sornette, Didier
We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.
Nonparametric statistical tests for the continuous data: the basic concept and the practical use.
Nahm, Francis Sahngun
2016-02-01
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most of the medical researchers are familiar with and the statistical software packages strongly support parametric tests. Parametric tests require important assumption; assumption of normality which means that distribution of sample means is normally distributed. However, parametric test can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not required the normality assumption. Nonparametric tests are the statistical methods based on signs and ranks. In this article, we will discuss about the basic concepts and practical use of nonparametric tests for the guide to the proper use.
Examples of the Application of Nonparametric Information Geometry to Statistical Physics
Directory of Open Access Journals (Sweden)
Giovanni Pistone
2013-09-01
Full Text Available We review a nonparametric version of Amari’s information geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.
MIRELA SECARĂ
2008-01-01
Tourism represents an important field of economic and social life in our country, and the main sector of the economy of Constanta County is the balneary touristic capitalization of Romanian seaside. In order to statistically analyze hydro tourism on Romanian seaside, we have applied non-parametric methods of measuring and interpretation of existing statistic connections within seaside hydro tourism. Major objective of this research is represented by hydro tourism re-establishment on Romanian ...
Effects of dating errors on nonparametric trend analyses of speleothem time series
Directory of Open Access Journals (Sweden)
M. Mudelsee
2012-10-01
Full Text Available A fundamental problem in paleoclimatology is to take fully into account the various error sources when examining proxy records with quantitative methods of statistical time series analysis. Records from dated climate archives such as speleothems add extra uncertainty from the age determination to the other sources that consist in measurement and proxy errors. This paper examines three stalagmite time series of oxygen isotopic composition (δ^{18}O from two caves in western Germany, the series AH-1 from the Atta Cave and the series Bu1 and Bu4 from the Bunker Cave. These records carry regional information about past changes in winter precipitation and temperature. U/Th and radiocarbon dating reveals that they cover the later part of the Holocene, the past 8.6 thousand years (ka. We analyse centennial- to millennial-scale climate trends by means of nonparametric Gasser–Müller kernel regression. Error bands around fitted trend curves are determined by combining (1 block bootstrap resampling to preserve noise properties (shape, autocorrelation of the δ^{18}O residuals and (2 timescale simulations (models StalAge and iscam. The timescale error influences on centennial- to millennial-scale trend estimation are not excessively large. We find a "mid-Holocene climate double-swing", from warm to cold to warm winter conditions (6.5 ka to 6.0 ka to 5.1 ka, with warm–cold amplitudes of around 0.5‰ δ^{18}O; this finding is documented by all three records with high confidence. We also quantify the Medieval Warm Period (MWP, the Little Ice Age (LIA and the current warmth. Our analyses cannot unequivocally support the conclusion that current regional winter climate is warmer than that during the MWP.
Non-parametric Estimation approach in statistical investigation of nuclear spectra
Jafarizadeh, M A; Sabri, H; Maleki, B Rashidian
2011-01-01
In this paper, Kernel Density Estimation (KDE) as a non-parametric estimation method is used to investigate statistical properties of nuclear spectra. The deviation to regular or chaotic dynamics, is exhibited by closer distances to Poisson or Wigner limits respectively which evaluated by Kullback-Leibler Divergence (KLD) measure. Spectral statistics of different sequences prepared by nuclei corresponds to three dynamical symmetry limits of Interaction Boson Model(IBM), oblate and prolate nuclei and also the pairing effect on nuclear level statistics are analyzed (with pure experimental data). KD-based estimated density function, confirm previous predictions with minimum uncertainty (evaluated with Integrate Absolute Error (IAE)) in compare to Maximum Likelihood (ML)-based method. Also, the increasing of regularity degrees of spectra due to pairing effect is reveal.
Takamizawa, Hisashi; Itoh, Hiroto; Nishiyama, Yutaka
2016-10-01
In order to understand neutron irradiation embrittlement in high fluence regions, statistical analysis using the Bayesian nonparametric (BNP) method was performed for the Japanese surveillance and material test reactor irradiation database. The BNP method is essentially expressed as an infinite summation of normal distributions, with input data being subdivided into clusters with identical statistical parameters, such as mean and standard deviation, for each cluster to estimate shifts in ductile-to-brittle transition temperature (DBTT). The clusters typically depend on chemical compositions, irradiation conditions, and the irradiation embrittlement. Specific variables contributing to the irradiation embrittlement include the content of Cu, Ni, P, Si, and Mn in the pressure vessel steels, neutron flux, neutron fluence, and irradiation temperatures. It was found that the measured shifts of DBTT correlated well with the calculated ones. Data associated with the same materials were subdivided into the same clusters even if neutron fluences were increased.
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Kim, Junmo; Fisher, John W; Yezzi, Anthony; Cetin, Müjdat; Willsky, Alan S
2005-10-01
In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Futhermore, our method, which does not require any training, performs as good as methods based on training.
The application of non-parametric statistical techniques to an ALARA programme.
Moon, J H; Cho, Y H; Kang, C S
2001-01-01
For the cost-effective reduction of occupational radiation dose (ORD) at nuclear power plants, it is necessary to identify what are the processes of repetitive high ORD during maintenance and repair operations. To identify the processes, the point values such as mean and median are generally used, but they sometimes lead to misjudgment since they cannot show other important characteristics such as dose distributions and frequencies of radiation jobs. As an alternative, the non-parametric analysis method is proposed, which effectively identifies the processes of repetitive high ORD. As a case study, the method is applied to ORD data of maintenance and repair processes at Kori Units 3 and 4 that are pressurised water reactors with 950 MWe capacity and have been operating since 1986 and 1987 respectively, in Korea and the method is demonstrated to be an efficient way of analysing the data.
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Full Text Available Climate projections simulated by Global Climate Models (GCM are often used for assessing the impacts of climate change. However, the relatively coarse resolutions of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods are often performed for regional climate projections. Statistical downscaling (SD is based on the understanding that the regional climate is influenced by two factors – the large scale climatic state and the regional or local features. A transfer function approach of SD involves learning a regression model which relates these features (predictors to a climatic variable of interest (predictand based on the past observations. However, often a single regression model is not sufficient to describe complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on Dirichlet Process (DP, for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lends to domain relevant interpretation. Applications to synthetic data demonstrate the value of the new approach and preliminary results related to feature selection for statistical downscaling shows our method can lead to new insights.
Zoffoli, Luca; Ditroilo, Massimiliano; Federici, Ario; Lucertini, Francesco
2017-09-09
This study used surface electromyography (EMG) to investigate the regions and patterns of activity of the external oblique (EO), erector spinae longissimus (ES), multifidus (MU) and rectus abdominis (RA) muscles during walking (W) and pole walking (PW) performed at different speeds and grades. Eighteen healthy adults undertook W and PW on a motorized treadmill at 60% and 100% of their walk-to-run preferred transition speed at 0% and 7% treadmill grade. The Teager-Kaiser energy operator was employed to improve the muscle activity detection and statistical non-parametric mapping based on paired t-tests was used to highlight statistical differences in the EMG patterns corresponding to different trials. The activation amplitude of all trunk muscles increased at high speed, while no differences were recorded at 7% treadmill grade. ES and MU appeared to support the upper body at the heel-strike during both W and PW, with the latter resulting in elevated recruitment of EO and RA as required to control for the longer stride and the push of the pole. Accordingly, the greater activity of the abdominal muscles and the comparable intervention of the spine extensors supports the use of poles by walkers seeking higher engagement of the lower trunk region. Copyright © 2017 Elsevier Ltd. All rights reserved.
COLOR IMAGE RETRIEVAL BASED ON NON-PARAMETRIC STATISTICAL TESTS OF HYPOTHESIS
Directory of Open Access Journals (Sweden)
R. Shekhar
2016-09-01
Full Text Available A novel method for color image retrieval, based on statistical non-parametric tests such as twosample Wald Test for equality of variance and Man-Whitney U test, is proposed in this paper. The proposed method tests the deviation, i.e. distance in terms of variance between the query and target images; if the images pass the test, then it is proceeded to test the spectrum of energy, i.e. distance between the mean values of the two images; otherwise, the test is dropped. If the query and target images pass the tests then it is inferred that the two images belong to the same class, i.e. both the images are same; otherwise, it is assumed that the images belong to different classes, i.e. both images are different. The proposed method is robust for scaling and rotation, since it adjusts itself and treats either the query image or the target image is the sample of other.
A contingency table approach to nonparametric testing
Rayner, JCW
2000-01-01
Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables.This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test
Applications of non-parametric statistics and analysis of variance on sample variances
Myers, R. H.
1981-01-01
Nonparametric methods that are available for NASA-type applications are discussed. An attempt will be made here to survey what can be used, to attempt recommendations as to when each would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are avavilable for the Gaussion analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and limitations of the nonparametric procedures. The appropriateness of doing analysis of variance on sample variances are also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be reasonably sound procedure. However, difficulties involved center around the normality problem and the basic homogeneous variance assumption that is mase in usual analysis of variance problems. These difficulties discussed and guidelines given for using the methods.
t-tests, non-parametric tests, and large studies—a paradox of statistical practice?
Directory of Open Access Journals (Sweden)
Fagerland Morten W
2012-06-01
Full Text Available Abstract Background During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. Methods A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. Results The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p Conclusions Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
SOCR Analyses: Implementation and Demonstration of a New Graphical Statistics Educational Toolkit
Directory of Open Access Journals (Sweden)
Annie Chu
2009-04-01
Full Text Available The web-based, Java-written SOCR (Statistical Online Computational Resource toolshave been utilized in many undergraduate and graduate level statistics courses for sevenyears now (Dinov 2006; Dinov et al. 2008b. It has been proven that these resourcescan successfully improve students' learning (Dinov et al. 2008b. Being rst publishedonline in 2005, SOCR Analyses is a somewhat new component and it concentrate on datamodeling for both parametric and non-parametric data analyses with graphical modeldiagnostics. One of the main purposes of SOCR Analyses is to facilitate statistical learn-ing for high school and undergraduate students. As we have already implemented SOCRDistributions and Experiments, SOCR Analyses and Charts fulll the rest of a standardstatistics curricula. Currently, there are four core components of SOCR Analyses. Linearmodels included in SOCR Analyses are simple linear regression, multiple linear regression,one-way and two-way ANOVA. Tests for sample comparisons include t-test in the para-metric category. Some examples of SOCR Analyses' in the non-parametric category areWilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, Kolmogorov-Smirno testand Fligner-Killeen test. Hypothesis testing models include contingency table, Friedman'stest and Fisher's exact test. The last component of Analyses is a utility for computingsample sizes for normal distribution. In this article, we present the design framework,computational implementation and the utilization of SOCR Analyses.
Statistical analyses in disease surveillance systems.
Lescano, Andres G; Larasati, Ria Purwita; Sedyaningsih, Endang R; Bounlu, Khanthong; Araujo-Castillo, Roger V; Munayco-Escate, Cesar V; Soto, Giselle; Mundaca, C Cecilia; Blazes, David L
2008-11-14
The performance of disease surveillance systems is evaluated and monitored using a diverse set of statistical analyses throughout each stage of surveillance implementation. An overview of their main elements is presented, with a specific emphasis on syndromic surveillance directed to outbreak detection in resource-limited settings. Statistical analyses are proposed for three implementation stages: planning, early implementation, and consolidation. Data sources and collection procedures are described for each analysis.During the planning and pilot stages, we propose to estimate the average data collection, data entry and data distribution time. This information can be collected by surveillance systems themselves or through specially designed surveys. During the initial implementation stage, epidemiologists should study the completeness and timeliness of the reporting, and describe thoroughly the population surveyed and the epidemiology of the health events recorded. Additional data collection processes or external data streams are often necessary to assess reporting completeness and other indicators. Once data collection processes are operating in a timely and stable manner, analyses of surveillance data should expand to establish baseline rates and detect aberrations. External investigations can be used to evaluate whether abnormally increased case frequency corresponds to a true outbreak, and thereby establish the sensitivity and specificity of aberration detection algorithms.Statistical methods for disease surveillance have focused mainly on the performance of outbreak detection algorithms without sufficient attention to the data quality and representativeness, two factors that are especially important in developing countries. It is important to assess data quality at each state of implementation using a diverse mix of data sources and analytical methods. Careful, close monitoring of selected indicators is needed to evaluate whether systems are reaching their
Statistical analyses of a screen cylinder wake
Mohd Azmi, Azlin; Zhou, Tongming; Zhou, Yu; Cheng, Liang
2017-02-01
The evolution of a screen cylinder wake was studied by analysing its statistical properties over a streamwise range of x/d={10-60}. The screen cylinder was made of a stainless steel screen mesh of 67% porosity. The experiments were conducted in a wind tunnel at a Reynolds number of 7000 using an X-probe. The results were compared with those obtained in the wake generated by a solid cylinder. It was observed that the evolution of the statistics in the wake of the screen cylinder was different from that of a solid cylinder, reflecting the differences in the formation of the organized large-scale vortices in both wakes. The streamwise evolution of the Reynolds stresses, energy spectra and cross-correlation coefficients indicated that there exists a critical location that differentiates the screen cylinder wake into two regions over the measured streamwise range. The formation of the fully formed large-scale vortices was delayed until this critical location. Comparison with existing results for screen strips showed that although the near-wake characteristics and the vortex formation mechanism were similar between the two wake generators, variation in the Strouhal frequencies was observed and the self-preservation states were non-universal, reconfirming the dependence of a wake on its initial condition.
Directory of Open Access Journals (Sweden)
Paul H. Grawe
2016-01-01
Full Text Available Sydney Siegel and N. John Castellan, Jr. Nonparametric Statistics for the Behavioral Sciences, Second Edition (New York NY: McGraw Hill, 1988. 399 pp. ISBN: 9780070573574. Almost 60 years ago, Sidney Siegel wrote a stellar book helping anyone in academe to use nonparametric statistics, but ironically, 60 years after that achievement, American higher education confesses itself to be in the worst Quantitative Teaching Crisis of all time. The key clue to solving that crisis may be in Siegel and Castellan’s title, Nonparametric Statistics for the Behavioral Sciences, which quietly and perhaps unconsciously excludes the Humanities. Yet it is in humanistic realities that students read, write, and think. This book review considers what could be done if the Humanities were made aware of the enormous power of nonparametric statistics for advancing both their disciplines and their students’ ability to think quantitatively. A potentially revolutionary, humanistic, nonparametric finding is considered in detail along with a brief account of tens of humanistic discoveries deriving from Siegel and Castellan’s impetus.
Directory of Open Access Journals (Sweden)
Jun Zhang
2013-12-01
Full Text Available Divergence functions are the non-symmetric “distance” on the manifold, Μθ, of parametric probability density functions over a measure space, (Χ,μ. Classical information geometry prescribes, on Μθ: (i a Riemannian metric given by the Fisher information; (ii a pair of dual connections (giving rise to the family of α-connections that preserve the metric under parallel transport by their joint actions; and (iii a family of divergence functions ( α-divergence defined on Μθ x Μθ, which induce the metric and the dual connections. Here, we construct an extension of this differential geometric structure from Μθ (that of parametric probability density functions to the manifold, Μ, of non-parametric functions on X, removing the positivity and normalization constraints. The generalized Fisher information and α-connections on M are induced by an α-parameterized family of divergence functions, reflecting the fundamental convex inequality associated with any smooth and strictly convex function. The infinite-dimensional manifold, M, has zero curvature for all these α-connections; hence, the generally non-zero curvature of M can be interpreted as arising from an embedding of Μθ into Μ. Furthermore, when a parametric model (after a monotonic scaling forms an affine submanifold, its natural and expectation parameters form biorthogonal coordinates, and such a submanifold is dually flat for α = ± 1, generalizing the results of Amari’s α-embedding. The present analysis illuminates two different types of duality in information geometry, one concerning the referential status of a point (measurable function expressed in the divergence function (“referential duality” and the other concerning its representation under an arbitrary monotone scaling (“representational duality”.
Inferential, non-parametric statistics to assess the quality of probabilistic forecast systems
Maia, A.H.N.; Meinke, H.B.; Lennox, S.; Stone, R.C.
2007-01-01
Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must al
Inferential, non-parametric statistics to assess the quality of probabilistic forecast systems
Maia, A.H.N.; Meinke, H.B.; Lennox, S.; Stone, R.C.
2007-01-01
Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must al
Incorporating Nonparametric Statistics into Delphi Studies in Library and Information Science
Ju, Boryung; Jin, Tao
2013-01-01
Introduction: The Delphi technique is widely used in library and information science research. However, many researchers in the field fail to employ standard statistical tests when using this technique. This makes the technique vulnerable to criticisms of its reliability and validity. The general goal of this article is to explore how…
Non-parametric group-level statistics for source-resolved ERP analysis.
Lee, Clement; Miyakoshi, Makoto; Delorme, Arnaud; Cauwenberghs, Gert; Makeig, Scott
2015-01-01
We have developed a new statistical framework for group-level event-related potential (ERP) analysis in EEGLAB. The framework calculates the variance of scalp channel signals accounted for by the activity of homogeneous clusters of sources found by independent component analysis (ICA). When ICA data decomposition is performed on each subject's data separately, functionally equivalent ICs can be grouped into EEGLAB clusters. Here, we report a new addition (statPvaf) to the EEGLAB plug-in std_envtopo to enable inferential statistics on main effects and interactions in event related potentials (ERPs) of independent component (IC) processes at the group level. We demonstrate the use of the updated plug-in on simulated and actual EEG data.
A Java program for non-parametric statistic comparison of community structure
Directory of Open Access Journals (Sweden)
WenJun Zhang
2011-09-01
Full Text Available The Java algorithm to statistically compare structure difference of two communities was presented in this study. Euclidean distance, Manhattan distance, Pearson correlation, Point correlation, quadratic correlation and Jaccard coefficient were included in the algorithm. The algorithm was used to compare rice arthropod communities in Pearl River Delta, China, and the results showed that the family composition of arthropods for Guangzhou, Zhongshan, Zhuhai, and Dongguan are not significantly different.
Statistical analyses for NANOGrav 5-year timing residuals
Wang, Yan; Cordes, James M.; Jenet, Fredrick A.; Chatterjee, Shami; Demorest, Paul B.; Dolch, Timothy; Ellis, Justin A.; Lam, Michael T.; Madison, Dustin R.; McLaughlin, Maura A.; Perrodin, Delphine; Rankin, Joanna; Siemens, Xavier; Vallisneri, Michele
2017-02-01
In pulsar timing, timing residuals are the differences between the observed times of arrival and predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply optimal statistical methods for detecting weak gravitational wave signals, we need to know the statistical properties of noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data, which are obtained from Arecibo Observatory and Green Bank Telescope from 2005 to 2010. We find that most of the data are consistent with white noise; many data deviate from Gaussianity at different levels, nevertheless, removing outliers in some pulsars will mitigate the deviations.
Statistical Analyses for NANOGrav 5-year Timing Residuals
Wang, Y; Jenet, F A; Chatterjee, S; Demorest, P B; Dolch, T; Ellis, J A; Lam, M T; Madison, D R; McLaughlin, M; Perrodin, D; Rankin, J; Siemens, X; Vallisneri, M
2016-01-01
In pulsar timing, timing residuals are the differences between the observed times of arrival and the predictions from the timing model. A comprehensive timing model will produce featureless residuals, which are presumably composed of dominating noise and weak physical effects excluded from the timing model (e.g. gravitational waves). In order to apply the optimal statistical methods for detecting the weak gravitational wave signals, we need to know the statistical properties of the noise components in the residuals. In this paper we utilize a variety of non-parametric statistical tests to analyze the whiteness and Gaussianity of the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) 5-year timing data which are obtained from the Arecibo Observatory and the Green Bank Telescope from 2005 to 2010 (Demorest et al. 2013). We find that most of the data are consistent with white noise; Many data deviate from Gaussianity at different levels, nevertheless, removing outliers in some pulsars will m...
Applied statistics a handbook of BMDP analyses
Snell, E J
1987-01-01
This handbook is a realization of a long term goal of BMDP Statistical Software. As the software supporting statistical analysis has grown in breadth and depth to the point where it can serve many of the needs of accomplished statisticians it can also serve as an essential support to those needing to expand their knowledge of statistical applications. Statisticians should not be handicapped by heavy computation or by the lack of needed options. When Applied Statistics, Principle and Examples by Cox and Snell appeared we at BMDP were impressed with the scope of the applications discussed and felt that many statisticians eager to expand their capabilities in handling such problems could profit from having the solutions carried further, to get them started and guided to a more advanced level in problem solving. Who would be better to undertake that task than the authors of Applied Statistics? A year or two later discussions with David Cox and Joyce Snell at Imperial College indicated that a wedding of the proble...
DEFF Research Database (Denmark)
A methodology is presented that combines modelling based on first principles and data based modelling into a modelling cycle that facilitates fast decision-making based on statistical methods. A strong feature of this methodology is that given a first principles model along with process data, the......, the corresponding modelling cycle model of the given system for a given purpose. A computer-aided tool, which integrates the elements of the modelling cycle, is also presented, and an example is given of modelling a fed-batch bioreactor....
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most
Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R
2014-12-10
Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or not of a psychiatric disorder can be explained by the differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions previously not described to be involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest. Copyright © 2014 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
José L. Valencia
2015-11-01
Full Text Available Rainfall, one of the most important climate variables, is commonly studied due to its great heterogeneity, which occasionally causes negative economic, social, and environmental consequences. Modeling the spatial distributions of rainfall patterns over watersheds has become a major challenge for water resources management. Multifractal analysis can be used to reproduce the scale invariance and intermittency of rainfall processes. To identify which factors are the most influential on the variability of multifractal parameters and, consequently, on the spatial distribution of rainfall patterns for different time scales in this study, universal multifractal (UM analysis—C1, α, and γs UM parameters—was combined with non-parametric statistical techniques that allow spatial-temporal comparisons of distributions by gradients. The proposed combined approach was applied to a daily rainfall dataset of 132 time-series from 1931 to 2009, homogeneously spatially-distributed across a 25 km × 25 km grid covering the Ebro River Basin. A homogeneous increase in C1 over the watershed and a decrease in α mainly in the western regions, were detected, suggesting an increase in the frequency of dry periods at different scales and an increase in the occurrence of rainfall process variability over the last decades.
Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea
2016-01-01
In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
Nonparametric tests for censored data
Bagdonavicus, Vilijandas; Nikulin, Mikhail
2013-01-01
This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests applying most statistical software is highlighted and discussed.
SWORDS: A statistical tool for analysing large DNA sequences
Indian Academy of Sciences (India)
Probal Chaudhuri; Sandip Das
2002-02-01
In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in public domain databases housed in the Internet, we demonstrate how SWORDS can be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences and assess their statistical significance.
Pérez, Hector E; Kettner, Keith
2013-10-01
Time-to-event analysis represents a collection of relatively new, flexible, and robust statistical techniques for investigating the incidence and timing of transitions from one discrete condition to another. Plant biology is replete with examples of such transitions occurring from the cellular to population levels. However, application of these statistical methods has been rare in botanical research. Here, we demonstrate the use of non- and semi-parametric time-to-event and categorical data analyses to address questions regarding seed to seedling transitions of Ipomopsis rubra propagules exposed to various doses of constant or simulated seasonal diel temperatures. Seeds were capable of germinating rapidly to >90 % at 15-25 or 22/11-29/19 °C. Optimum temperatures for germination occurred at 25 or 29/19 °C. Germination was inhibited and seed viability decreased at temperatures ≥30 or 33/24 °C. Kaplan-Meier estimates of survivor functions indicated highly significant differences in temporal germination patterns for seeds exposed to fluctuating or constant temperatures. Extended Cox regression models specified an inverse relationship between temperature and the hazard of germination. Moreover, temperature and the temperature × day interaction had significant effects on germination response. Comparisons to reference temperatures and linear contrasts suggest that summer temperatures (33/24 °C) play a significant role in differential germination responses. Similarly, simple and complex comparisons revealed that the effects of elevated temperatures predominate in terms of components of seed viability. In summary, the application of non- and semi-parametric analyses provides appropriate, powerful data analysis procedures to address various topics in seed biology and more widespread use is encouraged.
A weighted U-statistic for genetic association analyses of sequencing data.
Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing
2014-12-01
With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol.
Statistical analyses in the physiology of exercise and kinanthropometry.
Winter, E M; Eston, R G; Lamb, K L
2001-10-01
Research into the physiology of exercise and kinanthropometry is intended to improve our understanding of how the body responds and adapts to exercise. If such studies are to be meaningful, they have to be well designed and analysed. Advances in personal computing have made available statistical analyses that were previously the preserve of elaborate mainframe systems and have increased opportunities for investigation. However, the ease with which analyses can be performed can mask underlying philosophical and epistemological shortcomings. The aim of this review is to examine the use of four techniques that are especially relevant to physiological studies: (1) bivariate correlation and linear and non-linear regression, (2) multiple regression, (3) repeated-measures analysis of variance and (4) multi-level modelling. The importance of adhering to underlying statistical assumptions is emphasized and ways to accommodate violations of these assumptions are identified.
Nonparametric Predictive Regression
Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.
2012-01-01
A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...
Statistical technique for analysing functional connectivity of multiple spike trains.
Masud, Mohammad Shahed; Borisyuk, Roman
2011-03-15
A new statistical technique, the Cox method, used for analysing functional connectivity of simultaneously recorded multiple spike trains is presented. This method is based on the theory of modulated renewal processes and it estimates a vector of influence strengths from multiple spike trains (called reference trains) to the selected (target) spike train. Selecting another target spike train and repeating the calculation of the influence strengths from the reference spike trains enables researchers to find all functional connections among multiple spike trains. In order to study functional connectivity an "influence function" is identified. This function recognises the specificity of neuronal interactions and reflects the dynamics of postsynaptic potential. In comparison to existing techniques, the Cox method has the following advantages: it does not use bins (binless method); it is applicable to cases where the sample size is small; it is sufficiently sensitive such that it estimates weak influences; it supports the simultaneous analysis of multiple influences; it is able to identify a correct connectivity scheme in difficult cases of "common source" or "indirect" connectivity. The Cox method has been thoroughly tested using multiple sets of data generated by the neural network model of the leaky integrate and fire neurons with a prescribed architecture of connections. The results suggest that this method is highly successful for analysing functional connectivity of simultaneously recorded multiple spike trains.
Heinrich, Lothar; Schmidt, Volker
2012-01-01
We consider spatially homogeneous marked point patterns in an unboundedly expanding convex sampling window. Our main objective is to identify the distribution of the typical mark by constructing an asymptotic \\chi^2-goodness-of-fit test. The corresponding test statistic is based on a natural empirical version of the Palm mark distribution and a smoothed covariance estimator which turns out to be mean-square consistent. Our approach does not require independent marks and allows dependences between the mark field and the point pattern. Instead we impose a suitable \\beta-mixing condition on the underlying stationary marked point process which can be checked for a number of Poisson-based models and, in particular, in the case of geostatistical marking. Our method needs a central limit theorem for \\beta-mixing random fields which is proved by extending Bernstein's blocking technique to non-cubic index sets and seems to be of interest in its own right. By large-scale model-based simulations the performance of our t...
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Statistical analyses support power law distributions found in neuronal avalanches.
Directory of Open Access Journals (Sweden)
Andreas Klaus
Full Text Available The size distribution of neuronal avalanches in cortical networks has been reported to follow a power law distribution with exponent close to -1.5, which is a reflection of long-range spatial correlations in spontaneous neuronal activity. However, identifying power law scaling in empirical data can be difficult and sometimes controversial. In the present study, we tested the power law hypothesis for neuronal avalanches by using more stringent statistical analyses. In particular, we performed the following steps: (i analysis of finite-size scaling to identify scale-free dynamics in neuronal avalanches, (ii model parameter estimation to determine the specific exponent of the power law, and (iii comparison of the power law to alternative model distributions. Consistent with critical state dynamics, avalanche size distributions exhibited robust scaling behavior in which the maximum avalanche size was limited only by the spatial extent of sampling ("finite size" effect. This scale-free dynamics suggests the power law as a model for the distribution of avalanche sizes. Using both the Kolmogorov-Smirnov statistic and a maximum likelihood approach, we found the slope to be close to -1.5, which is in line with previous reports. Finally, the power law model for neuronal avalanches was compared to the exponential and to various heavy-tail distributions based on the Kolmogorov-Smirnov distance and by using a log-likelihood ratio test. Both the power law distribution without and with exponential cut-off provided significantly better fits to the cluster size distributions in neuronal avalanches than the exponential, the lognormal and the gamma distribution. In summary, our findings strongly support the power law scaling in neuronal avalanches, providing further evidence for critical state dynamics in superficial layers of cortex.
Statistical analyses support power law distributions found in neuronal avalanches.
Klaus, Andreas; Yu, Shan; Plenz, Dietmar
2011-01-01
The size distribution of neuronal avalanches in cortical networks has been reported to follow a power law distribution with exponent close to -1.5, which is a reflection of long-range spatial correlations in spontaneous neuronal activity. However, identifying power law scaling in empirical data can be difficult and sometimes controversial. In the present study, we tested the power law hypothesis for neuronal avalanches by using more stringent statistical analyses. In particular, we performed the following steps: (i) analysis of finite-size scaling to identify scale-free dynamics in neuronal avalanches, (ii) model parameter estimation to determine the specific exponent of the power law, and (iii) comparison of the power law to alternative model distributions. Consistent with critical state dynamics, avalanche size distributions exhibited robust scaling behavior in which the maximum avalanche size was limited only by the spatial extent of sampling ("finite size" effect). This scale-free dynamics suggests the power law as a model for the distribution of avalanche sizes. Using both the Kolmogorov-Smirnov statistic and a maximum likelihood approach, we found the slope to be close to -1.5, which is in line with previous reports. Finally, the power law model for neuronal avalanches was compared to the exponential and to various heavy-tail distributions based on the Kolmogorov-Smirnov distance and by using a log-likelihood ratio test. Both the power law distribution without and with exponential cut-off provided significantly better fits to the cluster size distributions in neuronal avalanches than the exponential, the lognormal and the gamma distribution. In summary, our findings strongly support the power law scaling in neuronal avalanches, providing further evidence for critical state dynamics in superficial layers of cortex.
Jin, Qian; He, Li-Jun; Zhang, Ai-Bing
2012-01-01
In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species in DNA barcoding has not been established so far, although the likelihood ratio test and the bayesian approach had been proposed to address this issue from a statistical point of view. The TDR (Two Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that TDR measure is very robust in assessing species membership of DNA barcoding. In contrast to the likelihood ratio test and bayesian approach, the TDR method stands out due to simplicity in both concepts and calculations, with little in the way of restrictive population genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.
A weighted U statistic for association analyses considering genetic heterogeneity.
Wei, Changshuai; Elston, Robert C; Lu, Qing
2016-07-20
Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.
Non-Statistical Methods of Analysing of Bankruptcy Risk
Directory of Open Access Journals (Sweden)
Pisula Tomasz
2015-06-01
Full Text Available The article focuses on assessing the effectiveness of a non-statistical approach to bankruptcy modelling in enterprises operating in the logistics sector. In order to describe the issue more comprehensively, the aforementioned prediction of the possible negative results of business operations was carried out for companies functioning in the Polish region of Podkarpacie, and in Slovakia. The bankruptcy predictors selected for the assessment of companies operating in the logistics sector included 28 financial indicators characterizing these enterprises in terms of their financial standing and management effectiveness. The purpose of the study was to identify factors (models describing the bankruptcy risk in enterprises in the context of their forecasting effectiveness in a one-year and two-year time horizon. In order to assess their practical applicability the models were carefully analysed and validated. The usefulness of the models was assessed in terms of their classification properties, and the capacity to accurately identify enterprises at risk of bankruptcy and healthy companies as well as proper calibration of the models to the data from training sample sets.
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses
Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy
2015-01-01
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses.
Directory of Open Access Journals (Sweden)
Md Shamsuzzoha Bayzid
Full Text Available Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS, modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees, they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014 presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also
Institute of Scientific and Technical Information of China (English)
LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi
2010-01-01
Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating on the both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed that both crossover and noncrossover GEI, and genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated the nonparametric statistics grouped as four distinct classes that corresponded to different agronomic and biological concepts of stability.Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only rank-sum and TOP methods would be useful for simultaneously selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.
统计软件R在非参数统计教学中的应用%Application of Statistical Software R in the Teaching of Non-Parametric Statistics
Institute of Scientific and Technical Information of China (English)
王志刚; 冯利英; 刘勇
2012-01-01
Introduces the applieation of statistical software R in the teaching of non-parametric statistic's, which is an important branch of statistics. In particular, describes the using of software R in ex- ploratory data analysis, inferential statistics and stochastic, simulation in details. The flexihle, open-sourc, e characteristics of software R makes the data processing more efficient. This soft- ware can realize all the methods of the teaching process, and is convenient fi~r learners to opti- mize and improve based on the previous work. R software is suitable for teaching of the non- parametric statistics.%主要介绍统计软件R在统计中一个重要分支非参数统计中的应用．分别从探索性数据分析、推断统计、随机模拟三个角度介绍R软件的应用。从介绍可以看出R软件的灵活、开源的特性，使得数据处理变得更加高效、得心应手。能够通过软件实现教学环节中的所有方法，并且方便学习者在前人工作基础上对方法进行优化、改进，在非参数统计教学中选用R软件是适合的。
Feminism and Factoral Analyses: Alleviating Students' Statistics Anxieties.
Nielsen, Linda
1979-01-01
Describes female math underachievement and counseling or teaching techniques being implemented on college campuses to alleviate math anxiety. Students derive unique benefits from being taught research and statistics courses by female professors. To be effective, female professors must embody distinct feminist roles and perspectives. Specific…
Statistical analyses of hydrophobic interactions: A mini-review
Pratt, L R; Rempe, Susan B
2016-01-01
This review focuses on the striking recent progress in solving for hydrophobic interactions between small inert molecules. We discuss several new understandings. Firstly, the _inverse _temperature phenomenology of hydrophobic interactions, _i.e., strengthening of hydrophobic bonds with increasing temperature, is decisively exhibited by hydrophobic interactions between atomic-scale hard sphere solutes in water. Secondly, inclusion of attractive interactions associated with atomic-size hydrophobic reference cases leads to substantial, non-trivial corrections to reference results for purely repulsive solutes. Hydrophobic bonds are _weakened by adding solute dispersion forces to treatment of reference cases. The classic statistical mechanical theory for those corrections is not accurate in this application, but molecular quasi-chemical theory shows promise. Finally, because of the masking roles of excluded volume and attractive interactions, comparisons that do not discriminate the different possibilities face an...
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Energy Technology Data Exchange (ETDEWEB)
Udey, Ruth Norma [Michigan State Univ., East Lansing, MI (United States)
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Astronomical Methods for Nonparametric Regression
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Nonparametric Inference for Periodic Sequences
Sun, Ying
2012-02-01
This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation.We also propose a nonparametric test of the null hypothesis that the data have constantmean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
Quantal Response: Nonparametric Modeling
2017-01-01
spline N−spline Fig. 3 Logistic regression 7 Approved for public release; distribution is unlimited. 5. Nonparametric QR Models Nonparametric linear ...stimulus and probability of response. The Generalized Linear Model approach does not make use of the limit distribution but allows arbitrary functional...7. Conclusions and Recommendations 18 8. References 19 Appendix A. The Linear Model 21 Appendix B. The Generalized Linear Model 33 Appendix C. B
A Bayesian nonparametric method for prediction in EST analysis
Directory of Open Access Journals (Sweden)
Prünster Igor
2007-09-01
Full Text Available Abstract Background Expressed sequence tags (ESTs analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b the number of new unique genes to be observed in a future sample; c the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
RooStatsCms: a tool for analyses modelling, combination and statistical studies
Piparo, D.; Schott, G.; Quast, G.
2009-12-01
The RooStatsCms (RSC) software framework allows analysis modelling and combination, statistical studies together with the access to sophisticated graphics routines for results visualisation. The goal of the project is to complement the existing analyses by means of their combination and accurate statistical studies.
RooStatsCms: a tool for analyses modelling, combination and statistical studies
Piparo, D; Quast, Prof G
2008-01-01
The RooStatsCms (RSC) software framework allows analysis modelling and combination, statistical studies together with the access to sophisticated graphics routines for results visualisation. The goal of the project is to complement the existing analyses by means of their combination and accurate statistical studies.
Nonparametric identification of copula structures
Li, Bo
2013-06-01
We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process.We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Energy Technology Data Exchange (ETDEWEB)
Pekney, Natalie J.; Cheng, Hanqi; Small, Mitchell J.
2015-11-05
Abstract: The objective of the current work was to develop a statistical method and associated tool to evaluate the impact of oil and natural gas exploration and production activities on local air quality.
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the
2016-05-31
Distribution Unlimited UU UU UU UU 31-05-2016 15-Apr-2014 14-Jan-2015 Final Report: Technical Topic 3.2.2.d Bayesian and Non- parametric Statistics...of Papers published in non peer-reviewed journals: Final Report: Technical Topic 3.2.2.d Bayesian and Non- parametric Statistics: Integration of Neural...Transfer N/A Number of graduating undergraduates who achieved a 3.5 GPA to 4.0 (4.0 max scale ): Number of graduating undergraduates funded by a DoD funded
Panel data specifications in nonparametric kernel regression
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
A Censored Nonparametric Software Reliability Model
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
This paper analyses the effct of censoring on the estimation of failure rate, and presents a framework of a censored nonparametric software reliability model. The model is based on nonparametric testing of failure rate monotonically decreasing and weighted kernel failure rate estimation under the constraint of failure rate monotonically decreasing. Not only does the model have the advantages of little assumptions and weak constraints, but also the residual defects number of the software system can be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Kanda, Junya
2016-01-01
The Transplant Registry Unified Management Program (TRUMP) made it possible for members of the Japan Society for Hematopoietic Cell Transplantation (JSHCT) to analyze large sets of national registry data on autologous and allogeneic hematopoietic stem cell transplantation. However, as the processes used to collect transplantation information are complex and differed over time, the background of these processes should be understood when using TRUMP data. Previously, information on the HLA locus of patients and donors had been collected using a questionnaire-based free-description method, resulting in some input errors. To correct minor but significant errors and provide accurate HLA matching data, the use of a Stata or EZR/R script offered by the JSHCT is strongly recommended when analyzing HLA data in the TRUMP dataset. The HLA mismatch direction, mismatch counting method, and different impacts of HLA mismatches by stem cell source are other important factors in the analysis of HLA data. Additionally, researchers should understand the statistical analyses specific for hematopoietic stem cell transplantation, such as competing risk, landmark analysis, and time-dependent analysis, to correctly analyze transplant data. The data center of the JSHCT can be contacted if statistical assistance is required.
Directory of Open Access Journals (Sweden)
Kühnast, Corinna
2008-04-01
Full Text Available Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and – for all studies which contain at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test – investigated the relationship between the choice of a statistical test and other variables such as type of journal, sample size, randomization, sponsoring etc. Results: The non-parametric Wilcoxon-Mann-Whitney was used in 30% of the studies. In a multivariable logistic regression the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software is specified, in particular when SPSS was used.
DEFF Research Database (Denmark)
Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona;
2015-01-01
from one municipality was sorted at "Level III", e.g. detailed, while the two others were sorted only at "Level I"). The results showed that residual household waste mainly contained food waste (42 +/- 5%, mass per wet basis) and miscellaneous combustibles (18 +/- 3%, mass per wet basis). The residual...... household waste generation rate in the study areas was 3-4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three...
A new statistical method for design and analyses of component tolerance
Movahedi, Mohammad Mehdi; Khounsiavash, Mohsen; Otadi, Mahmood; Mosleh, Maryam
2017-09-01
Tolerancing conducted by design engineers to meet customers' needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. While use of statistical methods for tolerancing is not something new, engineers often use known distributions, including the normal distribution. Yet, if the statistical distribution of the given variable is unknown, a new statistical method will be employed to design tolerance. In this paper, we use generalized lambda distribution for design and analyses component tolerance. We use percentile method (PM) to estimate the distribution parameters. The findings indicated that, when the distribution of the component data is unknown, the proposed method can be used to expedite the design of component tolerance. Moreover, in the case of assembled sets, more extensive tolerance for each component with the same target performance can be utilized.
Directory of Open Access Journals (Sweden)
Alcides Cabrera Campos
2012-09-01
Full Text Available Analyzing data from agricultural pest populations regularly detects that they do not fulfill the theoretical requirements to implement classical ANOVA. Box-Cox transformations and nonparametric statistical methods are commonly used as alternatives to solve this problem. In this paper, we describe the results of applying these techniques to data from Thrips palmi Karny sampled in potato (Solanum tuberosum L. plantations. The X² test was used for the goodness-of-fit of negative binomial distribution and as a test of independence to investigate the relationship between plant strata and insect stages. Seven data transformations were also applied to meet the requirements of classical ANOVA, which failed to eliminate the relationship between mean and variance. Given this negative result, comparisons between insect population densities were made using the nonparametric Kruskal-Wallis ANOVA test. Results from this analysis allowed selecting the insect larval stage and plant middle stratum as keys to design pest sampling plans.Al analizar datos provenientes de poblaciones de plagas agrícolas, regularmente se detecta que no cumplen los requerimientos teóricos para la aplicación del ANDEVA clásico. El uso de transformaciones Box-Cox y de métodos estadísticos no paramétricos resulta la alternativa más utilizada para resolver este inconveniente. En el presente trabajo se exponen los resultados de la aplicación de estas técnicas a datos provenientes de Thrips palmi Karny muestreadas en plantaciones de papa (Solanum tuberosum L. en el período de incidencia de la plaga. Se utilizó la dócima X² para la bondad de ajuste a la distribución binomial negativa y de independencia para investigar la relación entre los estratos de las plantas y los estados del insecto, se aplicaron siete transformaciones a los datos para satisfacer el cumplimiento de los supuestos básicos del ANDEVA, con las cuales no se logró eliminar la relación entre la media y la
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations
Energy Technology Data Exchange (ETDEWEB)
Kleijnen, J.P.C.; Helton, J.C.
1999-04-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
Directory of Open Access Journals (Sweden)
Azuaje Francisco
2006-06-01
Full Text Available Abstract Background The analysis of large-scale gene expression data is a fundamental approach to functional genomics and the identification of potential drug targets. Results derived from such studies cannot be trusted unless they are adequately designed and reported. The purpose of this study is to assess current practices on the reporting of experimental design and statistical analyses in gene expression-based studies. Methods We reviewed hundreds of MEDLINE-indexed papers involving gene expression data analysis, which were published between 2003 and 2005. These papers were examined on the basis of their reporting of several factors, such as sample size, statistical power and software availability. Results Among the examined papers, we concentrated on 293 papers consisting of applications and new methodologies. These papers did not report approaches to sample size and statistical power estimation. Explicit statements on data transformation and descriptions of the normalisation techniques applied prior to data analyses (e.g. classification were not reported in 57 (37.5% and 104 (68.4% of the methodology papers respectively. With regard to papers presenting biomedical-relevant applications, 41(29.1 % of these papers did not report on data normalisation and 83 (58.9% did not describe the normalisation technique applied. Clustering-based analysis, the t-test and ANOVA represent the most widely applied techniques in microarray data analysis. But remarkably, only 5 (3.5% of the application papers included statements or references to assumption about variance homogeneity for the application of the t-test and ANOVA. There is still a need to promote the reporting of software packages applied or their availability. Conclusion Recently-published gene expression data analysis studies may lack key information required for properly assessing their design quality and potential impact. There is a need for more rigorous reporting of important experimental
Nonparametric statistical testing of coherence differences
Maris, E.; Schoffelen, J.M.; Fries, P.
2007-01-01
Many important questions in neuroscience are about interactions between neurons or neuronal groups. These interactions are often quantified by coherence, which is a frequency-indexed measure that quantifies the extent to which two signals exhibit a consistent phase relation. In this paper, we consid
A review of statistical analyses on monthly and daily rainfall in Catalonia
Directory of Open Access Journals (Sweden)
X. Lana
2009-01-01
Full Text Available A review on recent studies about monthly and daily rainfall in Catalonia is presented. Monthly rainfall is analysed along the west Mediterranean Coast and in Catalonia, quantifying aspects as the irregularity of monthly amounts and the spatial distribution of the Standard Precipitation Index. Several statistics are applied to daily rainfall series such as their extreme value and intraannual spatial distributions, the variability of the average and standard deviation rain amounts for each month, their amount and time distributions, and time trends affecting four pluviometric indices for different percentiles and class intervals. All these different analyses constitute the continuity of the scientific study of Catalan rainfall, which started about a century ago.
The Statistical Analyses of the White-Light Flares: Two Main Results About Flare Behaviours
Dal, H A
2012-01-01
We present two main results, based on the models and the statistical analyses of 1672 U-band flares. We also discuss the behaviours of the white-light flares. In addition, the parameters of the flares detected from two years of observations on CR Dra are presented. By comparing with the flare parameters obtained from other UV Ceti type stars, we examine the behaviour of optical flare processes along the spectral types. Moreover, we aimed, using large white-light flare data,to analyse the flare time-scales in respect to some results obtained from the X-ray observations. Using the SPSS V17.0 and the GraphPad Prism V5.02 software, the flares detected from CR Dra were modelled with the OPEA function and analysed with t-Test method to compare similar flare events in other stars. In addition, using some regression calculations in order to derive the best histograms, the time-scales of the white-light flares were analysed. Firstly, CR Dra flares have revealed that the white-light flares behave in a similar way as th...
DEFF Research Database (Denmark)
Frandsen, Tove Faber; Nicolaisen, Jeppe
2017-01-01
Using statistical methods to analyse digital material for patterns makes it possible to detect patterns in big data that we would otherwise not be able to detect. This paper seeks to exemplify this fact by statistically analysing a large corpus of references in systematic reviews. The aim...
Comparisons of power of statistical methods for gene-environment interaction analyses.
Ege, Markus J; Strachan, David P
2013-10-01
Any genome-wide analysis is hampered by reduced statistical power due to multiple comparisons. This is particularly true for interaction analyses, which have lower statistical power than analyses of associations. To assess gene-environment interactions in population settings we have recently proposed a statistical method based on a modified two-step approach, where first genetic loci are selected by their associations with disease and environment, respectively, and subsequently tested for interactions. We have simulated various data sets resembling real world scenarios and compared single-step and two-step approaches with respect to true positive rate (TPR) in 486 scenarios and (study-wide) false positive rate (FPR) in 252 scenarios. Our simulations confirmed that in all two-step methods the two steps are not correlated. In terms of TPR, two-step approaches combining information on gene-disease association and gene-environment association in the first step were superior to all other methods, while preserving a low FPR in over 250 million simulations under the null hypothesis. Our weighted modification yielded the highest power across various degrees of gene-environment association in the controls. An optimal threshold for step 1 depended on the interacting allele frequency and the disease prevalence. In all scenarios, the least powerful method was to proceed directly to an unbiased full interaction model, applying conventional genome-wide significance thresholds. This simulation study confirms the practical advantage of two-step approaches to interaction testing over more conventional one-step designs, at least in the context of dichotomous disease outcomes and other parameters that might apply in real-world settings.
DEFF Research Database (Denmark)
Linnet, Kristian
2005-01-01
Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors......Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors...
Non-parametric approach to the study of phenotypic stability.
Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P
2016-02-19
The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods were extension different parametric methods used for stability analyses and the result was compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.
Poulos, M. J.; Pierce, J. L.; McNamara, J. P.; Flores, A. N.; Benner, S. G.
2015-12-01
Terrain aspect alters the spatial distribution of insolation across topography, driving eco-pedo-hydro-geomorphic feedbacks that can alter landform evolution and result in valley asymmetries for a suite of land surface characteristics (e.g. slope length and steepness, vegetation, soil properties, and drainage development). Asymmetric valleys serve as natural laboratories for studying how landscapes respond to climate perturbation. In the semi-arid montane granodioritic terrain of the Idaho batholith, Northern Rocky Mountains, USA, prior works indicate that reduced insolation on northern (pole-facing) aspects prolongs snow pack persistence, and is associated with thicker, finer-grained soils, that retain more water, prolong the growing season, support coniferous forest rather than sagebrush steppe ecosystems, stabilize slopes at steeper angles, and produce sparser drainage networks. We hypothesize that the primary drivers of valley asymmetry development are changes in the pedon-scale water-balance that coalesce to alter catchment-scale runoff and drainage development, and ultimately cause the divide between north and south-facing land surfaces to migrate northward. We explore this conceptual framework by coupling land surface analyses with statistical modeling to assess relationships and the relative importance of land surface characteristics. Throughout the Idaho batholith, we systematically mapped and tabulated various statistical measures of landforms, land cover, and hydroclimate within discrete valley segments (n=~10,000). We developed a random forest based statistical model to predict valley slope asymmetry based upon numerous measures (n>300) of landscape asymmetries. Preliminary results suggest that drainages are tightly coupled with hillslopes throughout the region, with drainage-network slope being one of the strongest predictors of land-surface-averaged slope asymmetry. When slope-related statistics are excluded, due to possible autocorrelation, valley
Stück, H. L.; Siegesmund, S.
2012-04-01
Sandstones are a popular natural stone due to their wide occurrence and availability. The different applications for these stones have led to an increase in demand. From the viewpoint of conservation and the natural stone industry, an understanding of the material behaviour of this construction material is very important. Sandstones are a highly heterogeneous material. Based on statistical analyses with a sufficiently large dataset, a systematic approach to predicting the material behaviour should be possible. Since the literature already contains a large volume of data concerning the petrographical and petrophysical properties of sandstones, a large dataset could be compiled for the statistical analyses. The aim of this study is to develop constraints on the material behaviour and especially on the weathering behaviour of sandstones. Approximately 300 samples from historical and presently mined natural sandstones in Germany and ones described worldwide were included in the statistical approach. The mineralogical composition and fabric characteristics were determined from detailed thin section analyses and descriptions in the literature. Particular attention was paid to evaluating the compositional and textural maturity, grain contact respectively contact thickness, type of cement, degree of alteration and the intergranular volume. Statistical methods were used to test for normal distributions and calculating the linear regression of the basic petrophysical properties of density, porosity, water uptake as well as the strength. The sandstones were classified into three different pore size distributions and evaluated with the other petrophysical properties. Weathering behavior like hygric swelling and salt loading tests were also included. To identify similarities between individual sandstones or to define groups of specific sandstone types, principle component analysis, cluster analysis and factor analysis were applied. Our results show that composition and porosity
Statistical analyses to support guidelines for marine avian sampling. Final report
Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris
2012-01-01
distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and map pvalues from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both shortterm and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.
Directory of Open Access Journals (Sweden)
Bosch-Capblanch Xavier
2011-05-01
Full Text Available Abstract Background Data requirements by governments, donors and the international community to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyse time trends and geographical patterns of demographic and health related indicators. However, since not all datasets have the same structure, variables definitions and codes, they have to be harmonised prior to submitting them to the statistical analyses. Manually searching, renaming and recoding variables are extremely tedious and prone to errors tasks, overall when the number of datasets and variables are large. This article presents an automated approach to harmonise variables names across several datasets, which optimises the search of variables, minimises manual inputs and reduces the risk of error. Results Three consecutive algorithms are applied iteratively to search for each variable of interest for the analyses in all datasets. The first search (A captures particular cases that could not be solved in an automated way in the search iterations; the second search (B is run if search A produced no hits and identifies variables the labels of which contain certain key terms defined by the user. If this search produces no hits, a third one (C is run to retrieve variables which have been identified in other surveys, as an illustration. For each variable of interest, the outputs of these engines can be (O1 a single best matching variable is found, (O2 more than one matching variable is found or (O3 not matching variables are found. Output O2 is solved by user judgement. Examples using four variables are presented showing that the searches have a 100% sensitivity and specificity after a second iteration. Conclusion Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets. This is especially relevant when
2011-01-01
Background Data requirements by governments, donors and the international community to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyse time trends and geographical patterns of demographic and health related indicators. However, since not all datasets have the same structure, variables definitions and codes, they have to be harmonised prior to submitting them to the statistical analyses. Manually searching, renaming and recoding variables are extremely tedious and prone to errors tasks, overall when the number of datasets and variables are large. This article presents an automated approach to harmonise variables names across several datasets, which optimises the search of variables, minimises manual inputs and reduces the risk of error. Results Three consecutive algorithms are applied iteratively to search for each variable of interest for the analyses in all datasets. The first search (A) captures particular cases that could not be solved in an automated way in the search iterations; the second search (B) is run if search A produced no hits and identifies variables the labels of which contain certain key terms defined by the user. If this search produces no hits, a third one (C) is run to retrieve variables which have been identified in other surveys, as an illustration. For each variable of interest, the outputs of these engines can be (O1) a single best matching variable is found, (O2) more than one matching variable is found or (O3) not matching variables are found. Output O2 is solved by user judgement. Examples using four variables are presented showing that the searches have a 100% sensitivity and specificity after a second iteration. Conclusion Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets. This is especially relevant when the numbers of datasets
Nonparametric Bayes analysis of social science data
Kunihama, Tsuyoshi
Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys assign many questions to respondents, which often consist of mixed-scale variables. Each of the variables can follow a complex distribution outside parametric families and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model which can approximate structures well in the social science data. In addition, many social surveys have collected data over time and therefore we need to incorporate dynamic dependence into the models. Also, it is standard to observe massive number of missing values in the social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains backgrounds and motivations of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian modeling of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method in social surveys with complex designs where there is a gap between sample and population. We correct for the bias by adjusting mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables can be induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.
NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-10-01
Full Text Available The article is devoted to the nonparametric point and interval estimation of the characteristics of the probabilistic distribution (the expectation, median, variance, standard deviation, variation coefficient of the sample results. Sample values are regarded as the implementation of independent and identically distributed random variables with an arbitrary distribution function having the desired number of moments. Nonparametric analysis procedures are compared with the parametric procedures, based on the assumption that the sample values have a normal distribution. Point estimators are constructed in the obvious way - using sample analogs of the theoretical characteristics. Interval estimators are based on asymptotic normality of sample moments and functions from them. Nonparametric asymptotic confidence intervals are obtained through the use of special output technology of the asymptotic relations of Applied Statistics. In the first step this technology uses the multidimensional central limit theorem, applied to the sums of vectors whose coordinates are the degrees of initial random variables. The second step is the conversion limit multivariate normal vector to obtain the interest of researcher vector. At the same considerations we have used linearization and discarded infinitesimal quantities. The third step - a rigorous justification of the results on the asymptotic standard for mathematical and statistical reasoning level. It is usually necessary to use the necessary and sufficient conditions for the inheritance of convergence. This article contains 10 numerical examples. Initial data - information about an operating time of 50 cutting tools to the limit state. Using the methods developed on the assumption of normal distribution, it can lead to noticeably distorted conclusions in a situation where the normality hypothesis failed. Practical recommendations are: for the analysis of real data we should use nonparametric confidence limits
IMGT standardization for statistical analyses of T cell receptor junctions: the TRAV-TRAJ example.
Bleakley, Kevin; Giudicelli, Véronique; Wu, Yan; Lefranc, Marie-Paule; Biau, Gérard
2006-01-01
The diversity of immunoglobulin (IG) and T cell receptor (TR) chains depends on several mechanisms: combinatorial diversity, which is a consequence of the number of V, D and J genes and the N-REGION diversity, which creates an extensive and clonal somatic diversity at the V-J and V-D-J junctions. For the IG, the diversity is further increased by somatic hypermutations. The number of different junctions per chain and per individual is estimated to be 10(12). We have chosen the human TRAV-TRAJ junctions as an example in order to characterize the required criteria for a standardized analysis of the IG and TR V-J and V-D-J junctions, based on the IMGT-ONTOLOGY concepts, and to serve as a first IMGT junction reference set (IMGT, http://imgt.cines.fr). We performed a thorough statistical analysis of 212 human rearranged TRAV-TRAJ sequences, which were aligned and analysed by the integrated IMGT/V-QUEST software, which includes IMGT/JunctionAnalysis, then manually expert-verified. Furthermore, we compared these 212 sequences with 37 other human TRAV-TRAJ junction sequences for which some particularities (potential sequence polymorphisms, sequencing errors, etc.) did not allow IMGT/JunctionAnalysis to provide the correct biological results, according to expert verification. Using statistical learning, we constructed an automatic warning system to predict if new, automatically analysed TRAV-TRAJ sequences should be manually re-checked. We estimated the robustness of this automatic warning system.
Statistical contact angle analyses; "slow moving" drops on a horizontal silicon-oxide surface.
Schmitt, M; Grub, J; Heib, F
2015-06-01
Sessile drop experiments on horizontal surfaces are commonly used to characterise surface properties in science and in industry. The advancing angle and the receding angle are measurable on every solid. Specially on horizontal surfaces even the notions themselves are critically questioned by some authors. Building a standard, reproducible and valid method of measuring and defining specific (advancing/receding) contact angles is an important challenge of surface science. Recently we have developed two/three approaches, by sigmoid fitting, by independent and by dependent statistical analyses, which are practicable for the determination of specific angles/slopes if inclining the sample surface. These approaches lead to contact angle data which are independent on "user-skills" and subjectivity of the operator which is also of urgent need to evaluate dynamic measurements of contact angles. We will show in this contribution that the slightly modified procedures are also applicable to find specific angles for experiments on horizontal surfaces. As an example droplets on a flat freshly cleaned silicon-oxide surface (wafer) are dynamically measured by sessile drop technique while the volume of the liquid is increased/decreased. The triple points, the time, the contact angles during the advancing and the receding of the drop obtained by high-precision drop shape analysis are statistically analysed. As stated in the previous contribution the procedure is called "slow movement" analysis due to the small covered distance and the dominance of data points with low velocity. Even smallest variations in velocity such as the minimal advancing motion during the withdrawing of the liquid are identifiable which confirms the flatness and the chemical homogeneity of the sample surface and the high sensitivity of the presented approaches.
Computational AstroStatistics Fast and Efficient Tools for Analysing Huge Astronomical Data Sources
Nichol, R C; Connolly, A J; Davies, S; Genovese, C; Hopkins, A M; Miller, C J; Moore, A W; Pelleg, D; Richards, G T; Schneider, J; Szapudi, I; Wasserman, L H
2001-01-01
I present here a review of past and present multi-disciplinary research of the Pittsburgh Computational AstroStatistics (PiCA) group. This group is dedicated to developing fast and efficient statistical algorithms for analysing huge astronomical data sources. I begin with a short review of multi-resolutional kd-trees which are the building blocks for many of our algorithms. For example, quick range queries and fast n-point correlation functions. I will present new results from the use of Mixture Models (Connolly et al. 2000) in density estimation of multi-color data from the Sloan Digital Sky Survey (SDSS). Specifically, the selection of quasars and the automated identification of X-ray sources. I will also present a brief overview of the False Discovery Rate (FDR) procedure (Miller et al. 2001a) and show how it has been used in the detection of ``Baryon Wiggles'' in the local galaxy power spectrum and source identification in radio data. Finally, I will look forward to new research on an automated Bayes Netw...
Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard
2015-02-01
Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at "Level III", e.g. detailed, while the two others were sorted only at "Level I"). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3-4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single-family and multi-family house areas), the individual percentage composition of food waste, paper, and glass was significantly different between the housing types. This indicates that housing type is a critical stratification parameter. Separating food leftovers from food packaging during manual sorting of the sampled waste did not have significant influence on the proportions of food waste
Ofoghi, Bahadorreza; Zeleznikow, John; Dwyer, Dan; Macmahon, Clare
2013-01-01
This article describes the utilisation of an unsupervised machine learning technique and statistical approaches (e.g., the Kolmogorov-Smirnov test) that assist cycling experts in the crucial decision-making processes for athlete selection, training, and strategic planning in the track cycling Omnium. The Omnium is a multi-event competition that will be included in the summer Olympic Games for the first time in 2012. Presently, selectors and cycling coaches make decisions based on experience and intuition. They rarely have access to objective data. We analysed both the old five-event (first raced internationally in 2007) and new six-event (first raced internationally in 2011) Omniums and found that the addition of the elimination race component to the Omnium has, contrary to expectations, not favoured track endurance riders. We analysed the Omnium data and also determined the inter-relationships between different individual events as well as between those events and the final standings of riders. In further analysis, we found that there is no maximum ranking (poorest performance) in each individual event that riders can afford whilst still winning a medal. We also found the required times for riders to finish the timed components that are necessary for medal winning. The results of this study consider the scoring system of the Omnium and inform decision-making toward successful participation in future major Omnium competitions.
Schmitt, M; Groß, K; Grub, J; Heib, F
2015-06-01
Contact angle determination by sessile drop technique is essential to characterise surface properties in science and in industry. Different specific angles can be observed on every solid which are correlated with the advancing or the receding of the triple line. Different procedures and definitions for the determination of specific angles exist which are often not comprehensible or reproducible. Therefore one of the most important things in this area is to build standard, reproducible and valid methods for determining advancing/receding contact angles. This contribution introduces novel techniques to analyse dynamic contact angle measurements (sessile drop) in detail which are applicable for axisymmetric and non-axisymmetric drops. Not only the recently presented fit solution by sigmoid function and the independent analysis of the different parameters (inclination, contact angle, velocity of the triple point) but also the dependent analysis will be firstly explained in detail. These approaches lead to contact angle data and different access on specific contact angles which are independent from "user-skills" and subjectivity of the operator. As example the motion behaviour of droplets on flat silicon-oxide surfaces after different surface treatments is dynamically measured by sessile drop technique when inclining the sample plate. The triple points, the inclination angles, the downhill (advancing motion) and the uphill angles (receding motion) obtained by high-precision drop shape analysis are independently and dependently statistically analysed. Due to the small covered distance for the dependent analysis (static to the "slow moving" dynamic contact angle determination. They are characterised by small deviations of the computed values. Additional to the detailed introduction of this novel analytical approaches plus fit solution special motion relations for the drop on inclined surfaces and detailed relations about the reactivity of the freshly cleaned silicon wafer
Energy Technology Data Exchange (ETDEWEB)
Edjabou, Maklawe Essonanawe, E-mail: vine@env.dtu.dk [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Petersen, Claus [Econet AS, Omøgade 8, 2.sal, 2100 Copenhagen (Denmark); Scheutz, Charlotte; Astrup, Thomas Fruergaard [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark)
2015-02-15
Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered approach) facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both, waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single
AxPcoords & parallel AxParafit: statistical co-phylogenetic analyses on thousands of taxa
Directory of Open Access Journals (Sweden)
Meier-Kolthoff Jan
2007-10-01
Full Text Available Abstract Background Current tools for Co-phylogenetic analyses are not able to cope with the continuous accumulation of phylogenetic data. The sophisticated statistical test for host-parasite co-phylogenetic analyses implemented in Parafit does not allow it to handle large datasets in reasonable times. The Parafit and DistPCoA programs are the by far most compute-intensive components of the Parafit analysis pipeline. We present AxParafit and AxPcoords (Ax stands for Accelerated which are highly optimized versions of Parafit and DistPCoA respectively. Results Both programs have been entirely re-written in C. Via optimization of the algorithm and the C code as well as integration of highly tuned BLAS and LAPACK methods AxParafit runs 5–61 times faster than Parafit with a lower memory footprint (up to 35% reduction while the performance benefit increases with growing dataset size. The MPI-based parallel implementation of AxParafit shows good scalability on up to 128 processors, even on medium-sized datasets. The parallel analysis with AxParafit on 128 CPUs for a medium-sized dataset with an 512 by 512 association matrix is more than 1,200/128 times faster per processor than the sequential Parafit run. AxPcoords is 8–26 times faster than DistPCoA and numerically stable on large datasets. We outline the substantial benefits of using parallel AxParafit by example of a large-scale empirical study on smut fungi and their host plants. To the best of our knowledge, this study represents the largest co-phylogenetic analysis to date. Conclusion The highly efficient AxPcoords and AxParafit programs allow for large-scale co-phylogenetic analyses on several thousands of taxa for the first time. In addition, AxParafit and AxPcoords have been integrated into the easy-to-use CopyCat tool.
Chan, Desmond Wing-Sum
An S-matrix formalism of the statistical theory of nuclear reactions has been developed by Weidenmuller et al., based upon the Engelbrecht-Weidenmuller transformation and extended to cases where direct reactions are present as a means of deriving expressions for the fluctuation cross section going beyond the framework of conventional Hauser-Feshbach theory. This unified approach, from which a coherent sum of fluctuation and direct-interaction cross sections is combined to yield a net reaction cross section, provides a means of deriving a comprehensive and accurate theoretical description of the scattering process. Although a framework for the formal theory has been constructed, it had not previously been applied to the qualitative analyses of scattering data. As described in this thesis, a computer program "NANCY" has been compiled by modifying Tamura's coupled -channels code "JUPITOR-1" (through modifications suggested by Moldauer) and incorporating Smith's optical model routine "SCAT", as a means of generating the entire symmetric S -matrix. Using this program, computations were undertaken to determine numerically the energy-averaged cross sections for inelastic neutron scattering on ('232)Th and ('238)U from threshold to several MeV. With appropriate variation of coupling strengths between the ground state rotational band and vibrational levels good fits to the experimental data were attained, which compared favorably with theoretical results generated from conventional approaches.
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs....... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
Zhang, Yang; Máté, Gabriell; Müller, Patrick; Hillebrandt, Sabina; Krufczik, Matthias; Bach, Margund; Kaufmann, Rainer; Hausmann, Michael; Heermann, Dieter W
2015-01-01
It has been well established that the architecture of chromatin in cell nuclei is not random but functionally correlated. Chromatin damage caused by ionizing radiation raises complex repair machineries. This is accompanied by local chromatin rearrangements and structural changes which may for instance improve the accessibility of damaged sites for repair protein complexes. Using stably transfected HeLa cells expressing either green fluorescent protein (GFP) labelled histone H2B or yellow fluorescent protein (YFP) labelled histone H2A, we investigated the positioning of individual histone proteins in cell nuclei by means of high resolution localization microscopy (Spectral Position Determination Microscopy = SPDM). The cells were exposed to ionizing radiation of different doses and aliquots were fixed after different repair times for SPDM imaging. In addition to the repair dependent histone protein pattern, the positioning of antibodies specific for heterochromatin and euchromatin was separately recorded by SPDM. The present paper aims to provide a quantitative description of structural changes of chromatin after irradiation and during repair. It introduces a novel approach to analyse SPDM images by means of statistical physics and graph theory. The method is based on the calculation of the radial distribution functions as well as edge length distributions for graphs defined by a triangulation of the marker positions. The obtained results show that through the cell nucleus the different chromatin re-arrangements as detected by the fluorescent nucleosomal pattern average themselves. In contrast heterochromatic regions alone indicate a relaxation after radiation exposure and re-condensation during repair whereas euchromatin seemed to be unaffected or behave contrarily. SPDM in combination with the analysis techniques applied allows the systematic elucidation of chromatin re-arrangements after irradiation and during repair, if selected sub-regions of nuclei are
Directory of Open Access Journals (Sweden)
Jishan Chen
2015-12-01
Full Text Available Due to a boom in the dairy industry in Northeast China, the hay industry has been developing rapidly. Thus, it is very important to evaluate the hay quality with a rapid and accurate method. In this research, a novel technique that combines near infrared spectroscopy (NIRs with three different statistical analyses (MLR, PCR and PLS was used to predict the chemical quality of sheepgrass (Leymus chinensis in Heilongjiang Province, China including the concentrations of crude protein (CP, acid detergent fiber (ADF, and neutral detergent fiber (NDF. Firstly, the linear partial least squares regression (PLS was performed on the spectra and the predictions were compared to those with laboratory-based recorded spectra. Then, the MLR evaluation method for CP has a potential to be used for industry requirements, as it needs less sophisticated and cheaper instrumentation using only a few wavelengths. Results show that in terms of CP, ADF and NDF, (i the prediction accuracy in terms of CP, ADF and NDF using PLS was obviously improved compared to the PCR algorithm, and comparable or even better than results generated using the MLR algorithm; (ii the predictions were worse compared to laboratory-based spectra with the MLR algorithmin, and poor predictions were obtained (R2, 0.62, RPD, 0.9 using MLR in terms of NDF; (iii a satisfactory accuracy with R2 and RPD by PLS method of 0.91, 3.2 for CP, 0.89, 3.1 for ADF and 0.88, 3.0 for NDF, respectively, was obtained. Our results highlight the use of the combined NIRs-PLS method could be applied as a valuable technique to rapidly and accurately evaluate the quality of sheepgrass hay.
Papageorgiou, Spyridon N; Papadopoulos, Moschos A; Athanasiou, Athanasios E
2014-02-01
Ideally meta-analyses (MAs) should consolidate the characteristics of orthodontic research in order to produce an evidence-based answer. However severe flaws are frequently observed in most of them. The aim of this study was to evaluate the statistical methods, the methodology, and the quality characteristics of orthodontic MAs and to assess their reporting quality during the last years. Electronic databases were searched for MAs (with or without a proper systematic review) in the field of orthodontics, indexed up to 2011. The AMSTAR tool was used for quality assessment of the included articles. Data were analyzed with Student's t-test, one-way ANOVA, and generalized linear modelling. Risk ratios with 95% confidence intervals were calculated to represent changes during the years in reporting of key items associated with quality. A total of 80 MAs with 1086 primary studies were included in this evaluation. Using the AMSTAR tool, 25 (27.3%) of the MAs were found to be of low quality, 37 (46.3%) of medium quality, and 18 (22.5%) of high quality. Specific characteristics like explicit protocol definition, extensive searches, and quality assessment of included trials were associated with a higher AMSTAR score. Model selection and dealing with heterogeneity or publication bias were often problematic in the identified reviews. The number of published orthodontic MAs is constantly increasing, while their overall quality is considered to range from low to medium. Although the number of MAs of medium and high level seems lately to rise, several other aspects need improvement to increase their overall quality.
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and......We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs...... rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...
Energy Technology Data Exchange (ETDEWEB)
Amidan, Brett G.; Pulsipher, Brent A.; Matzke, Brett D.
2009-12-17
number of zeros. Using QQ plots these data characteristics show a lack of normality from the data after contamination. Normality is improved when looking at log(CFU/cm2). Variance component analysis (VCA) and analysis of variance (ANOVA) were used to estimate the amount of variance due to each source and to determine which sources of variability were statistically significant. In general, the sampling methods interacted with the across event variability and with the across room variability. For this reason, it was decided to do analyses for each sampling method, individually. The between event variability and between room variability were significant for each method, except for the between event variability for the swabs. For both the wipes and vacuums, the within room standard deviation was much larger (26.9 for wipes and 7.086 for vacuums) than the between event standard deviation (6.552 for wipes and 1.348 for vacuums) and the between room standard deviation (6.783 for wipes and 1.040 for vacuums). Swabs between room standard deviation was 0.151, while both the within room and between event standard deviations are less than 0.10 (all measurements in CFU/cm2).
Semi- and Nonparametric ARCH Processes
Directory of Open Access Journals (Sweden)
Oliver B. Linton
2011-01-01
Full Text Available ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametrics estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationary conditions, ergodic conditions and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
Essentials of Excel, Excel VBA, SAS and Minitab for statistical and financial analyses
Lee, Cheng-Few; Chang, Jow-Ran; Tai, Tzu
2016-01-01
This introductory textbook for business statistics teaches statistical analysis and research methods via business case studies and financial data using Excel, MINITAB, and SAS. Every chapter in this textbook engages the reader with data of individual stock, stock indices, options, and futures. One studies and uses statistics to learn how to study, analyze, and understand a data set of particular interest. Some of the more popular statistical programs that have been developed to use statistical and computational methods to analyze data sets are SAS, SPSS, and MINITAB. Of those, we look at MINITAB and SAS in this textbook. One of the main reasons to use MINITAB is that it is the easiest to use among the popular statistical programs. We look at SAS because it is the leading statistical package used in industry. We also utilize the much less costly and ubiquitous Microsoft Excel to do statistical analysis, as the benefits of Excel have become widely recognized in the academic world and its analytical capabilities...
Energy Technology Data Exchange (ETDEWEB)
NONE
2011-07-01
This report is an assessment of the current model and presentation form of bio energy statistics. It appears proposed revision and enhancement of both collection and data representation. In the context of market development both in general for energy and particularly for bio energy and government targets, a good bio energy statistics form the basis to follow up the objectives and means.(eb)
Nonparametric estimation of ultrasound pulses
DEFF Research Database (Denmark)
Jensen, Jørgen Arendt; Leeman, Sidney
1994-01-01
An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100
Nonparametric Cointegration Analysis of Fractional Systems With Unknown Integration Orders
DEFF Research Database (Denmark)
Nielsen, Morten Ørregaard
2009-01-01
In this paper a nonparametric variance ratio testing approach is proposed for determining the number of cointegrating relations in fractionally integrated systems. The test statistic is easily calculated without prior knowledge of the integration order of the data, the strength of the cointegrating...
Statistical issues in quality control of proteomic analyses: good experimental design and planning.
Cairns, David A
2011-03-01
Quality control is becoming increasingly important in proteomic investigations as experiments become more multivariate and quantitative. Quality control applies to all stages of an investigation and statistics can play a key role. In this review, the role of statistical ideas in the design and planning of an investigation is described. This involves the design of unbiased experiments using key concepts from statistical experimental design, the understanding of the biological and analytical variation in a system using variance components analysis and the determination of a required sample size to perform a statistically powerful investigation. These concepts are described through simple examples and an example data set from a 2-D DIGE pilot experiment. Each of these concepts can prove useful in producing better and more reproducible data. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Antczak, K; Wilczyńska, U
1980-01-01
Part II presents a statistical model devised by the authors for evaluating toxicological analyses results. The model includes: 1. Establishment of a reference value, basing on our own measurements taken by two independent analytical methods. 2. Selection of laboratories -- basing on the deviation of the obtained values from reference ones. 3. On consideration of variance analysis, t-student's test and differences test, subsequent quality controls and particular laboratories have been evaluated.
Pestman, Wiebe R
2009-01-01
This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.
Statistical Modelling of Synaptic Vesicles Distribution and Analysing their Physical Characteristics
DEFF Research Database (Denmark)
Khanmohammadi, Mahdieh
This Ph.D. thesis deals with mathematical and statistical modeling of synaptic vesicle distribution, shape, orientation and interactions. The first major part of this thesis treats the problem of determining the effect of stress on synaptic vesicle distribution and interactions. Serial section...... on differences of statistical measures in section and the same measures in between sections. Three-dimensional (3D) datasets are reconstructed by using image registration techniques and estimated thicknesses. We distinguish the effect of stress by estimating the synaptic vesicle densities and modeling......, which leads to more accurate results. Finally, we present a thorough statistical investigation of the shape, orientation and interactions of the synaptic vesicles during active time of the synapse. Focused ion beam-scanning electron microscopy images of a male mammalian brain are used for this study...
Non-Parametric Inference in Astrophysics
Wasserman, L H; Nichol, R C; Genovese, C; Jang, W; Connolly, A J; Moore, A W; Schneider, J; Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; group, the PICA
2001-01-01
We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
A non-parametric model for the cosmic velocity field
Branchini, E; Teodoro, L; Frenk, CS; Schmoldt, [No Value; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S
1999-01-01
We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.
Zhou, Yan; Dendukuri, Nandini
2014-07-20
Heterogeneity in diagnostic meta-analyses is common because of the observational nature of diagnostic studies and the lack of standardization in the positivity criterion (cut-off value) for some tests. So far the unexplained heterogeneity across studies has been quantified by either using the I(2) statistic for a single parameter (i.e. either the sensitivity or the specificity) or visually examining the data in a receiver-operating characteristic space. In this paper, we derive improved I(2) statistics measuring heterogeneity for dichotomous outcomes, with a focus on diagnostic tests. We show that the currently used estimate of the 'typical' within-study variance proposed by Higgins and Thompson is not able to properly account for the variability of the within-study variance across studies for dichotomous variables. Therefore, when the between-study variance is large, the 'typical' within-study variance underestimates the expected within-study variance, and the corresponding I(2) is overestimated. We propose to use the expected value of the within-study variation in the construction of I(2) in cases of univariate and bivariate diagnostic meta-analyses. For bivariate diagnostic meta-analyses, we derive a bivariate version of I(2) that is able to account for the correlation between sensitivity and specificity. We illustrate the performance of these new estimators using simulated data as well as two real data sets.
Energy Technology Data Exchange (ETDEWEB)
Helton, J.C.; Kleijnen, J.P.C.
1999-03-24
Procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses are described and illustrated. These procedures attempt to detect increasingly complex patterns in scatterplots and involve the identification of (i) linear relationships with correlation coefficients, (ii) monotonic relationships with rank correlation coefficients, (iii) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (iv) trends in variability as defined by variances and interquartile ranges, and (v) deviations from randomness as defined by the chi-square statistic. A sequence of example analyses with a large model for two-phase fluid flow illustrates how the individual procedures can differ in the variables that they identify as having effects on particular model outcomes. The example analyses indicate that the use of a sequence of procedures is a good analysis strategy and provides some assurance that an important effect is not overlooked.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
The Effects of Two Types of Sampling Error on Common Statistical Analyses.
Arnold, Margery E.
Sampling error refers to variability that is unique to the sample. If the sample is the entire population, then there is no sampling error. A related point is that sampling error is a function of sample size, as a hypothetical example illustrates. As the sample statistics more and more closely approximate the population parameters, the sampling…
Directory of Open Access Journals (Sweden)
Maria João Nunes
2005-03-01
Full Text Available In atmospheric aerosol sampling, it is inevitable that the air that carries particles is in motion, as a result of both externally driven wind and the sucking action of the sampler itself. High or low air flow sampling speeds may lead to significant particle size bias. The objective of this work is the validation of measurements enabling the comparison of species concentration from both air flow sampling techniques. The presence of several outliers and increase of residuals with concentration becomes obvious, requiring non-parametric methods, recommended for the handling of data which may not be normally distributed. This way, conversion factors are obtained for each of the various species under study using Kendall regression.
Nonparametric Econometrics: The np Package
Directory of Open Access Journals (Sweden)
Tristen Hayﬁeld
2008-07-01
Full Text Available We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of signiﬁcance and consistent model speciﬁcation tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
Priors, Posterior Odds and Lagrange Multiplier Statistics in Bayesian Analyses of Cointegration
F.R. Kleibergen (Frank); R. Paap (Richard)
1996-01-01
textabstractUsing the standard linear model as a base, a unified theory of Bayesian Analyses of Cointegration Models is constructed. This is achieved by defining (natural conjugate) priors in the linear model and using the implied priors for the cointegration model. Using these priors, posterior res
Thompson, Bruce; Snyder, Patricia A.
1998-01-01
Investigates two aspects of research analyses in quantitative research studies reported in the 1996 issues of "Journal of Counseling & Development" (JCD). Acceptable methodological practice regarding significance testing and evaluation of score reliability has evolved considerably. Contemporary thinking on these issues is described; practice as…
Waldman, Irwin D; Lilienfeld, Scott O
2016-03-01
We comment on Sijtsma's (2014) thought-provoking essay on how to minimize questionable research practices (QRPs) in psychology. We agree with Sijtsma that proactive measures to decrease the risk of QRPs will ultimately be more productive than efforts to target individual researchers and their work. In particular, we concur that encouraging researchers to make their data and research materials public is the best institutional antidote against QRPs, although we are concerned that Sijtsma's proposal to delegate more responsibility to statistical and methodological consultants could inadvertently reinforce the dichotomy between the substantive and statistical aspects of research. We also discuss sources of false-positive findings and replication failures in psychological research, and outline potential remedies for these problems. We conclude that replicability is the best metric of the minimization of QRPs and their adverse effects on psychological research.
Time Series Expression Analyses Using RNA-seq: A Statistical Approach
Directory of Open Access Journals (Sweden)
Sunghee Oh
2013-01-01
Full Text Available RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI, autoregressive time-lagged regression (AR(1, and hidden Markov model (HMM approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH
Directory of Open Access Journals (Sweden)
I. C. A. OYEKA
2012-02-01
Full Text Available This paper presents a non-parametric statistical method of analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided and the method is shown to compare favourably with its competitor; the Mann-Whitney test and is more powerful than the latter when there are ties.
Nonparametric test for detecting change in distribution with panel data
Pommeret, Denys; Ghattas, Badih
2011-01-01
This paper considers the problem of comparing two processes with panel data. A nonparametric test is proposed for detecting a monotone change in the link between the two process distributions. The test statistic is of CUSUM type, based on the empirical distribution functions. The asymptotic distribution of the proposed statistic is derived and its finite sample property is examined by bootstrap procedures through Monte Carlo simulations.
Statistical analyses of the performance of Macedonian investment and pension funds
Directory of Open Access Journals (Sweden)
Petar Taleski
2015-10-01
Full Text Available The foundation of the post-modern portfolio theory is creating a portfolio based on a desired target return. This specifically applies to the performance of investment and pension funds that provide a rate of return meeting payment requirements from investment funds. A desired target return is the goal of an investment or pension fund. It is the primary benchmark used to measure performances, dynamic monitoring and evaluation of the risk–return ratio on investment funds. The analysis in this paper is based on monthly returns of Macedonian investment and pension funds (June 2011 - June 2014. Such analysis utilizes the basic, but highly informative statistical characteristic moments like skewness, kurtosis, Jarque–Bera, and Chebyishev’s Inequality. The objective of this study is to perform a trough analysis, utilizing the above mentioned and other types of statistical techniques (Sharpe, Sortino, omega, upside potential, Calmar, Sterling to draw relevant conclusions regarding the risks and characteristic moments in Macedonian investment and pension funds. Pension funds are the second largest segment of the financial system, and has great potential for further growth due to constant inflows from pension insurance. The importance of investment funds for the financial system in the Republic of Macedonia is still small, although open-end investment funds have been the fastest growing segment of the financial system. Statistical analysis has shown that pension funds have delivered a significantly positive volatility-adjusted risk premium in the analyzed period more so than investment funds.
Hayslett, H T
1991-01-01
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
McKean, J.; Roering, J.
High-resolution laser altimetry can depict the topography of large landslides with un- precedented accuracy and allow better management of the hazards posed by such slides. The surface of most landslides is rougher, on a local scale of a few meters, than adjacent unfailed slopes. This characteristic can be exploited to automatically detect and map landslides in landscapes represented by high resolution DTMs. We have used laser altimetry measurements of local topographic roughness to identify and map the perimeter and internal features of a large earthflow in the South Island, New Zealand. Surface roughness was first quantified by statistically characterizing the local variabil- ity of ground surface orientations using both circular and spherical statistics. These measures included the circular resultant, standard deviation and dispersion, and the three-dimensional spherical resultant and ratios of the normalized eigenvalues of the direction cosines. The circular measures evaluate the amount of change in topographic aspect from pixel-to-pixel in the gridded data matrix. The spherical statistics assess both the aspect and steepness of each pixel. The standard deviation of the third di- rection cosine was also used alone to define the variability in just the steepness of each pixel. All of the statistical measures detect and clearly map the earthflow. Cir- cular statistics also emphasize small folds transverse to the movement in the most active zone of the slide. The spherical measures are more sensitive to the larger scale roughness in a portion of the slide that includes large intact limestone blocks. Power spectra of surface roughness were also calculated from two-dimensional Fourier transformations in local test areas. A small earthflow had a broad spectral peak at wavelengths between 10 and 30 meters. Shallower soil failures and surface erosion produced surfaces with a very sharp spectral peak at 12 meters wavelength. Unfailed slopes had an order of magnitude
The Use of Statistical Process Control Tools for Analysing Financial Statements
Directory of Open Access Journals (Sweden)
Niezgoda Janusz
2017-06-01
Full Text Available This article presents the proposed application of one type of the modified Shewhart control charts in the monitoring of changes in the aggregated level of financial ratios. The control chart x̅ has been used as a basis of analysis. The examined variable from the sample in the mentioned chart is the arithmetic mean. The author proposes to substitute it with a synthetic measure that is determined and based on the selected ratios. As the ratios mentioned above, are expressed in different units and characters, the author applies standardisation. The results of selected comparative analyses have been presented for both bankrupts and non-bankrupts. They indicate the possibility of using control charts as an auxiliary tool in financial analyses.
Calibration of back-analysed model parameters for landslides using classification statistics
Cepeda, Jose; Henderson, Laura
2016-04-01
Back-analyses are useful for characterizing the geomorphological and mechanical processes and parameters involved in the initiation and propagation of landslides. These processes and parameters can in turn be used for improving forecasts of scenarios and hazard assessments in areas or sites which have similar settings to the back-analysed cases. The selection of the modeled landslide that produces the best agreement with the actual observations requires running a number of simulations by varying the type of model and the sets of input parameters. The comparison of the simulated and observed parameters is normally performed by visual comparison of geomorphological or dynamic variables (e.g., geometry of scarp and final deposit, maximum velocities and depths). Over the past six years, a method developed by NGI has been used by some researchers for a more objective selection of back-analysed input model parameters. That method includes an adaptation of the equations for calculation of classifiers, and a comparative evaluation of classifiers of the selected parameter sets in the Receiver Operating Characteristic (ROC) space. This contribution presents an updating of the methodology. The proposed procedure allows comparisons between two or more "clouds" of classifiers. Each cloud represents the performance of a model over a range of input parameters (e.g., samples of probability distributions). Considering the fact that each cloud does not necessarily produce a full ROC curve, two new normalised ROC-space parameters are introduced for characterizing the performance of each cloud. The first parameter is representative of the cloud position relative to the point of perfect classification. The second parameter characterizes the position of the cloud relative to the theoretically perfect ROC curve and the no-discrimination line. The methodology is illustrated with back-analyses of slope stability and landslide runout of selected case studies. This research activity has been
An Embedded Statistical Method for Coupling Molecular Dynamics and Finite Element Analyses
Saether, E.; Glaessgen, E.H.; Yamakov, V.
2008-01-01
The coupling of molecular dynamics (MD) simulations with finite element methods (FEM) yields computationally efficient models that link fundamental material processes at the atomistic level with continuum field responses at higher length scales. The theoretical challenge involves developing a seamless connection along an interface between two inherently different simulation frameworks. Various specialized methods have been developed to solve particular classes of problems. Many of these methods link the kinematics of individual MD atoms with FEM nodes at their common interface, necessarily requiring that the finite element mesh be refined to atomic resolution. Some of these coupling approaches also require simulations to be carried out at 0 K and restrict modeling to two-dimensional material domains due to difficulties in simulating full three-dimensional material processes. In the present work, a new approach to MD-FEM coupling is developed based on a restatement of the standard boundary value problem used to define a coupled domain. The method replaces a direct linkage of individual MD atoms and finite element (FE) nodes with a statistical averaging of atomistic displacements in local atomic volumes associated with each FE node in an interface region. The FEM and MD computational systems are effectively independent and communicate only through an iterative update of their boundary conditions. With the use of statistical averages of the atomistic quantities to couple the two computational schemes, the developed approach is referred to as an embedded statistical coupling method (ESCM). ESCM provides an enhanced coupling methodology that is inherently applicable to three-dimensional domains, avoids discretization of the continuum model to atomic scale resolution, and permits finite temperature states to be applied.
On the Assessment of Monte Carlo Error in Simulation-Based Statistical Analyses.
Koehler, Elizabeth; Brown, Elizabeth; Haneuse, Sebastien J-P A
2009-05-01
Statistical experiments, more commonly referred to as Monte Carlo or simulation studies, are used to study the behavior of statistical methods and measures under controlled situations. Whereas recent computing and methodological advances have permitted increased efficiency in the simulation process, known as variance reduction, such experiments remain limited by their finite nature and hence are subject to uncertainty; when a simulation is run more than once, different results are obtained. However, virtually no emphasis has been placed on reporting the uncertainty, referred to here as Monte Carlo error, associated with simulation results in the published literature, or on justifying the number of replications used. These deserve broader consideration. Here we present a series of simple and practical methods for estimating Monte Carlo error as well as determining the number of replications required to achieve a desired level of accuracy. The issues and methods are demonstrated with two simple examples, one evaluating operating characteristics of the maximum likelihood estimator for the parameters in logistic regression and the other in the context of using the bootstrap to obtain 95% confidence intervals. The results suggest that in many settings, Monte Carlo error may be more substantial than traditionally thought.
Nonparametric dark energy reconstruction from supernova data.
Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David
2010-12-10
Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.
Nonparametric estimation of employee stock options
Institute of Scientific and Technical Information of China (English)
FU Qiang; LIU Li-an; LIU Qian
2006-01-01
We proposed a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods with market data. It incorporates the kernel estimator and employs a three-step method to modify BlackScholes formula. The model overcomes the limits of Black-Scholes formula in handling option prices with varied volatility. It disposes the effects of ESOs self-characteristics such as non-tradability, the longer term for expiration, the early exercise feature, the restriction on shorting selling and the employee's risk aversion on risk neutral pricing condition, and can be applied to ESOs valuation with the explanatory variable in no matter the certainty case or random case.
Basic statistical analyses of candidate nickel-hydrogen cells for the Space Station Freedom
Maloney, Thomas M.; Frate, David T.
1993-01-01
Nickel-Hydrogen (Ni/H2) secondary batteries will be implemented as a power source for the Space Station Freedom as well as for other NASA missions. Consequently, characterization tests of Ni/H2 cells from Eagle-Picher, Whittaker-Yardney, and Hughes were completed at the NASA Lewis Research Center. Watt-hour efficiencies of each Ni/H2 cell were measured for regulated charge and discharge cycles as a function of temperature, charge rate, discharge rate, and state of charge. Temperatures ranged from -5 C to 30 C, charge rates ranged from C/10 to 1C, discharge rates ranged from C/10 to 2C, and states of charge ranged from 20 percent to 100 percent. Results from regression analyses and analyses of mean watt-hour efficiencies demonstrated that overall performance was best at temperatures between 10 C and 20 C while the discharge rate correlated most strongly with watt-hour efficiency. In general, the cell with back-to-back electrode arrangement, single stack, 26 percent KOH, and serrated zircar separator and the cell with a recirculating electrode arrangement, unit stack, 31 percent KOH, zircar separators performed best.
Basic statistical analyses of candidate nickel-hydrogen cells for the Space Station Freedom
Energy Technology Data Exchange (ETDEWEB)
Maloney, T.M.; Frate, D.T.
1993-05-01
Nickel-Hydrogen (Ni/H2) secondary batteries will be implemented as a power source for the Space Station Freedom as well as for other NASA missions. Consequently, characterization tests of Ni/H2 cells from Eagle-Picher, Whittaker-Yardney, and Hughes were completed at the NASA Lewis Research Center. Watt-hour efficiencies of each Ni/H2 cell were measured for regulated charge and discharge cycles as a function of temperature, charge rate, discharge rate, and state of charge. Temperatures ranged from -5 C to 30 C, charge rates ranged from C/10 to 1C, discharge rates ranged from C/10 to 2C, and states of charge ranged from 20 percent to 100 percent. Results from regression analyses and analyses of mean watt-hour efficiencies demonstrated that overall performance was best at temperatures between 10 C and 20 C while the discharge rate correlated most strongly with watt-hour efficiency. In general, the cell with back-to-back electrode arrangement, single stack, 26 percent KOH, and serrated zircar separator and the cell with a recirculating electrode arrangement, unit stack, 31 percent KOH, zircar separators performed best.
DEFF Research Database (Denmark)
Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona
2015-01-01
Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both...... comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub......-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered approach) facilitating,comparison of the waste data between individual sub-areas with different fractionation (waste...
Directory of Open Access Journals (Sweden)
Mohammad D. AL-Tahat
2012-01-01
Full Text Available This paper provides a review and introduction on agile manufacturing. Tactics of agile manufacturing are mapped into different production areas (eight-construct latent: manufacturing equipment and technology, processes technology and know-how, quality and productivity improvement, production planning and control, shop floor management, product design and development, supplier relationship management, and customer relationship management. The implementation level of agile manufacturing tactics is investigated in each area. A structural equation model is proposed. Hypotheses are formulated. Feedback from 456 firms is collected using five-point-Likert-scale questionnaire. Statistical analysis is carried out using IBM SPSS and AMOS. Multicollinearity, content validity, consistency, construct validity, ANOVA analysis, and relationships between agile components are tested. The results of this study prove that the agile manufacturing tactics have positive effect on the overall agility level. This conclusion can be used by manufacturing firms to manage challenges when trying to be agile.
Statistical Modelling of Synaptic Vesicles Distribution and Analysing their Physical Characteristics
DEFF Research Database (Denmark)
Khanmohammadi, Mahdieh
transmission electron microscopy is used to acquire images from two experimental groups of rats: 1) rats subjected to a behavioral model of stress and 2) rats subjected to sham stress as the control group. The synaptic vesicle distribution and interactions are modeled by employing a point process approach......This Ph.D. thesis deals with mathematical and statistical modeling of synaptic vesicle distribution, shape, orientation and interactions. The first major part of this thesis treats the problem of determining the effect of stress on synaptic vesicle distribution and interactions. Serial section....... The model is able to correctly separate the two experimental groups. Two different approaches to estimate the thickness of each section of specimen being imaged are introduced. The first approach uses Darboux frame and Cartan matrix to measure the isophote curvature and the second approach is based...
Quantum, classical and semiclassical analyses of photon statistics in harmonic generation
Bajer, J; Bajer, Jiri; Miranowicz, Adam
2001-01-01
In this review, we compare different descriptions of photon-number statistics in harmonic generation processes within quantum, classical and semiclassical approaches. First, we study the exact quantum evolution of the harmonic generation by applying numerical methods including those of Hamiltonian diagonalization and global characteristics. We show explicitly that the harmonic generations can indeed serve as a source of nonclassical light. Then, we demonstrate that the quasi-stationary sub-Poissonian light can be generated in these quantum processes under conditions corresponding to the so-called no-energy-transfer regime known in classical nonlinear optics. By applying method of classical trajectories, we demonstrate that the analytical predictions of the Fano factors are in good agreement with the quantum results. On comparing second and higher harmonic generations in the no-energy-transfer regime, we show that the highest noise reduction is achieved in third-harmonic generation with the Fano-factor of the ...
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determine their function and serve as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples
Energy Technology Data Exchange (ETDEWEB)
Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J
2007-10-24
Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, all of LDA, PLSDA, and SIMCA were found to produce excellent classification given a sufficient number of compound variables calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Ferber, Georg; Staner, Luc; Boeijinga, Peter
2011-09-30
Pharmacodynamic (PD) clinical studies are characterised by a high degree of multiplicity. This multiplicity is the result of the design of these studies that typically investigate effects of a number of biomarkers at various doses and multiple time points. Measurements are taken at many or all points of a "hyper-grid" that can be understood as the cross-product of a number of dimensions each of which has typically 3-30 discrete values. This exploratory design helps understanding the phenomena under investigation, but has made a confirmatory statistical analysis of these studies difficult, so that such an analysis is often missing in this type of studies. In this contribution we show that the cross-product structure of PD studies allows to combine several well-known techniques to address multiplicity in an effective way, so that a confirmatory analysis of these studies becomes feasible without unrealistic loss of power. We demonstrate the application of this technique in two studies that use the quantitative EEG (qEEG) as biomarker for drug activity at the GABA-A receptor. QEEG studies suffer particularly from the curse of multiplicity, since, in addition to the common dimensions like dose and time, the qEEG is measured at many locations over the scalp and in a number of frequency bands which inflate the multiplicity by a factor of about 250.
An educational review of the statistical issues in analysing utility data for cost-utility analysis.
Hunter, Rachael Maree; Baio, Gianluca; Butt, Thomas; Morris, Stephen; Round, Jeff; Freemantle, Nick
2015-04-01
The aim of cost-utility analysis is to support decision making in healthcare by providing a standardised mechanism for comparing resource use and health outcomes across programmes of work. The focus of this paper is the denominator of the cost-utility analysis, specifically the methodology and statistical challenges associated with calculating QALYs from patient-level data collected as part of a trial. We provide a brief description of the most common questionnaire used to calculate patient level utility scores, the EQ-5D, followed by a discussion of other ways to calculate patient level utility scores alongside a trial including other generic measures of health-related quality of life and condition- and population-specific questionnaires. Detail is provided on how to calculate the mean QALYs per patient, including discounting, adjusting for baseline differences in utility scores and a discussion of the implications of different methods for handling missing data. The methods are demonstrated using data from a trial. As the methods chosen can systematically change the results of the analysis, it is important that standardised methods such as patient-level analysis are adhered to as best as possible. Regardless, researchers need to ensure that they are sufficiently transparent about the methods they use so as to provide the best possible information to aid in healthcare decision making.
Nonparametric regression with filtered data
Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid; 10.3150/10-BEJ260
2011-01-01
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
Directory of Open Access Journals (Sweden)
A. V. Nikitenko
2014-04-01
Full Text Available Purpose. Defining and analysis of the probabilistic and spectral characteristics of random current in regenerative braking mode of DC electric rolling stock are observed in this paper. Methodology. The elements and methods of the probability theory (particularly the theory of stationary and non-stationary processes and methods of the sampling theory are used for processing of the regenerated current data arrays by PC. Findings. The regenerated current records are obtained from the locomotives and trains in Ukraine railways and trams in Poland. It was established that the current has uninterrupted and the jumping variations in time (especially in trams. For the random current in the regenerative braking mode the functions of mathematical expectation, dispersion and standard deviation are calculated. Histograms, probabilistic characteristics and correlation functions are calculated and plotted down for this current too. It was established that the current of the regenerative braking mode can be considered like the stationary and non-ergodic process. The spectral analysis of these records and “tail part” of the correlation function found weak periodical (or low-frequency components which are known like an interharmonic. Originality. Firstly, the theory of non-stationary random processes was adapted for the analysis of the recuperated current which has uninterrupted and the jumping variations in time. Secondly, the presence of interharmonics in the stochastic process of regenerated current was defined for the first time. And finally, the patterns of temporal changes of the correlation current function are defined too. This allows to reasonably apply the correlation functions method in the identification of the electric traction system devices. Practical value. The results of probabilistic and statistic analysis of the recuperated current allow to estimate the quality of recovered energy and energy quality indices of electric rolling stock in the
Directory of Open Access Journals (Sweden)
Le Bao
2014-11-01
Full Text Available Endogenous retroviruses (ERVs are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species’ genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.
Institute of Scientific and Technical Information of China (English)
刘晓倩; 周勇
2011-01-01
Expected shortfall (ES) model developed recently is a powerful mathematical tool to measure and control financial risk. In this paper, two-step kernel smoothed processes are used to develop a two-step nonparametric estimator of ES. Comparisons between the proposed two-step kernel smoothed ES estimator to the existing fully empirical ES estimator and one-step kernel smoothing ES estimator were made by calculating expectation and variance of them. It is of great interest that the proposed two-step kernel smoothed ES estimator has been shown to increases the variance, totally different from the existing result that the kernel smoothed VaR estimator can produces reduction in both the variance and the mean square error. In addition, the simulation results conform to the theoretical analysis. In the related empirical analysis, the close-ended funds in Shanghai and Shenzhen stock markets were explored to compute the empirical ES estimates and kernel smoothing ES estimates for risk analysis. And the RAROC of the funds sample based on weekly return and ES were computed to make the performance evaluation of the funds.The empirical results show that, with weekly return, the method based on ES is higher reliability than those based on VaR.%预期不足(ES)是近几年发展起来的用于测量和控制金融风险的量化工具.在金融时间序列中,将两步核估计应用于两步ES非参数估计之中,得到了ES模型的两步核光滑估计.通过计算其期望和方差,比较了两步核光滑ES估计与ES完全经验估计及一步核光滑估计的优劣,得到了有趣的结论:与VaR模型不同,两步光滑化并不能减小ES估计的方差,反而会增大其方差,并通过计算机模拟证实了理论获得的结论.对国内沪深两市中的封闭式基金进行了实证分析,计算了样本基金的ES完全经验估计、一步核光滑估计和两步核光滑估计,并计算了样本基金基于周收益率和ES的两步核光滑
Analysing the Severity and Frequency of Traffic Crashes in Riyadh City Using Statistical Models
Directory of Open Access Journals (Sweden)
Saleh Altwaijri
2012-12-01
Full Text Available Traffic crashes in Riyadh city cause losses in the form of deaths, injuries and property damages, in addition to the pain and social tragedy affecting families of the victims. In 2005, there were a total of 47,341 injury traffic crashes occurred in Riyadh city (19% of the total KSA crashes and 9% of those crashes were severe. Road safety in Riyadh city may have been adversely affected by: high car ownership, migration of people to Riyadh city, high daily trips reached about 6 million, high rate of income, low-cost of petrol, drivers from different nationalities, young drivers and tremendous growth in population which creates a high level of mobility and transport activities in the city. The primary objective of this paper is therefore to explore factors affecting the severity and frequency of road crashes in Riyadh city using appropriate statistical models aiming to establish effective safety policies ready to be implemented to reduce the severity and frequency of road crashes in Riyadh city. Crash data for Riyadh city were collected from the Higher Commission for the Development of Riyadh (HCDR for a period of five years from 1425H to 1429H (roughly corresponding to 2004-2008. Crash data were classified into three categories: fatal, serious-injury and slight-injury. Two nominal response models have been developed: a standard multinomial logit model (MNL and a mixed logit model to injury-related crash data. Due to a severe underreporting problem on the slight injury crashes binary and mixed binary logistic regression models were also estimated for two categories of severity: fatal and serious crashes. For frequency, two count models such as Negative Binomial (NB models were employed and the unit of analysis was 168 HAIs (wards in Riyadh city. Ward-level crash data are disaggregated by severity of the crash (such as fatal and serious injury crashes. The results from both multinomial and binary response models are found to be fairly consistent but
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Nonparametric Regression with Common Shocks
Directory of Open Access Journals (Sweden)
Eduardo A. Souza-Rodrigues
2016-09-01
Full Text Available This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
El Sharawy, Mohamed S.; Gaafar, Gamal R.
2016-12-01
Both reservoir engineers and petrophysicists have been concerned about dividing a reservoir into zones for engineering and petrophysics purposes. Through decades, several techniques and approaches were introduced. Out of them, statistical reservoir zonation, stratigraphic modified Lorenz (SML) plot and the principal component and clustering analyses techniques were chosen to apply on the Nubian sandstone reservoir of Palaeozoic - Lower Cretaceous age, Gulf of Suez, Egypt, by using five adjacent wells. The studied reservoir consists mainly of sandstone with some intercalation of shale layers with varying thickness from one well to another. The permeability ranged from less than 1 md to more than 1000 md. The statistical reservoir zonation technique, depending on core permeability, indicated that the cored interval of the studied reservoir can be divided into two zones. Using reservoir properties such as porosity, bulk density, acoustic impedance and interval transit time indicated also two zones with an obvious variation in separation depth and zones continuity. The stratigraphic modified Lorenz (SML) plot indicated the presence of more than 9 flow units in the cored interval as well as a high degree of microscopic heterogeneity. On the other hand, principal component and cluster analyses, depending on well logging data (gamma ray, sonic, density and neutron), indicated that the whole reservoir can be divided at least into four electrofacies having a noticeable variation in reservoir quality, as correlated with the measured permeability. Furthermore, continuity or discontinuity of the reservoir zones can be determined using this analysis.
Nonparametric TOA estimators for low-resolution IR-UWB digital receiver
Institute of Scientific and Technical Information of China (English)
Yanlong Zhang; Weidong Chen
2015-01-01
Nonparametric time-of-arrival (TOA) estimators for im-pulse radio ultra-wideband (IR-UWB) signals are proposed. Non-parametric detection is obviously useful in situations where de-tailed information about the statistics of the noise is unavailable or not accurate. Such TOA estimators are obtained based on condi-tional statistical tests with only a symmetry distribution assumption on the noise probability density function. The nonparametric es-timators are attractive choices for low-resolution IR-UWB digital receivers which can be implemented by fast comparators or high sampling rate low resolution analog-to-digital converters (ADCs), in place of high sampling rate high resolution ADCs which may not be available in practice. Simulation results demonstrate that nonparametric TOA estimators provide more effective and robust performance than typical energy detection (ED) based estimators.
Nonparametric Bayesian Modeling of Complex Networks
DEFF Research Database (Denmark)
Schmidt, Mikkel Nørgaard; Mørup, Morten
2013-01-01
Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....
An asymptotically optimal nonparametric adaptive controller
Institute of Scientific and Technical Information of China (English)
郭雷; 谢亮亮
2000-01-01
For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.
Non-parametric estimation of Fisher information from real data
Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A
2015-01-01
The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...
An, Ming-Wen; Mandrekar, Sumithra J; Edelman, Martin J; Sargent, Daniel J
2014-07-01
The primary goal of Phase II clinical trials is to understand better a treatment's safety and efficacy to inform a Phase III go/no-go decision. Many Phase II designs have been proposed, incorporating randomization, interim analyses, adaptation, and patient selection. The Phase II design with an option for direct assignment (i.e. stop randomization and assign all patients to the experimental arm based on a single interim analysis (IA) at 50% accrual) was recently proposed [An et al., 2012]. We discuss this design in the context of existing designs, and extend it from a single-IA to a two-IA design. We compared the statistical properties and clinical relevance of the direct assignment design with two IA (DAD-2) versus a balanced randomized design with two IA (BRD-2) and a direct assignment design with one IA (DAD-1), over a range of response rate ratios (2.0-3.0). The DAD-2 has minimal loss in power (assignment design, especially with two IA, provides a middle ground with desirable statistical properties and likely appeal to both clinicians and patients. Copyright © 2014 Elsevier Inc. All rights reserved.
Buhler, J; Owerbach, D; Schäffer, A A; Kimmel, M; Gabbay, K H
1997-01-01
We have developed software and statistical tools for linkage analysis of polygenic diseases. We use type I diabetes mellitus (insulin-dependent diabetes mellitus, IDDM) as our model system. Two susceptibility loci (IDDM1 on 6p21 and IDDM2 on 11p15) are well established, and recent genome searches suggest the existence of other susceptibility loci. We have implemented CASPAR, a software tool that makes it possible to test for linkage quickly and efficiently using multiple polymorphic DNA markers simultaneously in nuclear families consisting of two unaffected parents and a pair of affected siblings (ASP). We use a simulation-based method to determine whether lod scores from a collection of ASP tests are significant. We test our new software and statistical tools to assess linkage of IDDM5 and IDDM7 conditioned on analyses with 1 or 2 other unlinked type I diabetes susceptibility loci. The results from the CASPAR analysis suggest that conditioning of IDDM5 on IDDM1 and IDDM4, and of IDDM7 on IDDM1 and IDDM2 provides significant benefits for the genetic analysis of polygenic loci.
Buttigieg, Pier Luigi; Ramette, Alban
2014-12-01
The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Abe, H; Wako, H
2006-07-01
Folding and unfolding simulations of three-dimensional lattice proteins were analyzed using a simplified statistical mechanical model in which their amino acid sequences and native conformations were incorporated explicitly. Using this statistical mechanical model, under the assumption that only interactions between amino acid residues within a local structure in a native state are considered, the partition function of the system can be calculated for a given native conformation without any adjustable parameter. The simulations were carried out for two different native conformations, for each of which two foldable amino acid sequences were considered. The native and non-native contacts between amino acid residues occurring in the simulations were examined in detail and compared with the results derived from the theoretical model. The equilibrium thermodynamic quantities (free energy, enthalpy, entropy, and the probability of each amino acid residue being in the native state) at various temperatures obtained from the simulations and the theoretical model were also examined in order to characterize the folding processes that depend on the native conformations and the amino acid sequences. Finally, the free energy landscapes were discussed based on these analyses.
Directory of Open Access Journals (Sweden)
Ouypornprasert Winai
2016-01-01
Full Text Available The objectives of this technical paper were to propose the optimum partial replacement of cement by fly ash based on the complete consumption of calcium hydroxide from hydration reactions of cement and the long-term strength activity index based on equivalent calcium silicate hydrate as well as the propagation of uncertainty due to randomness inherent in main chemical compositions in cement and fly ash. Firstly the hydration- and pozzolanic reactions as well as stoichiometry were reviewed. Then the optimum partial replacement of cement by fly ash was formulated. After that the propagation of uncertainty due to main chemical compositions in cement and fly ash was discussed and the reliability analyses for applying the suitable replacement were reviewed. Finally an applicability of the concepts mentioned above based on statistical data of materials available was demonstrated. The results from analyses were consistent with the testing results by other researchers. The results of this study provided guidelines of suitable utilization of fly ash for partial replacement of cement. It was interesting to note that these concepts could be extended to optimize partial replacement of cement by other types of pozzolan which were described in the other papers of the authors.
Directory of Open Access Journals (Sweden)
Gavin B Stewart
Full Text Available BACKGROUND: Individual participant data (IPD meta-analyses that obtain "raw" data from studies rather than summary data typically adopt a "two-stage" approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of "one-stage" approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare "two-stage" and "one-stage" models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. METHODS AND FINDINGS: We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97. Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. CONCLUSIONS: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled
DEFF Research Database (Denmark)
Thorlund, Kristian; Wetterslev, Jørn; Awad, Tahany;
2011-01-01
In random-effects model meta-analysis, the conventional DerSimonian-Laird (DL) estimator typically underestimates the between-trial variance. Alternative variance estimators have been proposed to address this bias. This study aims to empirically compare statistical inferences from random......-effects model meta-analyses on the basis of the DL estimator and four alternative estimators, as well as distributional assumptions (normal distribution and t-distribution) about the pooled intervention effect. We evaluated the discrepancies of p-values, 95% confidence intervals (CIs) in statistically...... significant meta-analyses, and the degree (percentage) of statistical heterogeneity (e.g. I(2)) across 920 Cochrane primary outcome meta-analyses. In total, 414 of the 920 meta-analyses were statistically significant with the DL meta-analysis, and 506 were not. Compared with the DL estimator, the four...
Nonparametric Detection of Geometric Structures Over Networks
Zou, Shaofeng; Liang, Yingbin; Poor, H. Vincent
2017-10-01
Nonparametric detection of existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests with probability of errors converging to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on maximum mean discrepancy that measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is first studied. Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent. It is also shown that certain necessary conditions must hold to guarantee any test to be universally consistent. Comparison of sufficient and necessary conditions yields that the proposed test is order-level optimal and nearly optimal respectively in terms of minimum and maximum sizes of candidate anomalous intervals. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Statistical Analysis of Data for Timber Strengths
DEFF Research Database (Denmark)
Sørensen, John Dalsgaard; Hoffmeyer, P.
Statistical analyses are performed for material strength parameters from approximately 6700 specimens of structural timber. Non-parametric statistical analyses and fits to the following distributions types have been investigated: Normal, Lognormal, 2 parameter Weibull and 3-parameter Weibull....... The statistical fits have generally been made using all data (100%) and the lower tail (30%) of the data. The Maximum Likelihood Method and the Least Square Technique have been used to estimate the statistical parameters in the selected distributions. 8 different databases are analysed. The results show that 2......-parameter Weibull (and Normal) distributions give the best fits to the data available, especially if tail fits are used whereas the LogNormal distribution generally gives poor fit and larger coefficients of variation, especially if tail fits are used....
Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A
2015-05-01
Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one׳s hypothesis explicitly or implicitly pertains to whole 1D trajectories.
Parametric and Non-Parametric System Modelling
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg
1999-01-01
considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how....... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural...... the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...
Bayesian nonparametric duration model with censorship
Directory of Open Access Journals (Sweden)
Joseph Hakizamungu
2007-10-01
Full Text Available This paper is concerned with nonparametric i.i.d. durations models censored observations and we establish by a simple and unified approach the general structure of a bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.
Bootstrap Estimation for Nonparametric Efficiency Estimates
1995-01-01
This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...
A non-parametric peak finder algorithm and its application in searches for new physics
Chekanov, S
2011-01-01
We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.
Elvik, R
1998-03-01
The validity of weighted mean results estimated in meta-analysis has been criticized. This paper presents a set of simple statistical and graphical techniques that can be used in meta-analysis to evaluate common points of criticism. The graphical techniques are based on funnel graph diagrams. Problems and techniques for dealing with them that are discussed include: (1) the so-called 'apples and oranges' problem, stating that mean results in meta-analysis tend to gloss over important differences that should be highlighted. A test of the homogeneity of results is described for testing the presence of this problem. If results are highly heterogeneous, a random effects model of meta-analysis is more appropriate than the fixed effects model of analysis. (2) The possible presence of skewness in a sample of results. This can be tested by comparing the mode, median and mean of the results in the sample. (3) The possible presence of more than one mode in a sample of results. This can be tested by forming a frequency distribution of the results and examining the shape of this distribution. (4) The sensitivity of the mean to the possible presence of atypical results (outliers) can be tested by comparing the overall mean to the mean of all results except the one suspected of being atypical. (5) The possible presence of publication bias can be tested by visual inspection of funnel graph diagrams in which data points have been sorted according to statistical significance and direction of effect. (6) The possibility of underestimating the standard error of the mean in meta-analyses by using multiple, correlated results from the same study as the unit of analysis can be addressed by using the jack-knife technique for estimating the uncertainty of the mean. Brief examples, taken from road safety research, are given of all these techniques.
Nonparametric estimation of Fisher information from real data
Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.
2016-02-01
The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δ θ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δ θ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
Directory of Open Access Journals (Sweden)
Florian Heib
2016-11-01
Full Text Available Surface science, which includes the preparation, development and analysis of surfaces and coatings, is essential in both fundamental and applied as well as in engineering and industrial research. Contact angle measurements using sessile drop techniques are commonly used to characterize coated surfaces or surface modifications. Well-defined surfaces structures at both nanoscopic and microscopic level can be achieved but the reliable characterization by means of contact angle measurements and their interpretation often remains an open question. Thus, we focused our research effort on one main problem of surface science community, which is the determination of correct and valid definitions and measurements of contact angles. In this regard, we developed the high-precision drop shape analysis (HPDSA, which involves a complex transformation of images from sessile drop experiments to Cartesian coordinates and opens up the possibility of a physically meaningful contact angle calculation. To fulfill the dire need for a reproducible contact angle determination/definition, we developed three easily adaptable statistical analyses procedures. In the following, the basic principles of HPDSA will be explained and applications of HPDSA will be illustrated. Thereby, the unique potential of this analysis approach will be illustrated by means of selected examples.
Cafri, Guy; Kromrey, Jeffrey D.; Brannick, Michael T.
2010-01-01
This article uses meta-analyses published in "Psychological Bulletin" from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual…
A novel nonparametric confidence interval for differences of proportions for correlated binary data.
Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan
2016-11-16
Various confidence interval estimators have been developed for differences in proportions resulted from correlated binary data. However, the width of the mostly recommended Tango's score confidence interval tends to be wide, and the computing burden of exact methods recommended for small-sample data is intensive. The recently proposed rank-based nonparametric method by treating proportion as special areas under receiver operating characteristic provided a new way to construct the confidence interval for proportion difference on paired data, while the complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristics. The new confidence interval has a simple analytic form with a new estimate of the degrees of freedom of n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than that of Tango. This new confidence interval with the new estimate of degrees of freedom also leads to coverage probabilities that are an improvement on the rank-based nonparametric confidence interval. Comparing with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and yet they are very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance. © The Author(s) 2016.
Multivariate nonparametric regression and visualization with R and applications to finance
Klemelä, Jussi
2014-01-01
A modern approach to statistical learning and its applications through visualization methods With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generatingmechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
Energy Technology Data Exchange (ETDEWEB)
Cheng, K.H.; Swift, D.L. [Johns Hopkins Univ., Baltimore, MD (United States); Yang, Y.H. [Univ. of North Carolina, Chapel Hill, NC (United States)] [and others
1995-12-01
Regional deposition of inhaled aerosols in the respiratory tract is a significant factor in assessing the biological effects from exposure to a variety of environmental particles. Understanding the deposition efficiency of inhaled aerosol particles in the nasal and oral airways can help evaluate doses to the extrathoracic region as well as to the lung. Dose extrapolation from laboratory animals to humans has been questioned due to significant physiological and anatomical variations. Although human studies are considered ideal for obtaining in vivo toxicity information important in risk assessment, the number of subjects in the study is often small compared to epidemiological and animal studies. This study measured in vivo the nasal airway dimensions and the extrathoracic deposition of ultrafine aerosols in 10 normal adult males. Variability among individuals was significant. The nasal geometry of each individual was characterized at a resolution of 3 mm using magnetic resonance imaging (MRI) and acoustic rhinometry (AR). The turbulent diffusion theory was used to describe the nonlinear nature of extrathoracic aerosol deposition. To determine what dimensional features of the nasal airway were responsible for the marked differences in particle deposition, the MIXed-effects NonLINear Regression (MIXNLIN) procedure was used to account for the random effort of repeated measurements on the same subject. Using both turbulent diffusion theory and MIXNLIN, the ultrafine particle deposition is correlated with nasal dimensions measured by the surface area, minimum cross-sectional area, and complexity of the airway shape. The combination of MRI and AR is useful for characterizing both detailed nasal dimensions and temporal changes in nasal patency. We conclude that a suitable statistical procedure incorporated with existing physical theories must be used in data analyses for experimental studies of aerosol deposition that involve a relatively small number of human subjects.
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major......This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulations results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural...... currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis....
Further Research into a Non-Parametric Statistical Screening System.
1979-12-14
Let X = V if birth weight is high X2 = 0 if gestation length is short V2 if gestation length is long Normal babies have high birth weight and long... gestation length or low birth weight and short gestation length . Abnormal babies have either of the other two combinations ((0, 1) or (1, 0)). The LDF
A Statistical, Nonparametric Methodology for Document Degradation Model Validation
1999-01-01
j,i"�V_��p8*!i%j!i¡m�%*,ci¡�L’mvj,no mvno &;c’)%m�wb%jl%V_=i"jl_t� nvj,�Ac*,d’&)w4j!*,’j!�¢�7d*j,*�%nv&;no&;c£p8mq%_,_�n¥¤)i...dmvmoiwi"�(^)i"*,nv�Ui"&(jl_�j,d _�j!’)w4+j,�bi¡h;*!i%r(w4d� &|^)dnv&4j�_ad� j,�biU_=+;_=j,i"�V_��p8*!i%j!i¡m�%*,ci¡�L’mvj,no mvno ...34 mvno �;dd(w°�7’&Rp8j,nvd&-d� j!�;i¦w;%j�%4�U© jj!’*,&R_ d’j�j!�)%j\\j!�;nq_\\w4n�_�jl%&)p8i��7’&Rp8j,nvd&-nq_tj!�;iU_!%�UiL%_ j!�;i�d&;i
a Multivariate Downscaling Model for Nonparametric Simulation of Daily Flows
Molina, J. M.; Ramirez, J. A.; Raff, D. A.
2011-12-01
A multivariate, stochastic nonparametric framework for stepwise disaggregation of seasonal runoff volumes to daily streamflow is presented. The downscaling process is conditional on volumes of spring runoff and large-scale ocean-atmosphere teleconnections and includes a two-level cascade scheme: seasonal-to-monthly disaggregation first followed by monthly-to-daily disaggregation. The non-parametric and assumption-free character of the framework allows consideration of the random nature and nonlinearities of daily flows, which parametric models are unable to account for adequately. This paper examines statistical links between decadal/interannual climatic variations in the Pacific Ocean and hydrologic variability in US northwest region, and includes a periodicity analysis of climate patterns to detect coherences of their cyclic behavior in the frequency domain. We explore the use of such relationships and selected signals (e.g., north Pacific gyre oscillation, southern oscillation, and Pacific decadal oscillation indices, NPGO, SOI and PDO, respectively) in the proposed data-driven framework by means of a combinatorial approach with the aim of simulating improved streamflow sequences when compared with disaggregated series generated from flows alone. A nearest neighbor time series bootstrapping approach is integrated with principal component analysis to resample from the empirical multivariate distribution. A volume-dependent scaling transformation is implemented to guarantee the summability condition. In addition, we present a new and simple algorithm, based on nonparametric resampling, that overcomes the common limitation of lack of preservation of historical correlation between daily flows across months. The downscaling framework presented here is parsimonious in parameters and model assumptions, does not generate negative values, and produces synthetic series that are statistically indistinguishable from the observations. We present evidence showing that both
Correlated Non-Parametric Latent Feature Models
Doshi-Velez, Finale
2012-01-01
We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many realworld problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on realworld datasets.
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulations results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural...... breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major...
Melsen, W G; Rovers, M M; Bonten, M J M; Bootsma, M C J
2014-01-01
Variance between studies in a meta-analysis will exist. This heterogeneity may be of clinical, methodological or statistical origin. The last of these is quantified by the I(2) -statistic. We investigated, using simulated studies, the accuracy of I(2) in the assessment of heterogeneity and the effec
Thirty years of nonparametric item response theory
Molenaar, W.
2001-01-01
Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo
A Bayesian Nonparametric Approach to Test Equating
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Decompounding random sums: A nonparametric approach
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted; Pitts, Susan M.
review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...
A Nonparametric Analogy of Analysis of Covariance
Burnett, Thomas D.; Barr, Donald R.
1977-01-01
A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
A non-parametric method for correction of global radiation observations
DEFF Research Database (Denmark)
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...
Howe, Aaron S; Leung, Tiffany; Bani-Fatemi, Ali; Souza, Renan; Tampakeras, Maria; Zai, Clement; Kennedy, James L; Strauss, John; De Luca, Vincenzo
2014-06-01
In the present study, we examined whether there was an association between dopamine-β hydroxylase (DBH) promoter polymorphisms (a 5'-ins/del and a GTn repeats) and a history of suicide attempt in 223 chronic schizophrenia individuals using statistical and molecular analyses. Within the genetic association study design, we compared the statistical haplotype phase with the molecular phase produced by the amplicon size analysis. The two DBH polymorphisms were analysed using the Applied Biosystem 3130 and the statistical analyses were carried out using UNPHASED v.3.1.5 and PHASE v.2.1.1 to determine the haplotype frequencies and infer the phase in each patient. Then, DBH polymorphisms were incorporated into the Haploscore analysis to test the association with a history of suicide attempt. In our sample, 62 individuals had a history of suicide attempt. There was no association between DBH polymorphisms and a history of suicide attempt across the different analytical strategies applied. There was no significant difference between the haplotype frequencies produced by the amplicon size analysis and statistical analytical strategies. However, some of the haplotype pairs inferred in the PHASE analysis were inconsistent with the molecular haplotype size measured by the ABI 3130. The amplicon size analysis proved to be the most accurate method using the haplotype as a possible genetic marker for future testing. Although the results were not significant, further molecular analyses of the DBH gene and other candidate genes can clarify the utility of the molecular phase in psychiatric genetics and personalized medicine.
Directory of Open Access Journals (Sweden)
Rabia Ece OMAY
2013-06-01
Full Text Available In this study, relationship between gross domestic product (GDP per capita and sulfur dioxide (SO2 and particulate matter (PM10 per capita is modeled for Turkey. Nonparametric fixed effect panel data analysis is used for the modeling. The panel data covers 12 territories, in first level of Nomenclature of Territorial Units for Statistics (NUTS, for period of 1990-2001. Modeling of the relationship between GDP and SO2 and PM10 for Turkey, the non-parametric models have given good results.
NParCov3: A SAS/IML Macro for Nonparametric Randomization-Based Analysis of Covariance
Directory of Open Access Journals (Sweden)
Richard C. Zink
2012-07-01
Full Text Available Analysis of covariance serves two important purposes in a randomized clinical trial. First, there is a reduction of variance for the treatment effect which provides more powerful statistical tests and more precise confidence intervals. Second, it provides estimates of the treatment effect which are adjusted for random imbalances of covariates between the treatment groups. The nonparametric analysis of covariance method of Koch, Tangen, Jung, and Amara (1998 defines a very general methodology using weighted least-squares to generate covariate-adjusted treatment effects with minimal assumptions. This methodology is general in its applicability to a variety of outcomes, whether continuous, binary, ordinal, incidence density or time-to-event. Further, its use has been illustrated in many clinical trial settings, such as multi-center, dose-response and non-inferiority trials.NParCov3 is a SAS/IML macro written to conduct the nonparametric randomization-based covariance analyses of Koch et al. (1998. The software can analyze a variety of outcomes and can account for stratification. Data from multiple clinical trials will be used for illustration.
Thorlund, Kristian; Wetterslev, Jørn; Awad, Tahany; Thabane, Lehana; Gluud, Christian
2011-12-01
In random-effects model meta-analysis, the conventional DerSimonian-Laird (DL) estimator typically underestimates the between-trial variance. Alternative variance estimators have been proposed to address this bias. This study aims to empirically compare statistical inferences from random-effects model meta-analyses on the basis of the DL estimator and four alternative estimators, as well as distributional assumptions (normal distribution and t-distribution) about the pooled intervention effect. We evaluated the discrepancies of p-values, 95% confidence intervals (CIs) in statistically significant meta-analyses, and the degree (percentage) of statistical heterogeneity (e.g. I(2)) across 920 Cochrane primary outcome meta-analyses. In total, 414 of the 920 meta-analyses were statistically significant with the DL meta-analysis, and 506 were not. Compared with the DL estimator, the four alternative estimators yielded p-values and CIs that could be interpreted as discordant in up to 11.6% or 6% of the included meta-analyses pending whether a normal distribution or a t-distribution of the intervention effect estimates were assumed. Large discrepancies were observed for the measures of degree of heterogeneity when comparing DL with each of the four alternative estimators. Estimating the degree (percentage) of heterogeneity on the basis of less biased between-trial variance estimators seems preferable to current practice. Disclosing inferential sensitivity of p-values and CIs may also be necessary when borderline significant results have substantial impact on the conclusion. Copyright © 2012 John Wiley & Sons, Ltd.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
The linear discriminant analysis (LDA) is one of popular means for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently-used approaches of feature extraction usually require linear, independence, or large sample condition. However, in real world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept in identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA have both parametric and nonparametric algorithm advantages and higher classification accuracy. Quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. At last, the application of LKNDA in the complex feature extraction of financial market activities is proposed.
Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses.
Callahan, Ben J; Sankaran, Kris; Fukuyama, Julia A; McMurdie, Paul J; Holmes, Susan P
2016-01-01
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
Statistical Analysis of Data for Timber Strengths
DEFF Research Database (Denmark)
Sørensen, John Dalsgaard
2003-01-01
Statistical analyses are performed for material strength parameters from a large number of specimens of structural timber. Non-parametric statistical analysis and fits have been investigated for the following distribution types: Normal, Lognormal, 2 parameter Weibull and 3-parameter Weibull....... The statistical fits have generally been made using all data and the lower tail of the data. The Maximum Likelihood Method and the Least Square Technique have been used to estimate the statistical parameters in the selected distributions. The results show that the 2-parameter Weibull distribution gives the best...... fits to the data available, especially if tail fits are used whereas the Log Normal distribution generally gives a poor fit and larger coefficients of variation, especially if tail fits are used. The implications on the reliability level of typical structural elements and on partial safety factors...
Measuring the Influence of Networks on Transaction Costs Using a Nonparametric Regression Technique
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H.C.A.
. We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Measuring the influence of networks on transaction costs using a non-parametric regression technique
DEFF Research Database (Denmark)
Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.
. We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Ford, Eric B; Steffen, Jason H; Carter, Joshua A; Fressin, Francois; Holman, Matthew J; Lissauer, Jack J; Moorhead, Althea V; Morehead, Robert C; Ragozzine, Darin; Rowe, Jason F; Welsh, William F; Allen, Christopher; Batalha, Natalie M; Borucki, William J; Bryson, Stephen T; Buchhave, Lars A; Burke, Christopher J; Caldwell, Douglas A; Charbonneau, David; Clarke, Bruce D; Cochran, William D; Désert, Jean-Michel; Endl, Michael; Everett, Mark E; Fischer, Debra A; Gautier, Thomas N; Gilliland, Ron L; Jenkins, Jon M; Haas, Michael R; Horch, Elliott; Howell, Steve B; Ibrahim, Khadeejah A; Isaacson, Howard; Koch, David G; Latham, David W; Li, Jie; Lucas, Philip; MacQueen, Phillip J; Marcy, Geoffrey W; McCauliff, Sean; Mullally, Fergal R; Quinn, Samuel N; Quintana, Elisa; Shporer, Avi; Still, Martin; Tenenbaum, Peter; Thompson, Susan E; Torres, Guillermo; Twicken, Joseph D; Wohler, Bill
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timingn variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:...
Non-parametric Morphologies of Mergers in the Illustris Simulation
Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G
2016-01-01
We study non-parametric morphologies of mergers events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available Illustris simulation idealized mock images at $z=0$. We then measure non parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* >10^{10}$ M$_\\odot$. We correlate these automatic statistics with the recent merger history of galaxies and with the presence of close companions. Our main contribution is to assess in a cosmological framework, the empirically derived non-parametric demarcation line and average time-scales used to determine the merger rate observationally. We found that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the elapsing time to the last merger event. We also find that the a...
Santos, José António; Galante-Oliveira, Susana; Barroso, Carlos
2011-03-01
The current work presents an innovative statistical approach to model ordinal variables in environmental monitoring studies. An ordinal variable has values that can only be compared as "less", "equal" or "greater" and it is not possible to have information about the size of the difference between two particular values. The example of ordinal variable under this study is the vas deferens sequence (VDS) used in imposex (superimposition of male sexual characters onto prosobranch females) field assessment programmes for monitoring tributyltin (TBT) pollution. The statistical methodology presented here is the ordered logit regression model. It assumes that the VDS is an ordinal variable whose values match up a process of imposex development that can be considered continuous in both biological and statistical senses and can be described by a latent non-observable continuous variable. This model was applied to the case study of Nucella lapillus imposex monitoring surveys conducted in the Portuguese coast between 2003 and 2008 to evaluate the temporal evolution of TBT pollution in this country. In order to produce more reliable conclusions, the proposed model includes covariates that may influence the imposex response besides TBT (e.g. the shell size). The model also provides an analysis of the environmental risk associated to TBT pollution by estimating the probability of the occurrence of females with VDS ≥ 2 in each year, according to OSPAR criteria. We consider that the proposed application of this statistical methodology has a great potential in environmental monitoring whenever there is the need to model variables that can only be assessed through an ordinal scale of values.
Directory of Open Access Journals (Sweden)
Weiß, Verena
2015-10-01
Full Text Available Introduction: For survival data the coefficient of determination cannot be used to describe how good a model fits to the data. Therefore, several measures of explained variation for survival data have been proposed in recent years.Methods: We analyse an existing measure of explained variation with regard to minimisation aspects and demonstrate that these are not fulfilled for the measure.Results: In analogy to the least squares method from linear regression analysis we develop a novel measure for categorical covariates which is based only on the Kaplan-Meier estimator. Hence, the novel measure is a completely nonparametric measure with an easy graphical interpretation. For the novel measure different weighting possibilities are available and a statistical test of significance can be performed. Eventually, we apply the novel measure and further measures of explained variation to a dataset comprising persons with a histopathological papillary thyroid carcinoma.Conclusion: We propose a novel measure of explained variation with a comprehensible derivation as well as a graphical interpretation, which may be used in further analyses with survival data.
Nonparametric tests for pathwise properties of semimartingales
Cont, Rama; 10.3150/10-BEJ293
2011-01-01
We propose two nonparametric tests for investigating the pathwise properties of a signal modeled as the sum of a L\\'{e}vy process and a Brownian semimartingale. Using a nonparametric threshold estimator for the continuous component of the quadratic variation, we design a test for the presence of a continuous martingale component in the process and a test for establishing whether the jumps have finite or infinite variation, based on observations on a discrete-time grid. We evaluate the performance of our tests using simulations of various stochastic models and use the tests to investigate the fine structure of the DM/USD exchange rate fluctuations and SPX futures prices. In both cases, our tests reveal the presence of a non-zero Brownian component and a finite variation jump component.
Nonparametric Transient Classification using Adaptive Wavelets
Varughese, Melvin M; Stephanou, Michael; Bassett, Bruce A
2015-01-01
Classifying transients based on multi band light curves is a challenging but crucial problem in the era of GAIA and LSST since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology - a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data in addition to the potential non-representativity of the training set. The ranked classifier is simple and quick to implement while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind ...
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Statistics Anxiety and Business Statistics: The International Student
Bell, James A.
2008-01-01
Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…
Antczak, K; Wilczyńska, U
1980-01-01
2 statistical models for evaluation of toxicological studies results have been presented. Model I. after R. Hoschek and H. J. Schittke (2) involves: 1. Elimination of the values deviating from most results-by Grubbs' method (2). 2. Analysis of the differences between the results obtained by the participants of the action and tentatively assumed value. 3. Evaluation of significant differences between the reference value and average value for a given series of measurements. 4. Thorough evaluation of laboratories based on evaluation coefficient fx. Model II after Keppler et al. As a criterion for evaluating the results the authors assumed the median. Individual evaluation of laboratories was performed on the basis of: 1. Adjusted test "t" 2. Linear regression test.
A non-parametric framework for estimating threshold limit values
Directory of Open Access Journals (Sweden)
Ulm Kurt
2005-11-01
Full Text Available Abstract Background To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested on non-parametric flexible alternatives. Methods We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing the data from a study conducted by the German Research Foundation aiming to set a threshold limit value in the exposure to total dust at workplace, as a causal agent for developing chronic bronchitis. Results In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion The estimation of thresholds based on isotonic framework is conceptually simple and sufficiently powerful. Given that in threshold value estimation context there is not a gold standard method, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Anestis Antoniadis
2001-06-01
Full Text Available Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electrical consumption is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
Bayesian nonparametric estimation for Quantum Homodyne Tomography
Naulet, Zacharie; Barat, Eric
2016-01-01
We estimate the quantum state of a light beam from results of quantum homodyne tomography noisy measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around ...
portfolio optimization based on nonparametric estimation methods
Directory of Open Access Journals (Sweden)
mahsa ghandehari
2017-03-01
Full Text Available One of the major issues investors are facing with in capital markets is decision making about select an appropriate stock exchange for investing and selecting an optimal portfolio. This process is done through the risk and expected return assessment. On the other hand in portfolio selection problem if the assets expected returns are normally distributed, variance and standard deviation are used as a risk measure. But, the expected returns on assets are not necessarily normal and sometimes have dramatic differences from normal distribution. This paper with the introduction of conditional value at risk ( CVaR, as a measure of risk in a nonparametric framework, for a given expected return, offers the optimal portfolio and this method is compared with the linear programming method. The data used in this study consists of monthly returns of 15 companies selected from the top 50 companies in Tehran Stock Exchange during the winter of 1392 which is considered from April of 1388 to June of 1393. The results of this study show the superiority of nonparametric method over the linear programming method and the nonparametric method is much faster than the linear programming method.
Landstad, Bodil J; Gelin, Gunnar; Malmquist, Claes; Vinberg, Stig
2002-09-15
The study had two primary aims. The first aim was to combine a human resources costing and accounting approach (HRCA) with a quantitative statistical approach in order to get an integrated model. The second aim was to apply this integrated model in a quasi-experimental study in order to investigate whether preventive intervention affected sickness absence costs at the company level. The intervention studied contained occupational organizational measures, competence development, physical and psychosocial working environmental measures and individual and rehabilitation measures on both an individual and a group basis. The study is a quasi-experimental design with a non-randomized control group. Both groups involved cleaning jobs at predominantly female workplaces. The study plan involved carrying out before and after studies on both groups. The study included only those who were at the same workplace during the whole of the study period. In the HRCA model used here, the cost of sickness absence is the net difference between the costs, in the form of the value of the loss of production and the administrative cost, and the benefits in the form of lower labour costs. According to the HRCA model, the intervention used counteracted a rise in sickness absence costs at the company level, giving an average net effect of 266.5 Euros per person (full-time working) during an 8-month period. Using an analogue statistical analysis on the whole of the material, the contribution of the intervention counteracted a rise in sickness absence costs at the company level giving an average net effect of 283.2 Euros. Using a statistical method it was possible to study the regression coefficients in sub-groups and calculate the p-values for these coefficients; in the younger group the intervention gave a calculated net contribution of 605.6 Euros with a p-value of 0.073, while the intervention net contribution in the older group had a very high p-value. Using the statistical model it was
Directory of Open Access Journals (Sweden)
Fulufhelo Ṋemavhola
2017-09-01
Full Text Available The paper exhibits the examination of life quality evaluation of helical coil springs in the railway industry as it impacts the safety of the transportation of goods and people. The types of spring considered are: the external spring, internal spring and stabiliser spring. Statistical process control was utilised as the fundamental instrument in the investigation. Measurements were performed using a measuring tape, dynamic actuators and the vernier caliper. The purpose of this research was to examine the usability of old helical springs found in a railway environment. The goal of the experiment was to obtain factual statistical information to determine the life quality of the helical springs used in the railroad transportation environment. Six sigma advocacies were additionally used as a part of this paper. According to six sigma estimation examination only the stabilizers and inner springs for coil bar diameter met the six sigma prerequisites. It is reasoned that the coil springs should be replaced as they do not meet the six sigma requirements.
Garcia, Juan A L; Bartumeus, Frederic; Roche, David; Giraldo, Jesús; Stanley, H Eugene; Casamayor, Emilio O
2008-06-01
We combined genometric (DNA walks) and statistical (detrended fluctuation analysis) methods on 456 prokaryotic chromosomes from 309 different bacterial and archaeal species to look for specific patterns and long-range correlations along the genome and relate them to ecological lifestyles. The position of each nucleotide along the complete genome sequence was plotted on an orthogonal plane (DNA landscape), and fluctuation analysis applied to the DNA walk series showed a long-range correlation in contrast to the lack of correlation for artificially generated genomes. Different features in the DNA landscapes among genomes from different ecological and metabolic groups of prokaryotes appeared with the combined analysis. Transition from hyperthermophilic to psychrophilic environments could have been related to more complex structural adaptations in microbial genomes, whereas for other environmental factors such as pH and salinity this effect would have been smaller. Prokaryotes with domain-specific metabolisms, such as photoautotrophy in Bacteria and methanogenesis in Archaea, showed consistent differences in genome correlation structure. Overall, we show that, beyond the relative proportion of nucleotides, correlation properties derived from their sequential position within the genome hide relevant phylogenetic and ecological information. This can be studied by combining genometric and statistical physics methods, leading to a reduction of genome complexity to a few useful descriptors.
Directory of Open Access Journals (Sweden)
Jing-Chzi Hsieh
2016-05-01
Full Text Available The recycled Kevlar®/polyester/low-melting-point polyester (recycled Kevlar®/PET/LPET nonwoven geotextiles are immersed in neutral, strong acid, and strong alkali solutions, respectively, at different temperatures for four months. Their tensile strength is then tested according to various immersion periods at various temperatures, in order to determine their durability to chemicals. For the purpose of analyzing the possible factors that influence mechanical properties of geotextiles under diverse environmental conditions, the experimental results and statistical analyses are incorporated in this study. Therefore, influences of the content of recycled Kevlar® fibers, implementation of thermal treatment, and immersion periods on the tensile strength of recycled Kevlar®/PET/LPET nonwoven geotextiles are examined, after which their influential levels are statistically determined by performing multiple regression analyses. According to the results, the tensile strength of nonwoven geotextiles can be enhanced by adding recycled Kevlar® fibers and thermal treatment.
Directory of Open Access Journals (Sweden)
Carles Comas
2015-04-01
Full Text Available Aim of study: Understanding inter- and intra-specific competition for water is crucial in drought-prone environments. However, little is known about the spatial interdependencies for water uptake among individuals in mixed stands. The aim of this work was to compare water uptake patterns during a drought episode in two common Mediterranean tree species, Quercus ilex L. and Pinus halepensis Mill., using the isotope composition of xylem water (δ18O, δ2H as hydrological marker. Area of study: The study was performed in a mixed stand, sampling a total of 33 oaks and 78 pines (plot area= 888 m2. We tested the hypothesis that both species uptake water differentially along the soil profile, thus showing different levels of tree-to-tree interdependency, depending on whether neighbouring trees belong to one species or the other. Material and Methods: We used pair-correlation functions to study intra-specific point-tree configurations and the bivariate pair correlation function to analyse the inter-specific spatial configuration. Moreover, the isotopic composition of xylem water was analysed as a mark point pattern. Main results: Values for Q. ilex (δ18O = –5.3 ± 0.2‰, δ2H = –54.3 ± 0.7‰ were significantly lower than for P. halepensis (δ18O = –1.2 ± 0.2‰, δ2H = –25.1 ± 0.8‰, pointing to a greater contribution of deeper soil layers for water uptake by Q. ilex. Research highlights: Point-process analyses revealed spatial intra-specific dependencies among neighbouring pines, showing neither oak-oak nor oak-pine interactions. This supports niche segregation for water uptake between the two species.
Chen, Fei; Taylor, William D; Anderson, William B; Huck, Peter M
2013-08-01
This study investigates the suitability of multivariate techniques, including principal component analysis and discriminant function analysis, for analysing polycyclic aromatic hydrocarbon and heavy metal-contaminated aquatic sediment data. We show that multivariate "fingerprint" analysis of relative abundances of contaminants can characterize a contamination source and distinguish contaminated sediments of interest from background contamination. Thereafter, analysis of the unstandardized concentrations among samples contaminated from the same source can identify migration pathways within a study area that is hydraulically complex and has a long contamination history, without reliance on complex hydrodynamic data and modelling techniques. Together, these methods provide an effective tool for drinking water source monitoring and protection.
Kobayashi, Katsumi; Pillai, K Sadasivan; Guhatakurta, Soma; Cherian, K M; Ohnishi, Mariko
2011-01-01
In the present study, an attempt was made to compare the statistical tools used for analysing the data of repeated dose toxicity studies with rodents conducted in 45 countries, with that of Japan. The study revealed that there was no congruence among the countries in the use of statistical tools for analysing the data obtained from the above studies. For example, to analyse the data obtained from repeated dose toxicity studies with rodents, Scheffé's multiple range and Dunnett type (joint type Dunnett) tests are commonly used in Japan, but in other countries use of these statistical tools is not so common. However, statistical techniques used for testing the above data for homogeneity of variance and inter-group comparisons do not differ much between Japan and other countries. In Japan, the data are generally not tested for normality and the same is true with the most of the countries investigated. In the present investigation, out of 127 studies examined, data of only 6 studies were analysed for both homogeneity of variance and normal distribution. For examining homogeneity of variance, we propose Levene's test, since the commonly used Bartlett's test may show heterogeneity in variance in all the groups, if a slight heterogeneity in variance is seen any one of the groups. We suggest the data may be examined for both homogeneity of variance and normal distribution. For the data of the groups that do not show heterogeneity of variance, to find the significant difference among the groups, we recommend Dunnett's test, and for those show heterogeneity of variance, we recommend Steel's test.
Forbes, Valery E; Aufderheide, John; Warbritton, Ryan; van der Hoeven, Nelly; Caspers, Norbert
2007-03-01
This study presents results of the effects of bisphenol A (BPA) on adult egg production, egg hatchability, egg development rates and juvenile growth rates in the freshwater gastropod, Marisa cornuarietis. We observed no adult mortality, substantial inter-snail variability in reproductive output, and no effects of BPA on reproduction during 12 weeks of exposure to 0, 0.1, 1.0, 16, 160 or 640 microg/L BPA. We observed no effects of BPA on egg hatchability or timing of egg hatching. Juveniles showed good growth in the control and all treatments, and there were no significant effects of BPA on this endpoint. Our results do not support previous claims of enhanced reproduction in Marisa cornuarietis in response to exposure to BPA. Statistical power analysis indicated high levels of inter-snail variability in the measured endpoints and highlighted the need for sufficient replication when testing treatment effects on reproduction in M. cornuarietis with adequate power.
Myers, R. H.
1976-01-01
The depletion of ozone in the stratosphere is examined, and causes for the depletion are cited. Ground station and satellite measurements of ozone, which are taken on a worldwide basis, are discussed. Instruments used in ozone measurement are discussed, such as the Dobson spectrophotometer, which is credited with providing the longest and most extensive series of observations for ground based observation of stratospheric ozone. Other ground based instruments used to measure ozone are also discussed. The statistical differences of ground based measurements of ozone from these different instruments are compared to each other, and to satellite measurements. Mathematical methods (i.e., trend analysis or linear regression analysis) of analyzing the variability of ozone concentration with respect to time and lattitude are described. Various time series models which can be employed in accounting for ozone concentration variability are examined.
Institute of Scientific and Technical Information of China (English)
Liu Yanqiong; Chen Yingwu
2006-01-01
When analyze the uncertainty of the cost and the schedule of the spaceflight project, it is needed to know the value of the schedule-cost correlation coefficient. This paper deduces the schedule distribution, considering the effect of the cost, and proposes the estimation formula of the correlation coefficient between the ln(schedule) and the cost. On the basis of the fact and Taylor expansion, the relation expression between the schedule-cost correlation coefficient and the ln-schedule-cost correlation coefficient is put forward. By analyzing the value features of the estimation formula of the ln-schedule-cost correlation coefficient, the general rules are proposed to ascertain the value of the schedule-cost correlation coefficient. An example is given to demonstrate how to approximately amend the schedule-cost correlation coefficient based on the historical statistics, which reveals the traditional assigned value is inaccurate. The universality of this estimation method is analyzed.
Non-parametric partitioning of SAR images
Delyon, G.; Galland, F.; Réfrégier, Ph.
2006-09-01
We describe and analyse a generalization of a parametric segmentation technique adapted to Gamma distributed SAR images to a simple non parametric noise model. The partition is obtained by minimizing the stochastic complexity of a quantized version on Q levels of the SAR image and lead to a criterion without parameters to be tuned by the user. We analyse the reliability of the proposed approach on synthetic images. The quality of the obtained partition will be studied for different possible strategies. In particular, one will discuss the reliability of the proposed optimization procedure. Finally, we will precisely study the performance of the proposed approach in comparison with the statistical parametric technique adapted to Gamma noise. These studies will be led by analyzing the number of misclassified pixels, the standard Hausdorff distance and the number of estimated regions.
A Non-Parametric Spatial Independence Test Using Symbolic Entropy
Directory of Open Access Journals (Sweden)
López Hernández, Fernando
2008-01-01
Full Text Available In the present paper, we construct a new, simple, consistent and powerful test forspatial independence, called the SG test, by using symbolic dynamics and symbolic entropyas a measure of spatial dependence. We also give a standard asymptotic distribution of anaffine transformation of the symbolic entropy under the null hypothesis of independencein the spatial process. The test statistic and its standard limit distribution, with theproposed symbolization, are invariant to any monotonuous transformation of the data.The test applies to discrete or continuous distributions. Given that the test is based onentropy measures, it avoids smoothed nonparametric estimation. We include a MonteCarlo study of our test, together with the well-known Moran’s I, the SBDS (de Graaffet al, 2001 and (Brett and Pinkse, 1997 non parametric test, in order to illustrate ourapproach.
Analyzing single-molecule time series via nonparametric Bayesian inference.
Hines, Keegan E; Bankston, John R; Aldrich, Richard W
2015-02-03
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Nonparametric Estimation of Distributions in Random Effects Models
Hart, Jeffrey D.
2011-01-01
We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online. © 2011 American Statistical Association.
Curve registration by nonparametric goodness-of-fit testing
Dalalyan, Arnak
2011-01-01
The problem of curve registration appears in many different areas of applications ranging from neuroscience to road traffic modeling. In the present work, we propose a nonparametric testing framework in which we develop a generalized likelihood ratio test to perform curve registration. We first prove that, under the null hypothesis, the resulting test statistic is asymptotically distributed as a chi-squared random variable. This result, often referred to as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic significance level and a natural measure of lack-of-fit in terms of the p-value of the chi squared test. We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to 1. Some numerical experiments on synthetic datasets are reported as well.
Revealing components of the galaxy population through nonparametric techniques
Bamford, Steven P; Nichol, Robert C; Miller, Christopher J; Wasserman, Larry; Genovese, Christopher R; Freeman, Peter E
2008-01-01
The distributions of galaxy properties vary with environment, and are often multimodal, suggesting that the galaxy population may be a combination of multiple components. The behaviour of these components versus environment holds details about the processes of galaxy development. To release this information we apply a novel, nonparametric statistical technique, identifying four components present in the distribution of galaxy H$\\alpha$ emission-line equivalent-widths. We interpret these components as passive, star-forming, and two varieties of active galactic nuclei. Independent of this interpretation, the properties of each component are remarkably constant as a function of environment. Only their relative proportions display substantial variation. The galaxy population thus appears to comprise distinct components which are individually independent of environment, with galaxies rapidly transitioning between components as they move into denser environments.
Evaluation of Nonparametric Probabilistic Forecasts of Wind Power
DEFF Research Database (Denmark)
Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg, orlov 31.07.2008;
likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set...... of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind......Predictions of wind power production for horizons up to 48-72 hour ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most...
Wei, Jiawei
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
Wei, Jiawei; Carroll, Raymond J; Maity, Arnab
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
Testing Equality of Nonparametric Functions in Two Partially Linear Models%检验两个部分线性模型中非参函数相等
Institute of Scientific and Technical Information of China (English)
施三支; 宋立新; 杨华
2008-01-01
We propose the test statistic to check whether the nonparametric func-tions in two partially linear models are equality or not in this paper. We estimate the nonparametric function both in null hypothesis and the alternative by the local linear method, where we ignore the parametric components, and then estimate the parameters by the two stage method. The test statistic is derived, and it is shown to be asymptotically normal under the null hypothesis.
Statistical Methods for Astronomy
Feigelson, Eric D
2012-01-01
This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when distributions functions of statistics are not known. Several approaches to model selection and good- ness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain times series analysis for light curves; and spatial statistics to interpret the spati...
Xu, Kuan-Man; Wong, Takmeng; Wielicki, Bruce A.; Parker, Lindsay
2006-01-01
Three boundary-layer cloud object types, stratus, stratocumulus and cumulus, that occurred over the Pacific Ocean during January-August 1998, are identified from the CERES (Clouds and the Earth s Radiant Energy System) single scanner footprint (SSF) data from the TRMM (Tropical Rainfall Measuring Mission) satellite. This study emphasizes the differences and similarities in the characteristics of each cloud-object type between the tropical and subtropical regions and among different size categories and among small geographic areas. Both the frequencies of occurrence and statistical distributions of cloud physical properties are analyzed. In terms of frequencies of occurrence, stratocumulus clouds dominate the entire boundary layer cloud population in all regions and among all size categories. Stratus clouds are more prevalent in the subtropics and near the coastal regions, while cumulus clouds are relatively prevalent over open ocean and the equatorial regions, particularly, within the small size categories. The largest size category of stratus cloud objects occurs more frequently in the subtropics than in the tropics and has much larger average size than its cumulus and stratocumulus counterparts. Each of the three cloud object types exhibits small differences in statistical distributions of cloud optical depth, liquid water path, TOA albedo and perhaps cloud-top height, but large differences in those of cloud-top temperature and OLR between the tropics and subtropics. Differences in the sea surface temperature (SST) distributions between the tropics and subtropics influence some of the cloud macrophysical properties, but cloud microphysical properties and albedo for each cloud object type are likely determined by (local) boundary-layer dynamics and structures. Systematic variations of cloud optical depth, TOA albedo, cloud-top height, OLR and SST with cloud object sizes are pronounced for the stratocumulus and stratus types, which are related to systematic
Yokoyama, Shozo; Takenaka, Naomi
2005-04-01
Red-green color vision is strongly suspected to enhance the survival of its possessors. Despite being red-green color blind, however, many species have successfully competed in nature, which brings into question the evolutionary advantage of achieving red-green color vision. Here, we propose a new method of identifying positive selection at individual amino acid sites with the premise that if positive Darwinian selection has driven the evolution of the protein under consideration, then it should be found mostly at the branches in the phylogenetic tree where its function had changed. The statistical and molecular methods have been applied to 29 visual pigments with the wavelengths of maximal absorption at approximately 510-540 nm (green- or middle wavelength-sensitive [MWS] pigments) and at approximately 560 nm (red- or long wavelength-sensitive [LWS] pigments), which are sampled from a diverse range of vertebrate species. The results show that the MWS pigments are positively selected through amino acid replacements S180A, Y277F, and T285A and that the LWS pigments have been subjected to strong evolutionary conservation. The fact that these positively selected M/LWS pigments are found not only in animals with red-green color vision but also in those with red-green color blindness strongly suggests that both red-green color vision and color blindness have undergone adaptive evolution independently in different species.
How to Tell the Truth with Statistics: The Case for Accountable Data Analyses in Team-based Science.
Gelfond, Jonathan A L; Klugman, Craig M; Welty, Leah J; Heitman, Elizabeth; Louden, Christopher; Pollock, Brad H
Data analysis is essential to translational medicine, epidemiology, and the scientific process. Although recent advances in promoting reproducibility and reporting standards have made some improvements, the data analysis process remains insufficiently documented and susceptible to avoidable errors, bias, and even fraud. Comprehensively accounting for the full analytical process requires not only records of the statistical methodology used, but also records of communications among the research team. In this regard, the data analysis process can benefit from the principle of accountability that is inherent in other disciplines such as clinical practice. We propose a novel framework for capturing the analytical narrative called the Accountable Data Analysis Process (ADAP), which allows the entire research team to participate in the analysis in a supervised and transparent way. The framework is analogous to an electronic health record in which the dataset is the "patient" and actions related to the dataset are recorded in a project management system. We discuss the design, advantages, and challenges in implementing this type of system in the context of academic health centers, where team based science increasingly demands accountability.
How to Tell the Truth with Statistics: The Case for Accountable Data Analyses in Team-based Science
Gelfond, Jonathan A. L.; Klugman, Craig M.; Welty, Leah J.; Heitman, Elizabeth; Louden, Christopher; Pollock, Brad H.
2015-01-01
Data analysis is essential to translational medicine, epidemiology, and the scientific process. Although recent advances in promoting reproducibility and reporting standards have made some improvements, the data analysis process remains insufficiently documented and susceptible to avoidable errors, bias, and even fraud. Comprehensively accounting for the full analytical process requires not only records of the statistical methodology used, but also records of communications among the research team. In this regard, the data analysis process can benefit from the principle of accountability that is inherent in other disciplines such as clinical practice. We propose a novel framework for capturing the analytical narrative called the Accountable Data Analysis Process (ADAP), which allows the entire research team to participate in the analysis in a supervised and transparent way. The framework is analogous to an electronic health record in which the dataset is the “patient” and actions related to the dataset are recorded in a project management system. We discuss the design, advantages, and challenges in implementing this type of system in the context of academic health centers, where team based science increasingly demands accountability. PMID:26290897
Energy Technology Data Exchange (ETDEWEB)
Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo [Kyung Hee University, Yongin (Korea, Republic of)
2015-08-15
Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filterreverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, turbidity feed, pH, reverse osmosis (RO)flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flowrate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
Boniol, Mathieu; Smans, Michel; Sullivan, Richard; Boyle, Peter
2015-01-01
Objectives We compared calculations of relative risks of cancer death in Swedish mammography trials and in other cancer screening trials. Participants Men and women from 30 to 74 years of age. Setting Randomised trials on cancer screening. Design For each trial, we identified the intervention period, when screening was offered to screening groups and not to control groups, and the post-intervention period, when screening (or absence of screening) was the same in screening and control groups. We then examined which cancer deaths had been used for the computation of relative risk of cancer death. Main outcome measures Relative risk of cancer death. Results In 17 non-breast screening trials, deaths due to cancers diagnosed during the intervention and post-intervention periods were used for relative risk calculations. In the five Swedish trials, relative risk calculations used deaths due to breast cancers found during intervention periods, but deaths due to breast cancer found at first screening of control groups were added to these groups. After reallocation of the added breast cancer deaths to post-intervention periods of control groups, relative risks of 0.86 (0.76; 0.97) were obtained for cancers found during intervention periods and 0.83 (0.71; 0.97) for cancers found during post-intervention periods, indicating constant reduction in the risk of breast cancer death during follow-up, irrespective of screening. Conclusions The use of unconventional statistical methods in Swedish trials has led to overestimation of risk reduction in breast cancer death attributable to mammography screening. The constant risk reduction observed in screening groups was probably due to the trial design that optimised awareness and medical management of women allocated to screening groups. PMID:26152677
Hartmann, Martin; Frey, Beat; Kölliker, Roland; Widmer, Franco
2005-06-01
Cultivation independent analyses of soil microbial community structures are frequently used to describe microbiological soil characteristics. This approach is based on direct extraction of total soil DNA followed by PCR amplification of selected marker genes and subsequent genetic fingerprint analyses. Semi-automated genetic fingerprinting techniques such as terminal restriction fragment length polymorphism (T-RFLP) and ribosomal intergenic spacer analysis (RISA) yield high-resolution patterns of highly diverse soil microbial communities and hold great potential for use in routine soil quality monitoring, when rapid high throughput screening for differences or changes is more important than phylogenetic identification of organisms affected. Our objective was to perform profound statistical analysis to evaluate the cultivation independent approach and the consistency of results from T-RFLP and RISA. As a model system, we used two different heavy metal treated soils from an open top chamber experiment. Bacterial T-RFLP and RISA profiles of 16S rDNA were converted into numeric data matrices in order to allow for detailed statistical analyses with cluster analysis, Mantel test statistics, Monte Carlo permutation tests and ANOVA. Analyses revealed that soil DNA-contents were significantly correlated with soil microbial biomass in our system. T-RFLP and RISA yielded highly consistent and correlating results and both allowed to distinguish the four treatments with equal significance. While RISA represents a fast and general fingerprinting method of moderate cost and labor intensity, T-RFLP is technically more demanding but offers the advantage of phylogenetic identification of detected soil microorganisms. Therefore, selection of either of these methods should be based on the specific research question under investigation.
Wienke, B R; O'Leary, T R
2008-05-01
Linking model and data, we detail the LANL diving reduced gradient bubble model (RGBM), dynamical principles, and correlation with data in the LANL Data Bank. Table, profile, and meter risks are obtained from likelihood analysis and quoted for air, nitrox, helitrox no-decompression time limits, repetitive dive tables, and selected mixed gas and repetitive profiles. Application analyses include the EXPLORER decompression meter algorithm, NAUI tables, University of Wisconsin Seafood Diver tables, comparative NAUI, PADI, Oceanic NDLs and repetitive dives, comparative nitrogen and helium mixed gas risks, USS Perry deep rebreather (RB) exploration dive,world record open circuit (OC) dive, and Woodville Karst Plain Project (WKPP) extreme cave exploration profiles. The algorithm has seen extensive and utilitarian application in mixed gas diving, both in recreational and technical sectors, and forms the bases forreleased tables and decompression meters used by scientific, commercial, and research divers. The LANL Data Bank is described, and the methods used to deduce risk are detailed. Risk functions for dissolved gas and bubbles are summarized. Parameters that can be used to estimate profile risk are tallied. To fit data, a modified Levenberg-Marquardt routine is employed with L2 error norm. Appendices sketch the numerical methods, and list reports from field testing for (real) mixed gas diving. A Monte Carlo-like sampling scheme for fast numerical analysis of the data is also detailed, as a coupled variance reduction technique and additional check on the canonical approach to estimating diving risk. The method suggests alternatives to the canonical approach. This work represents a first time correlation effort linking a dynamical bubble model with deep stop data. Supercomputing resources are requisite to connect model and data in application.
Directory of Open Access Journals (Sweden)
Wang Xiaoqiang
2012-04-01
Full Text Available Abstract Background Quantitative trait loci (QTL detection on a huge amount of phenotypes, like eQTL detection on transcriptomic data, can be dramatically impaired by the statistical properties of interval mapping methods. One of these major outcomes is the high number of QTL detected at marker locations. The present study aims at identifying and specifying the sources of this bias, in particular in the case of analysis of data issued from outbred populations. Analytical developments were carried out in a backcross situation in order to specify the bias and to propose an algorithm to control it. The outbred population context was studied through simulated data sets in a wide range of situations. The likelihood ratio test was firstly analyzed under the "one QTL" hypothesis in a backcross population. Designs of sib families were then simulated and analyzed using the QTL Map software. On the basis of the theoretical results in backcross, parameters such as the population size, the density of the genetic map, the QTL effect and the true location of the QTL, were taken into account under the "no QTL" and the "one QTL" hypotheses. A combination of two non parametric tests - the Kolmogorov-Smirnov test and the Mann-Whitney-Wilcoxon test - was used in order to identify the parameters that affected the bias and to specify how much they influenced the estimation of QTL location. Results A theoretical expression of the bias of the estimated QTL location was obtained for a backcross type population. We demonstrated a common source of bias under the "no QTL" and the "one QTL" hypotheses and qualified the possible influence of several parameters. Simulation studies confirmed that the bias exists in outbred populations under both the hypotheses of "no QTL" and "one QTL" on a linkage group. The QTL location was systematically closer to marker locations than expected, particularly in the case of low QTL effect, small population size or low density of markers, i
A nonparametric and diversified portfolio model
Shirazi, Yasaman Izadparast; Sabiruzzaman, Md.; Hamzah, Nor Aishah
2014-07-01
Traditional portfolio models, like mean-variance (MV) suffer from estimation error and lack of diversity. Alternatives, like mean-entropy (ME) or mean-variance-entropy (MVE) portfolio models focus independently on the issue of either a proper risk measure or the diversity. In this paper, we propose an asset allocation model that compromise between risk of historical data and future uncertainty. In the new model, entropy is presented as a nonparametric risk measure as well as an index of diversity. Our empirical evaluation with a variety of performance measures shows that this model has better out-of-sample performances and lower portfolio turnover than its competitors.
Non-Parametric Estimation of Correlation Functions
DEFF Research Database (Denmark)
Brincker, Rune; Rytter, Anders; Krenk, Steen
In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed...... out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise....
Lottery spending: a non-parametric analysis.
Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody
2015-01-01
We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Lottery spending: a non-parametric analysis.
Directory of Open Access Journals (Sweden)
Skip Garibaldi
Full Text Available We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Nonparametric inferences for kurtosis and conditional kurtosis
Institute of Scientific and Technical Information of China (English)
XIE Xiao-heng; HE You-hua
2009-01-01
Under the assumption of strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis for risk time series. We apply this method to the daily returns of S&P500 index and the Shanghai Composite Index, and simulate GARCH data for verifying the efficiency of the presented model. Our results indicate that the risk series distribution is heavily tailed, but the historical information can make its future distribution light-tailed. However the far future distribution's tails are little affected by the historical data.
Parametric versus non-parametric simulation
Dupeux, Bérénice; Buysse, Jeroen
2014-01-01
Most of ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non parametric efficiency analysis for determining the farm’s technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real world situation where often non-linear relationships exist. Results s...
Preliminary results on nonparametric facial occlusion detection
Directory of Open Access Journals (Sweden)
Daniel LÓPEZ SÁNCHEZ
2016-10-01
Full Text Available The problem of face recognition has been extensively studied in the available literature, however, some aspects of this field require further research. The design and implementation of face recognition systems that can efficiently handle unconstrained conditions (e.g. pose variations, illumination, partial occlusion... is still an area under active research. This work focuses on the design of a new nonparametric occlusion detection technique. In addition, we present some preliminary results that indicate that the proposed technique might be useful to face recognition systems, allowing them to dynamically discard occluded face parts.
Application of a nonparametric approach to analyze delta-pCO2 data from the Southern Ocean
CSIR Research Space (South Africa)
Pretorius, WB
2011-11-01
Full Text Available In this paper the authors discuss the application of a classical nonparametric inference approach to analyse delta-pCO2 measurements from the Southern Ocean, which is a novel method to analysing data in this area, as well as comparing results...
Simultaneous statistical inference for epigenetic data.
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
An Evaluation of Parametric and Nonparametric Models of Fish Population Response.
Energy Technology Data Exchange (ETDEWEB)
Haas, Timothy C.; Peterson, James T.; Lee, Danny C.
1999-11-01
Predicting the distribution or status of animal populations at large scales often requires the use of broad-scale information describing landforms, climate, vegetation, etc. These data, however, often consist of mixtures of continuous and categorical covariates and nonmultiplicative interactions among covariates, complicating statistical analyses. Using data from the interior Columbia River Basin, USA, we compared four methods for predicting the distribution of seven salmonid taxa using landscape information. Subwatersheds (mean size, 7800 ha) were characterized using a set of 12 covariates describing physiography, vegetation, and current land-use. The techniques included generalized logit modeling, classification trees, a nearest neighbor technique, and a modular neural network. We evaluated model performance using out-of-sample prediction accuracy via leave-one-out cross-validation and introduce a computer-intensive Monte Carlo hypothesis testing approach for examining the statistical significance of landscape covariates with the non-parametric methods. We found the modular neural network and the nearest-neighbor techniques to be the most accurate, but were difficult to summarize in ways that provided ecological insight. The modular neural network also required the most extensive computer resources for model fitting and hypothesis testing. The generalized logit models were readily interpretable, but were the least accurate, possibly due to nonlinear relationships and nonmultiplicative interactions among covariates. Substantial overlap among the statistically significant (P<0.05) covariates for each method suggested that each is capable of detecting similar relationships between responses and covariates. Consequently, we believe that employing one or more methods may provide greater biological insight without sacrificing prediction accuracy.
Nonparametric predictive inference for combining diagnostic tests with parametric copula
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we interest in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results with considering dependence structure using parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. While copula is a well-known statistical concept for modelling dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method which is maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different family of copulas. Finally, we briefly outline related challenges and opportunities for future research.
Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, rather belongs to a curved Riemannian manifold,existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models.
Fan, Jianqing; Ma, Yunbei; Dai, Wei
2014-01-01
The varying-coefficient model is an important class of nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in sparse ultra-high dimensional varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance the practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Directory of Open Access Journals (Sweden)
Tavio Tavio
2008-01-01
Full Text Available Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal-or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of relation between predictor and response variables are not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicates that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis proves to be in good agreement with the experimental curves available in literatures
On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
Directory of Open Access Journals (Sweden)
Aaditya Ramdas
2017-01-01
Full Text Available Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having being designed and analyzed, both for the unidimensional and the multivariate setting. Inthisshortsurvey,wefocusonteststatisticsthatinvolvetheWassersteindistance. Usingan entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ plots and receiver operating characteristic or ordinal dominance (ROC/ODC curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
Variable selection in identification of a high dimensional nonlinear non-parametric system
Institute of Scientific and Technical Information of China (English)
Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG
2015-01-01
The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to no...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
Kuss, O
2015-03-30
Meta-analyses with rare events, especially those that include studies with no event in one ('single-zero') or even both ('double-zero') treatment arms, are still a statistical challenge. In the case of double-zero studies, researchers in general delete these studies or use continuity corrections to avoid them. A number of arguments against both options has been given, and statistical methods that use the information from double-zero studies without using continuity corrections have been proposed. In this paper, we collect them and compare them by simulation. This simulation study tries to mirror real-life situations as completely as possible by deriving true underlying parameters from empirical data on actually performed meta-analyses. It is shown that for each of the commonly encountered effect estimators valid statistical methods are available that use the information from double-zero studies without using continuity corrections. Interestingly, all of them are truly random effects models, and so also the current standard method for very sparse data as recommended from the Cochrane collaboration, the Yusuf-Peto odds ratio, can be improved on. For actual analysis, we recommend to use beta-binomial regression methods to arrive at summary estimates for the odds ratio, the relative risk, or the risk difference. Methods that ignore information from double-zero studies or use continuity corrections should no longer be used. We illustrate the situation with an example where the original analysis ignores 35 double-zero studies, and a superior analysis discovers a clinically relevant advantage of off-pump surgery in coronary artery bypass grafting. Copyright © 2014 John Wiley & Sons, Ltd.
Local Component Analysis for Nonparametric Bayes Classifier
Khademi, Mahmoud; safayani, Meharn
2010-01-01
The decision boundaries of Bayes classifier are optimal because they lead to maximum probability of correct decision. It means if we knew the prior probabilities and the class-conditional densities, we could design a classifier which gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as window width. Moreover, these methods suffer from curse of dimensionality of the feature space and small sample size problem which severely restricts their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
Nonparametric k-nearest-neighbor entropy estimator.
Lombardi, Damiano; Pant, Sanjay
2016-01-01
A nonparametric k-nearest-neighbor-based entropy estimator is proposed. It improves on the classical Kozachenko-Leonenko estimator by considering nonuniform probability densities in the region of k-nearest neighbors around each sample point. It aims to improve the classical estimators in three situations: first, when the dimensionality of the random variable is large; second, when near-functional relationships leading to high correlation between components of the random variable are present; and third, when the marginal variances of random variable components vary significantly with respect to each other. Heuristics on the error of the proposed and classical estimators are presented. Finally, the proposed estimator is tested for a variety of distributions in successively increasing dimensions and in the presence of a near-functional relationship. Its performance is compared with a classical estimator, and a significant improvement is demonstrated.
Nonparametric estimation of location and scale parameters
Potgieter, C.J.
2012-12-01
Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
Nonparametric Maximum Entropy Estimation on Information Diagrams
Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn
2016-01-01
Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...
On Parametric (and Non-Parametric Variation
Directory of Open Access Journals (Sweden)
Neil Smith
2009-11-01
Full Text Available This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.
An assessment of the statistical procedures used in original papers ...
African Journals Online (AJOL)
descriptive statistics, contingency table analysis, simple epidemiological ... reported data and the statistical review of articles before publication will .... Nonparametric correlation. 4. 9. 5 .... Correlation coefficient as measure of agreement. 1.
Firnkorn, Daniel; Merker, Sebastian; Ganzinger, Matthias; Muley, Thomas; Knaup, Petra
2016-01-01
In medical science, modern IT concepts are increasingly important to gather new findings out of complex diseases. Data Warehouses (DWH) as central data repository systems play a key role by providing standardized, high-quality and secure medical data for effective analyses. However, DWHs in medicine must fulfil various requirements concerning data privacy and the ability to describe the complexity of (rare) disease phenomena. Here, i2b2 and tranSMART are free alternatives representing DWH solutions especially developed for medical informatics purposes. But different functionalities are not yet provided in a sufficient way. In fact, data import and export is still a major problem because of the diversity of schemas, parameter definitions and data quality which are described variously in each single clinic. Further, statistical analyses inside i2b2 and tranSMART are possible, but restricted to the implemented functions. Thus, data export is needed to provide a data basis which can be directly included within statistics software like SPSS and SAS or data mining tools like Weka and RapidMiner. The standard export tools of i2b2 and tranSMART are more or less creating a database dump of key-value pairs which cannot be used immediately by the mentioned tools. They need an instance-based or a case-based representation of each patient. To overcome this lack, we developed a concept called Generic Case Extractor (GCE) which pivots the key-value pairs of each clinical fact into a row-oriented format for each patient sufficient to enable analyses in a broader context. Therefore, complex pivotisation routines where necessary to ensure temporal consistency especially in terms of different data sets and the occurrence of identical but repeated parameters like follow-up data. GCE is embedded inside a comprehensive software platform for systems medicine.
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
Nonparametric Bayesian inference for multidimensional compound Poisson processes
S. Gugushvili; F. van der Meulen; P. Spreij
2015-01-01
Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density r0 and intensity λ0. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context, whic
Bayesian nonparametric meta-analysis using Polya tree mixture models.
Branscum, Adam J; Hanson, Timothy E
2008-09-01
Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B
2008-08-07
There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench. Taverna can be used by data analysis
Directory of Open Access Journals (Sweden)
Pocock Matthew R
2008-08-01
Full Text Available Abstract Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna
Cunningham, Michael R; Baumeister, Roy F
2016-01-01
The limited resource model states that self-control is governed by a relatively finite set of inner resources on which people draw when exerting willpower. Once self-control resources have been used up or depleted, they are less available for other self-control tasks, leading to a decrement in subsequent self-control success. The depletion effect has been studied for over 20 years, tested or extended in more than 600 studies, and supported in an independent meta-analysis (Hagger et al., 2010). Meta-analyses are supposed to reduce bias in literature reviews. Carter et al.'s (2015) meta-analysis, by contrast, included a series of questionable decisions involving sampling, methods, and data analysis. We provide quantitative analyses of key sampling issues: exclusion of many of the best depletion studies based on idiosyncratic criteria and the emphasis on mini meta-analyses with low statistical power as opposed to the overall depletion effect. We discuss two key methodological issues: failure to code for research quality, and the quantitative impact of weak studies by novice researchers. We discuss two key data analysis issues: questionable interpretation of the results of trim and fill and Funnel Plot Asymmetry test procedures, and the use and misinterpretation of the untested Precision Effect Test and Precision Effect Estimate with Standard Error (PEESE) procedures. Despite these serious problems, the Carter et al. (2015) meta-analysis results actually indicate that there is a real depletion effect - contrary to their title.
Directory of Open Access Journals (Sweden)
Ian T. Kracalik
2012-11-01
Full Text Available We compared a local clustering and a cluster morphology statistic using anthrax outbreaks in large (cattle and small (sheep and goats domestic ruminants across Kazakhstan. The Getis-Ord (Gi* statistic and a multidirectional optimal ecotope algorithm (AMOEBA were compared using 1st, 2nd and 3rd order Rook contiguity matrices. Multivariate statistical tests were used to evaluate the environmental signatures between clusters and non-clusters from the AMOEBA and Gi* tests. A logistic regression was used to define a risk surface for anthrax outbreaks and to compare agreement between clustering methodologies. Tests revealed differences in the spatial distribution of clusters as well as the total number of clusters in large ruminants for AMOEBA (n = 149 and for small ruminants (n = 9. In contrast, Gi* revealed fewer large ruminant clusters (n = 122 and more small ruminant clusters (n = 61. Significant environmental differences were found between groups using the Kruskall-Wallis and Mann- Whitney U tests. Logistic regression was used to model the presence/absence of anthrax outbreaks and define a risk surface for large ruminants to compare with cluster analyses. The model predicted 32.2% of the landscape as high risk. Approximately 75% of AMOEBA clusters corresponded to predicted high risk, compared with ~64% of Gi* clusters. In general, AMOEBA predicted more irregularly shaped clusters of outbreaks in both livestock groups, while Gi* tended to predict larger, circular clusters. Here we provide an evaluation of both tests and a discussion of the use of each to detect environmental conditions associated with anthrax outbreak clusters in domestic livestock. These findings illustrate important differences in spatial statistical methods for defining local clusters and highlight the importance of selecting appropriate levels of data aggregation.
Kracalik, Ian T; Blackburn, Jason K; Lukhnova, Larisa; Pazilov, Yerlan; Hugh-Jones, Martin E; Aikimbayev, Alim
2012-11-01
We compared a local clustering and a cluster morphology statistic using anthrax outbreaks in large (cattle) and small (sheep and goats) domestic ruminants across Kazakhstan. The Getis-Ord (Gi*) statistic and a multidirectional optimal ecotope algorithm (AMOEBA) were compared using 1st, 2nd and 3rd order Rook contiguity matrices. Multivariate statistical tests were used to evaluate the environmental signatures between clusters and non-clusters from the AMOEBA and Gi* tests. A logistic regression was used to define a risk surface for anthrax outbreaks and to compare agreement between clustering methodologies. Tests revealed differences in the spatial distribution of clusters as well as the total number of clusters in large ruminants for AMOEBA (n = 149) and for small ruminants (n = 9). In contrast, Gi* revealed fewer large ruminant clusters (n = 122) and more small ruminant clusters (n = 61). Significant environmental differences were found between groups using the Kruskall-Wallis and Mann-Whitney U tests. Logistic regression was used to model the presence/absence of anthrax outbreaks and define a risk surface for large ruminants to compare with cluster analyses. The model predicted 32.2% of the landscape as high risk. Approximately 75% of AMOEBA clusters corresponded to predicted high risk, compared with ~64% of Gi* clusters. In general, AMOEBA predicted more irregularly shaped clusters of outbreaks in both livestock groups, while Gi* tended to predict larger, circular clusters. Here we provide an evaluation of both tests and a discussion of the use of each to detect environmental conditions associated with anthrax outbreak clusters in domestic livestock. These findings illustrate important differences in spatial statistical methods for defining local clusters and highlight the importance of selecting appropriate levels of data aggregation.
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
施沛德; 王海燕; 张利华
2000-01-01
For regression analysis, some useful Information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in literat黵e, but the optimal rates of global convergence have not been obtained yet. Because of the possible Information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression f unction based on right-censored response data, and proves, under some regularity condi-tions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtai
THE GROWTH POINTS OF STATISTICAL METHODS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2014-11-01
Full Text Available On the basis of a new paradigm of applied mathematical statistics, data analysis and economic-mathematical methods are identified; we have also discussed five topical areas in which modern applied statistics is developing as well as the other statistical methods, i.e. five "growth points" – nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, statistics of non-numeric data
Wieman, Carl E
2015-01-01
The paper "Evaluation of Colorado Learning Attitudes about Science Survey" [1] proposes a new, much shorter, version of the CLASS based on standard factor analysis. In this comment we explain why we believe the analysis that is used is inappropriate, and the proposed modified CLASS will be measuring something quite different, and less useful, than the original. The CLASS was based on extensive interviews with students and is intended to be a formative measurement of instruction that is probing a much more complex construct and with different goals than what is handled with classic psychometrics. We are writing this comment to reiterate the value of combining techniques of cognitive science with statistical analyses as described in detail in Adams & Wieman, 2011 [2] when developing a test of expert-like thinking for use in formative assessment. This type of approach is also called for by the National Research Council in a recent report [3].
Wells, Frank C.; Bourdon, Kristin C.
1985-01-01
Statistical and trend analyses of selected water-quality data collected at three streamflow stations in the lower Neches River basin, Texas, are summarized in order to document baseline water-quality conditions in stream segments that flow through the Big Thicket National Preserve in southeast Texas. Dissolved-solids concentrations in the streams are small, less than 132 milligrams per liter in 50 percent of the samples analyzed from each of the sites. Dissolved-oxygen concentrations in the Neches River at Evadale (08041000) generally are large, exceeding 8.0 milligrams per liter in more than 50 percent of the samples analyzed. Total nitrogen and total phosphorus concentrations in samples from this site have not exceeded 1.8 milligrams per liter and 0.20 milligram per liter, respectively.
Directory of Open Access Journals (Sweden)
Andreas H. Melcher
2012-09-01
Full Text Available This study analyses multidimensional spawning habitat suitability of the fish species “Nase” (latin: Chondrostoma nasus. This is the first time non-parametric methods were used to better understand biotic habitat use in theory and practice. In particular, we tested (1 the Decision Tree technique, Chi-squared Automatic Interaction Detectors (CHAID, to identify specific habitat types and (2 Prediction-Configural Frequency Analysis (P-CFA to test for statistical significance. The combination of both non-parametric methods, CHAID and P-CFA, enabled the identification, prediction and interpretation of most typical significant spawning habitats, and we were also able to determine non-typical habitat types, e.g., types in contrast to antitypes. The gradual combination of these two methods underlined three significant habitat types: shaded habitat, fine and coarse substrate habitat depending on high flow velocity. The study affirmed the importance for fish species of shading and riparian vegetation along river banks. In addition, this method provides a weighting of interactions between specific habitat characteristics. The results demonstrate that efficient river restoration requires re-establishing riparian vegetation as well as the open river continuum and hydro-morphological improvements to habitats.
Basic statistical tools in research and data analysis
Ali, Zulfiqar; Bhaskar, S Bala
2016-01-01
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
Basic statistical tools in research and data analysis.
Ali, Zulfiqar; Bhaskar, S Bala
2016-09-01
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
Basic statistical tools in research and data analysis
Directory of Open Access Journals (Sweden)
Zulfiqar Ali
2016-01-01
Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
Saad, Walid; Poor, H Vincent; Başar, Tamer; Song, Ju Bin
2012-01-01
This paper introduces a novel approach that enables a number of cognitive radio devices that are observing the availability pattern of a number of primary users(PUs), to cooperate and use \\emph{Bayesian nonparametric} techniques to estimate the distributions of the PUs' activity pattern, assumed to be completely unknown. In the proposed model, each cognitive node may have its own individual view on each PU's distribution, and, hence, seeks to find partners having a correlated perception. To address this problem, a coalitional game is formulated between the cognitive devices and an algorithm for cooperative coalition formation is proposed. It is shown that the proposed coalition formation algorithm allows the cognitive nodes that are experiencing a similar behavior from some PUs to self-organize into disjoint, independent coalitions. Inside each coalition, the cooperative cognitive nodes use a combination of Bayesian nonparametric models such as the Dirichlet process and statistical goodness of fit techniques ...
Non-Parametric Tests of Structure for High Angular Resolution Diffusion Imaging in Q-Space
Olhede, Sofia C
2010-01-01
High angular resolution diffusion imaging data is the observed characteristic function for the local diffusion of water molecules in tissue. This data is used to infer structural information in brain imaging. Non-parametric scalar measures are proposed to summarize such data, and to locally characterize spatial features of the diffusion probability density function (PDF), relying on the geometry of the characteristic function. Summary statistics are defined so that their distributions are, to first order, both independent of nuisance parameters and also analytically tractable. The dominant direction of the diffusion at a spatial location (voxel) is determined, and a new set of axes are introduced in Fourier space. Variation quantified in these axes determines the local spatial properties of the diffusion density. Non-parametric hypothesis tests for determining whether the diffusion is unimodal, isotropic or multi-modal are proposed. More subtle characteristics of white-matter microstructure, such as the degre...
All of statistics a concise course in statistical inference
Wasserman, Larry
2004-01-01
This book is for people who want to learn probability and statistics quickly It brings together many of the main ideas in modern statistics in one place The book is suitable for students and researchers in statistics, computer science, data mining and machine learning This book covers a much wider range of topics than a typical introductory text on mathematical statistics It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses The reader is assumed to know calculus and a little linear algebra No previous knowledge of probability and statistics is required The text can be used at the advanced undergraduate and graduate level Larry Wasserman is Professor of Statistics at Carnegie Mellon University He is also a member of the Center for Automated Learning and Discovery in the School of Computer Science His research areas include nonparametric inference, asymptotic theory, causality, and applications to astrophysics, bi...
Noise and speckle reduction in synthetic aperture radar imagery by nonparametric Wiener filtering.
Caprari, R S; Goh, A S; Moffatt, E K
2000-12-10
We present a Wiener filter that is especially suitable for speckle and noise reduction in multilook synthetic aperture radar (SAR) imagery. The proposed filter is nonparametric, not being based on parametrized analytical models of signal statistics. Instead, the Wiener-Hopf equation is expressed entirely in terms of observed signal statistics, with no reference to the possibly unobservable pure signal and noise. This Wiener filter is simple in concept and implementation, exactly minimum mean-square error, and directly applicable to signal-dependent and multiplicative noise. We demonstrate the filtering of a genuine two-look SAR image and show how a nonnegatively constrained version of the filter substantially reduces ringing.
A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.
Mircioiu, Constantin; Atkinson, Jeffrey
2017-05-10
A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
Nonparametric methods in actigraphy: An update
Directory of Open Access Journals (Sweden)
Bruno S.B. Gonçalves
2014-09-01
Full Text Available Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm results for each time interval. Simulated data showed that (1 synchronization analysis depends on sample size, and (2 fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson׳s when using the ISM variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization.
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Nonparametric methods in actigraphy: An update
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-01-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson׳s when using the ISM variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Cliff´s Delta Calculator: A non-parametric effect size program for two groups of observations
Directory of Open Access Journals (Sweden)
Guillermo Macbeth
2011-05-01
Full Text Available The Cliff´s Delta statistic is an effect size measure that quantifies the amount of difference between two non-parametric variables beyond p-values interpretation. This measure can be understood as a useful complementary analysis for the corresponding hypothesis testing. During the last two decades the use of effect size measures has been strongly encouraged by methodologists and leading institutions of behavioral sciences. The aim of this contribution is to introduce the Cliff´s Delta Calculator software that performs such analysis and offers some interpretation tips. Differences and similarities with the parametric case are analysed and illustrated. The implementation of this free program is fully described and compared with other calculators. Alternative algorithmic approaches are mathematically analysed and a basic linear algebra proof of its equivalence is formally presented. Two worked examples in cognitive psychology are commented. A visual interpretation of Cliff´s Delta is suggested. Availability, installation and applications of the program are presented and discussed.
Nonparametric Bayes modeling for case control studies with many predictors.
Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B
2016-03-01
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
Nonparametric Bayesian drift estimation for multidimensional stochastic differential equations
Gugushvili, S.; Spreij, P.
2014-01-01
We consider nonparametric Bayesian estimation of the drift coefficient of a multidimensional stochastic differential equation from discrete-time observations on the solution of this equation. Under suitable regularity conditions, we establish posterior consistency in this context.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
markdownabstract__Abstract__ We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of
A non-parametric approach to investigating fish population dynamics
National Research Council Canada - National Science Library
Cook, R.M; Fryer, R.J
2001-01-01
.... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Energy Technology Data Exchange (ETDEWEB)
Ferragut, Erik M [ORNL; Laska, Jason A [ORNL
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
PV power forecast using a nonparametric PV model
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
Forecasting the AC power output of a PV plant accurately is important both for plant owners and electric system operators. Two main categories of PV modeling are available: the parametric and the nonparametric. In this paper, a methodology using a nonparametric PV model is proposed, using as inputs several forecasts of meteorological variables from a Numerical Weather Forecast model, and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quant...
Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat
Directory of Open Access Journals (Sweden)
Luca Scrucca
2001-06-01
Full Text Available In this paper we describe the Xlisp-Stat version of the sm library, a software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus, and it is documented in their book Applied Smoothing Techniques for Data Analysis (1997. This is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve for inferential purposes as, for instance, when a nonparametric estimate is used for checking a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to implement easily their own methods and new research ideas into the built-in prototypes.
Dai, Wenlin
2017-09-01
Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that combines the linear regression method with the higher-order difference estimators systematically. The unified framework has greatly enriched the existing literature on variance estimation that includes most existing estimators as special cases. More importantly, the unified framework has also provided a smart way to solve the challenging difference sequence selection problem that remains a long-standing controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend to use the ordinary difference sequence in the unified framework, no matter if the sample size is small or if the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.
Bressan, L.; Tinti, S.
2016-01-01
This study presents a new method to analyse the properties of the sea-level signal recorded by coastal tide gauges in the long wave range that is in a window between wind/storm waves and tides and is typical of several phenomena like local seiches, coastal shelf resonances and tsunamis. The method consists of computing four specific functions based on the time gradient (slope) of the recorded sea level oscillations, namely the instantaneous slope (IS) as well as three more functions based on IS, namely the reconstructed sea level (RSL), the background slope (BS) and the control function (CF). These functions are examined through a traditional spectral fast Fourier transform (FFT) analysis and also through a statistical analysis, showing that they can be characterised by probability distribution functions PDFs such as the Student's t distribution (IS and RSL) and the beta distribution (CF). As an example, the method has been applied to data from the tide-gauge station of Siracusa, Italy.
Directory of Open Access Journals (Sweden)
Sergio A. Alvarado
2010-12-01
Full Text Available Objetivo: Evaluar la eficiencia predictiva de modelos estadísticos paramétricos y no paramétricos para predecir episodios críticos de contaminación por material particulado PM10 del día siguiente, que superen en Santiago de Chile la norma de calidad diaria. Una predicción adecuada de tales episodios permite a la autoridad decretar medidas restrictivas que aminoren la gravedad del episodio, y consecuentemente proteger la salud de la comunidad. Método: Se trabajó con las concentraciones de material particulado PM10 registradas en una estación asociada a la red de monitorización de la calidad del aire MACAM-2, considerando 152 observaciones diarias de 14 variables, y con información meteorológica registrada durante los años 2001 a 2004. Se ajustaron modelos estadísticos paramétricos Gamma usando el paquete estadístico STATA v11, y no paramétricos usando una demo del software estadístico MARS v 2.0 distribuida por Salford-Systems. Resultados: Ambos métodos de modelación presentan una alta correlación entre los valores observados y los predichos. Los modelos Gamma presentan mejores aciertos que MARS para las concentraciones de PM10 con valores Objective: To evaluate the predictive efficiency of two statistical models (one parametric and the other non-parametric to predict critical episodes of air pollution exceeding daily air quality standards in Santiago, Chile by using the next day PM10 maximum 24h value. Accurate prediction of such episodes would allow restrictive measures to be applied by health authorities to reduce their seriousness and protect the community´s health. Methods: We used the PM10 concentrations registered by a station of the Air Quality Monitoring Network (152 daily observations of 14 variables and meteorological information gathered from 2001 to 2004. To construct predictive models, we fitted a parametric Gamma model using STATA v11 software and a non-parametric MARS model by using a demo version of Salford
Zhu, Guangxu; Guo, Qingjun; Xiao, Huayun; Chen, Tongbin; Yang, Jun
2017-06-01
Heavy metals are considered toxic to humans and ecosystems. In the present study, heavy metal concentration in soil was investigated using the single pollution index (PIi), the integrated Nemerow pollution index (PIN), and the geoaccumulation index (Igeo) to determine metal accumulation and its pollution status at the abandoned site of the Capital Iron and Steel Factory in Beijing and its surrounding area. Multivariate statistical (principal component analysis and correlation analysis), geostatistical analysis (ArcGIS tool), combined with stable Pb isotopic ratios, were applied to explore the characteristics of heavy metal pollution and the possible sources of pollutants. The results indicated that heavy metal elements show different degrees of accumulation in the study area, the observed trend of the enrichment factors, and the geoaccumulation index was Hg > Cd > Zn > Cr > Pb > Cu ≈ As > Ni. Hg, Cd, Zn, and Cr were the dominant elements that influenced soil quality in the study area. The Nemerow index method indicated that all of the heavy metals caused serious pollution except Ni. Multivariate statistical analysis indicated that Cd, Zn, Cu, and Pb show obvious correlation and have higher loads on the same principal component, suggesting that they had the same sources, which are related to industrial activities and vehicle emissions. The spatial distribution maps based on ordinary kriging showed that high concentrations of heavy metals were located in the local factory area and in the southeast-northwest part of the study region, corresponding with the predominant wind directions. Analyses of lead isotopes confirmed that Pb in the study soils is predominantly derived from three Pb sources: dust generated during steel production, coal combustion, and the natural background. Moreover, the ternary mixture model based on lead isotope analysis indicates that lead in the study soils originates mainly from anthropogenic sources, which contribute much more
López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F
2004-02-01
The application of a statistical method, the local polynomial regression method, (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and others methods were found.
Energy Technology Data Exchange (ETDEWEB)
Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)
2004-02-01
The application of a statistical method, the local polynomial regression method, (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and others methods were found. (orig.)
Poage, J. L.
1975-01-01
A sequential nonparametric pattern classification procedure is presented. The method presented is an estimated version of the Wald sequential probability ratio test (SPRT). This method utilizes density function estimates, and the density estimate used is discussed, including a proof of convergence in probability of the estimate to the true density function. The classification procedure proposed makes use of the theory of order statistics, and estimates of the probabilities of misclassification are given. The procedure was tested on discriminating between two classes of Gaussian samples and on discriminating between two kinds of electroencephalogram (EEG) responses.
Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images
2010-04-01
OF EACH CELL ARE RESULTS OF KSVD AND BPFA, RESPECTIVELY. σ C.man House Peppers Lena Barbara Boats F.print Couple Hill 5 37.87 39.37 37.78 38.60 38.08...INTERPOLATION PSNR RESULTS, USING PATCH SIZE 8× 8. BOTTOM: BPFA RGB IMAGE INTERPOLATION PSNR RESULTS, USING PATCH SIZE 7× 7. data ratio C.man House Peppers Lena...of subspaces. IEEE Trans. Inform. Theory, 2009. [16] T. Ferguson . A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1:209–230
A robust nonparametric method for quantifying undetected extinctions.
Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E
2016-06-01
How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
Mathematical statistics and stochastic processes
Bosq, Denis
2013-01-01
Generally, books on mathematical statistics are restricted to the case of independent identically distributed random variables. In this book however, both this case AND the case of dependent variables, i.e. statistics for discrete and continuous time processes, are studied. This second case is very important for today's practitioners.Mathematical Statistics and Stochastic Processes is based on decision theory and asymptotic statistics and contains up-to-date information on the relevant topics of theory of probability, estimation, confidence intervals, non-parametric statistics and rob
Statistical concepts a second course
Lomax, Richard G
2012-01-01
Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right_censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtain the asymptotic optimality of AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
Lyons, L
2016-01-01
Accelerators and detectors are expensive, both in terms of money and human effort. It is thus important to invest effort in performing a good statistical anal- ysis of the data, in order to extract the best information from it. This series of five lectures deals with practical aspects of statistical issues that arise in typical High Energy Physics analyses.
Non-parametric combination and related permutation tests for neuroimaging.
Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E
2016-04-01
In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction.
Power analyses for negative binomial models with application to multiple sclerosis clinical trials.
Rettiganti, Mallik; Nagaraja, H N
2012-01-01
We use negative binomial (NB) models for the magnetic resonance imaging (MRI)-based brain lesion count data from parallel group (PG) and baseline versus treatment (BVT) trials for relapsing remitting multiple sclerosis (RRMS) patients, and describe the associated likelihood ratio (LR), score, and Wald tests. We perform power analyses and sample size estimation using the simulated percentiles of the exact distribution of the test statistics for the PG and BVT trials. When compared to the corresponding nonparametric test, the LR test results in 30-45% reduction in sample sizes for the PG trials and 25-60% reduction for the BVT trials.
Recent Developments in Applied Probability and Statistics
Devroye, Luc; Kohler, Michael; Korn, Ralf
2010-01-01
This book presents surveys on recent developments in applied probability and statistics. The contributions include topics such as nonparametric regression and density estimation, option pricing, probabilistic methods for multivariate interpolation, robust graphical modelling and stochastic differential equations. Due to its broad coverage of different topics the book offers an excellent overview of recent developments in applied probability and statistics.
Microprocessors as an Adjunct to Statistics Instruction.
Miller, William G.
Examinations of costs and acquisition of facilities indicate that an Altair 8800A microcomputer with a program library of parametric, non-parametric, mathematical, and teaching programs can be used effectively for teaching college-level statistics. Statistical packages presently in use require extensive computing knowledge beyond the students' and…
Non-Parametric Bayesian Areal Linguistics
Daumé, Hal
2009-01-01
We describe a statistical model over linguistic areas and phylogeny. Our model recovers known areas and identifies a plausible hierarchy of areal features. The use of areas improves genetic reconstruction of languages both qualitatively and quantitatively according to a variety of metrics. We model linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's coalescent.
Nonparametric Bayesian Clustering of Structural Whole Brain Connectivity in Full Image Resolution
DEFF Research Database (Denmark)
Ambrosen, Karen Marie Sandø; Albers, Kristoffer Jon; Dyrby, Tim B.
2014-01-01
Diffusion magnetic resonance imaging enables measuring the structural connectivity of the human brain at a high spatial resolution. Local noisy connectivity estimates can be derived using tractography approaches and statistical models are necessary to quantify the brain’s salient structural...... organization. However, statistically modeling these massive structural connectivity datasets is a computational challenging task. We develop a high-performance inference procedure for the infinite relational model (a prominent non-parametric Bayesian model for clustering networks into structurally similar...... groups) that defines structural units at the resolution of statistical support. We apply the model to a network of structural brain connectivity in full image resolution with more than one hundred thousand regions (voxels in the gray-white matter boundary) and around one hundred million connections...
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Directory of Open Access Journals (Sweden)
Saerom Park
Full Text Available Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.
Ford, Eric B.; Fabrycky, Daniel C.; Steffen, Jason H.; Carter, Joshua A.; Fressin, Francois; Holman, Matthew Jon; Lissauer, Jack J.; Moorhead, Althea V.; Morehead, Robert C.; Ragozzine, Darin; Rowe, Jason F.; Welsh, William F.; Allen, Christopher; Batalha, Natalie M.; Borucki, William J.
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timingn variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data se...
Nonparametric estimation of a convex bathtub-shaped hazard function.
Jankowski, Hanna K; Wellner, Jon A
2009-11-01
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n(2/5) at points x(0) where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
Non-parametric analysis of rating transition and default data
DEFF Research Database (Denmark)
Fledelius, Peter; Lando, David; Perch Nielsen, Jens
2004-01-01
We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move b...... but that this dependence vanishes after 2-3 years....
Influence of test and person characteristics on nonparametric appropriateness measurement
Meijer, Rob R.; Molenaar, Ivo W.; Sijtsma, Klaas
1994-01-01
Appropriateness measurement in nonparametric item response theory modeling is affected by the reliability of the items, the test length, the type of aberrant response behavior, and the percentage of aberrant persons in the group. The percentage of simulees defined a priori as aberrant responders tha
Influence of Test and Person Characteristics on Nonparametric Appropriateness Measurement
Meijer, Rob R; Molenaar, Ivo W; Sijtsma, Klaas
1994-01-01
Appropriateness measurement in nonparametric item response theory modeling is affected by the reliability of the items, the test length, the type of aberrant response behavior, and the percentage of aberrant persons in the group. The percentage of simulees defined a priori as aberrant responders tha
Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects
Qian, Minghui; Hu, Ridong; Chen, Jianwei
2016-01-01
Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes the nonlinear factors into account base on the spatial dynamic panel…
Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series
DEFF Research Database (Denmark)
Gao, Jiti; Kanaya, Shin; Li, Degui
2015-01-01
This paper establishes uniform consistency results for nonparametric kernel density and regression estimators when time series regressors concerned are nonstationary null recurrent Markov chains. Under suitable regularity conditions, we derive uniform convergence rates of the estimators. Our...... results can be viewed as a nonstationary extension of some well-known uniform consistency results for stationary time series....
Non-parametric Bayesian inference for inhomogeneous Markov point processes
DEFF Research Database (Denmark)
Berthelsen, Kasper Klitgaard; Møller, Jesper
With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...
Investigating the cultural patterns of corruption: A nonparametric analysis
Halkos, George; Tzeremes, Nickolaos
2011-01-01
By using a sample of 77 countries our analysis applies several nonparametric techniques in order to reveal the link between national culture and corruption. Based on Hofstede’s cultural dimensions and the corruption perception index, the results reveal that countries with higher levels of corruption tend to have higher power distance and collectivism values in their society.
Coverage Accuracy of Confidence Intervals in Nonparametric Regression
Institute of Scientific and Technical Information of China (English)
Song-xi Chen; Yong-song Qin
2003-01-01
Point-wise confidence intervals for a nonparametric regression function with random design points are considered. The confidence intervals are those based on the traditional normal approximation and the empirical likelihood. Their coverage accuracy is assessed by developing the Edgeworth expansions for the coverage probabilities. It is shown that the empirical likelihood confidence intervals are Bartlett correctable.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
markdownabstract__Abstract__ We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of consump
Non-parametric analysis of rating transition and default data
DEFF Research Database (Denmark)
Fledelius, Peter; Lando, David; Perch Nielsen, Jens
2004-01-01
We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move...
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
A nonparametric approach for relevance determination
Shahbaba, Babak
2010-01-01
The problem of evaluating a large number of factors in terms of their relevance to an outcome of interest arises in many research areas such as genetics, image processing, astrophysics, and neuroscience. In this paper, we argue that treating such problems as large-scale hypothesis testing does not reflect the usual motivation behind these studies, which is to select a subset of promising factors for further investigation. and leads investigators to rely on arbitrary selection mechanisms (e.g., setting the false discovery rate at 0.05) or unrealistic loss functions. Moreover, while we might be able to justify simplifying assumptions (e.g., parametric distributional forms for test statistics under the null and alternative hypotheses) for classic hypothesis testing situations (i.e., one hypothesis at a time), generalizing such assumptions to large-scale studies is restrictive and unnecessary. In accordance with the objective of such studies, we propose to treat them as relevance determination problems. This way,...
Non-parametric change-point method for differential gene expression detection.
Directory of Open Access Journals (Sweden)
Yao Wang
Full Text Available BACKGROUND: We proposed a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS for short, by using a single equation for detecting differential gene expression (DGE in microarray data. NPCPS is based on the change point theory to provide effective DGE detecting ability. METHODOLOGY: NPCPS used the data distribution of the normal samples as input, and detects DGE in the cancer samples by locating the change point of gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and ROC study were applied to examine the detecting accuracy of NPCPS, and the experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. CONCLUSIONS: Simulation study indicated that NPCPS was more effective for detecting DGE in cancer subset compared with five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experiment results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by using NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detecting result of NPCPS and the compared methods were less than 0.05, while between the other methods the values were from 0.20 to 0.84. This indicates that NPCPS is working on different features and thus provides DGE identification from a distinct perspective comparing with the other mean or median based methods.
Directory of Open Access Journals (Sweden)
Akhtar R. Siddique
2000-03-01
Full Text Available This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices. This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices.
Nonparametric multivariate rank tests and their unbiasedness
Jurečková, Jana; 10.3150/10-BEJ326
2012-01-01
Although unbiasedness is a basic property of a good test, many tests on vector parameters or scalar parameters against two-sided alternatives are not finite-sample unbiased. This was already noticed by Sugiura [Ann. Inst. Statist. Math. 17 (1965) 261--263]; he found an alternative against which the Wilcoxon test is not unbiased. The problem is even more serious in multivariate models. When testing the hypothesis against an alternative which fits well with the experiment, it should be verified whether the power of the test under this alternative cannot be smaller than the significance level. Surprisingly, this serious problem is not frequently considered in the literature. The present paper considers the two-sample multivariate testing problem. We construct several rank tests which are finite-sample unbiased against a broad class of location/scale alternatives and are finite-sample distribution-free under the hypothesis and alternatives. Each of them is locally most powerful against a specific alternative of t...
Nonparametric regression with martingale increment errors
Delattre, Sylvain
2010-01-01
We consider the problem of adaptive estimation of the regression function in a framework where we replace ergodicity assumptions (such as independence or mixing) by another structural assumption on the model. Namely, we propose adaptive upper bounds for kernel estimators with data-driven bandwidth (Lepski's selection rule) in a regression model where the noise is an increment of martingale. It includes, as very particular cases, the usual i.i.d. regression and auto-regressive models. The cornerstone tool for this study is a new result for self-normalized martingales, called ``stability'', which is of independent interest. In a first part, we only use the martingale increment structure of the noise. We give an adaptive upper bound using a random rate, that involves the occupation time near the estimation point. Thanks to this approach, the theoretical study of the statistical procedure is disconnected from usual ergodicity properties like mixing. Then, in a second part, we make a link with the usual minimax th...
Comparison of Rank Analysis of Covariance and Nonparametric Randomized Blocks Analysis.
Porter, Andrew C.; McSweeney, Maryellen
The relative power of three possible experimental designs under the condition that data is to be analyzed by nonparametric techniques; the comparison of the power of each nonparametric technique to its parametric analogue; and the comparison of relative powers using nonparametric and parametric techniques are discussed. The three nonparametric…
Directory of Open Access Journals (Sweden)
Kosiorowska Ewa
2015-06-01
Full Text Available We briefly communicate the results of nonparametric and robust evaluation of the effects of the Fourth Millennium Development Goal of the United Nations. The main aim of the goal was reducing by two thirds, from 1990-2015, under five month’s child mortality. Our novel analysis was conducted by means of very powerful and user friendly tools offered by the Data Depth Concept being a collection of multivariate techniques basing on multivariate generalizations of quintiles, ranges and order statistics. The results of our analysis are more convincing than the results obtained using classical statistical tools.
Directory of Open Access Journals (Sweden)
L. Bressan
2016-01-01
reconstructed sea level (RSL, the background slope (BS and the control function (CF. These functions are examined through a traditional spectral fast Fourier transform (FFT analysis and also through a statistical analysis, showing that they can be characterised by probability distribution functions PDFs such as the Student's t distribution (IS and RSL and the beta distribution (CF. As an example, the method has been applied to data from the tide-gauge station of Siracusa, Italy.
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Macmillan, N A; Creelman, C D
1996-06-01
Can accuracy and response bias in two-stimulus, two-response recognition or detection experiments be measured nonparametrically? Pollack and Norman (1964) answered this question affirmatively for sensitivity, Hodos (1970) for bias: Both proposed measures based on triangular areas in receiver-operating characteristic space. Their papers, and especially a paper by Grier (1971) that provided computing formulas for the measures, continue to be heavily cited in a wide range of content areas. In our sample of articles, most authors described triangle-based measures as making fewer assumptions than measures associated with detection theory. However, we show that statistics based on products or ratios of right triangle areas, including a recently proposed bias index and a not-yetproposed but apparently plausible sensitivity index, are consistent with a decision process based on logistic distributions. Even the Pollack and Norman measure, which is based on non-right triangles, is approximately logistic for low values of sensitivity. Simple geometric models for sensitivity and bias are not nonparametric, even if their implications are not acknowledged in the defining publications.
Directory of Open Access Journals (Sweden)
Ismet DOGAN
2015-10-01
Full Text Available Objective: Choosing the most efficient statistical test is one of the essential problems of statistics. Asymptotic relative efficiency is a notion which enables to implement in large samples the quantitative comparison of two different tests used for testing of the same statistical hypothesis. The notion of the asymptotic efficiency of tests is more complicated than that of asymptotic efficiency of estimates. This paper discusses the effect of sample size on expected values and variances of non-parametric tests for independent two samples and determines the most effective test for different sample sizes using Fraser efficiency value. Material and Methods: Since calculating the power value in comparison of the tests is not practical most of the time, using the asymptotic relative efficiency value is favorable. Asymptotic relative efficiency is an indispensable technique for comparing and ordering statistical test in large samples. It is especially useful in nonparametric statistics where there exist numerous heuristic tests such as the linear rank tests. In this study, the sample size is determined as 2 ≤ n ≤ 50. Results: In both balanced and unbalanced cases, it is found that, as the sample size increases expected values and variances of all the tests discussed in this paper increase as well. Additionally, considering the Fraser efficiency, Mann-Whitney U test is found as the most efficient test among the non-parametric tests that are used in comparison of independent two samples regardless of their sizes. Conclusion: According to Fraser efficiency, Mann-Whitney U test is found as the most efficient test.
Nonparametric inference procedures for multistate life table analysis.
Dow, M M
1985-01-01
Recent generalizations of the classical single state life table procedures to the multistate case provide the means to analyze simultaneously the mobility and mortality experience of 1 or more cohorts. This paper examines fairly general nonparametric combinatorial matrix procedures, known as quadratic assignment, as an analysis technic of various transitional patterns commonly generated by cohorts over the life cycle course. To some degree, the output from a multistate life table analysis suggests inference procedures. In his discussion of multstate life table construction features, the author focuses on the matrix formulation of the problem. He then presents several examples of the proposed nonparametric procedures. Data for the mobility and life expectancies at birth matrices come from the 458 member Cayo Santiago rhesus monkey colony. The author's matrix combinatorial approach to hypotheses testing may prove to be a useful inferential strategy in several multidimensional demographic areas.
Nonparametric instrumental regression with non-convex constraints
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Combined parametric-nonparametric identification of block-oriented systems
Mzyk, Grzegorz
2014-01-01
This book considers a problem of block-oriented nonlinear dynamic system identification in the presence of random disturbances. This class of systems includes various interconnections of linear dynamic blocks and static nonlinear elements, e.g., Hammerstein system, Wiener system, Wiener-Hammerstein ("sandwich") system and additive NARMAX systems with feedback. Interconnecting signals are not accessible for measurement. The combined parametric-nonparametric algorithms, proposed in the book, can be selected dependently on the prior knowledge of the system and signals. Most of them are based on the decomposition of the complex system identification task into simpler local sub-problems by using non-parametric (kernel or orthogonal) regression estimation. In the parametric stage, the generalized least squares or the instrumental variables technique is commonly applied to cope with correlated excitations. Limit properties of the algorithms have been shown analytically and illustrated in simple experiments.
Estimation of Stochastic Volatility Models by Nonparametric Filtering
DEFF Research Database (Denmark)
Kanaya, Shin; Kristensen, Dennis
2016-01-01
/estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases......A two-step estimation method of stochastic volatility models is proposed: In the first step, we nonparametrically estimate the (unobserved) instantaneous volatility process. In the second step, standard estimation methods for fully observed diffusion processes are employed, but with the filtered...... and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties...
Nonparametric Regression Estimation for Multivariate Null Recurrent Processes
Directory of Open Access Journals (Sweden)
Biqing Cai
2015-04-01
Full Text Available This paper discusses nonparametric kernel regression with the regressor being a \\(d\\-dimensional \\(\\beta\\-null recurrent process in presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \\(\\sqrt{n(Th^{d}}\\, where \\(n(T\\ is the number of regenerations for a \\(\\beta\\-null recurrent process and the limiting distribution (with proper normalization is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity of the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than the linear model.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
-Douglas function nor the Translog function are consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb...... results—including measures that are of interest of applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Right-Censored Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Dursun Aydın
2016-11-01
Full Text Available This paper introduces the operating of the selection criteria for right-censored nonparametric regression using smoothing spline. In order to transform the response variable into a variable that contains the right-censorship, we used the KaplanMeier weights proposed by [1], and [2]. The major problem in smoothing spline method is to determine a smoothing parameter to obtain nonparametric estimates of the regression function. In this study, the mentioned parameter is chosen based on censored data by means of the criteria such as improved Akaike information criterion (AICc, Bayesian (or Schwarz information criterion (BIC and generalized crossvalidation (GCV. For this purpose, a Monte-Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371~specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function are consistent with the "true......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...
Poverty and life cycle effects: A nonparametric analysis for Germany
Stich, Andreas
1996-01-01
Most empirical studies on poverty consider the extent of poverty either for the entire society or for separate groups like elderly people.However, these papers do not show what the situation looks like for persons of a certain age. In this paper poverty measures depending on age are derived using the joint density of income and age. The density is nonparametrically estimated by weighted Gaussian kernel density estimation. Applying the conditional density of income to several poverty measures ...
Fusion of Hard and Soft Information in Nonparametric Density Estimation
2015-06-10
estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample...Fusion of Hard and Soft Information in Nonparametric Density Estimation∗ Johannes O. Royset Roger J-B Wets Department of Operations Research...univariate density estimation in situations when the sample ( hard information) is supplemented by “soft” information about the random phenomenon. These
Nonparametric estimation for hazard rate monotonously decreasing system
Institute of Scientific and Technical Information of China (English)
Han Fengyan; Li Weisong
2005-01-01
Estimation of density and hazard rate is very important to the reliability analysis of a system. In order to estimate the density and hazard rate of a hazard rate monotonously decreasing system, a new nonparametric estimator is put forward. The estimator is based on the kernel function method and optimum algorithm. Numerical experiment shows that the method is accurate enough and can be used in many cases.
Non-parametric versus parametric methods in environmental sciences
Directory of Open Access Journals (Sweden)
Muhammad Riaz
2016-01-01
Full Text Available This current report intends to highlight the importance of considering background assumptions required for the analysis of real datasets in different disciplines. We will provide comparative discussion of parametric methods (that depends on distributional assumptions (like normality relative to non-parametric methods (that are free from many distributional assumptions. We have chosen a real dataset from environmental sciences (one of the application areas. The findings may be extended to the other disciplines following the same spirit.
Directory of Open Access Journals (Sweden)
Kovačević Miladin
2005-01-01
Full Text Available In the political and war crisis which embraced Bosnia and Herzegovina in the spring of 1992 with an end of war hostilities in the autumn of 1995 when the "Dayton Peace Agreement" emerged (November 1995, a media war occurred. From the very beginning, this war had an international character. The question on the number of war victims (killed and missing "exploded" in June of 1993 when Haris Silajdžić stated that there had been 200000 dead among the Muslims. This figure uncritically became the basis for all later media and local "empirical truths" on the number of victims. All statistical and demographic disciplines were exploited to support, if not prove, the propaganda standpoints. Objectivity was oppressed by an ugly "face of the war". Having in mind the experience of the Second World War in Yugoslavia the question on the number of victims does not cease to be topical for decades after the end of the war. Bosnia and Herzegovina is more than a confirmation. This question seems to intervene (and in a way "feed of" with the most difficult political and international questions and court trials. ("International Court of Justice", indictment of Bosnia and Herzegovina against The Federal Republic of Yugoslavia, namely Serbia. The methodological analysis of the most important works which deal with the question of the number of victims in the Bosnian war (above all, those done by Bosnian institutes and authors indicate the "mistakes" made by the character of these works (propaganda. The manipulation with statistical methods and numbers is not new. Methodological and numerical traps can slip even to the most informed. The use of statistics and social science in court trials seems to show Janus's face of science: on one side the authentic "moral passion" of researchers finds great sense, and on the other side special interests strive to impose themselves through the (most refined instrumentation of science and knowledge. (The example of Mr. Patrick Ball
Sand, F.; Christie, R.
1975-01-01
Extending the crop survey application of remote sensing from small experimental regions to state and national levels requires that a sample of agricultural fields be chosen for remote sensing of crop acreage, and that a statistical estimate be formulated with measurable characteristics. The critical requirements for the success of the application are reviewed in this report. The problem of sampling in the presence of cloud cover is discussed. Integration of remotely sensed information about crops into current agricultural crop forecasting systems is treated on the basis of the USDA multiple frame survey concepts, with an assumed addition of a new frame derived from remote sensing. Evolution of a crop forecasting system which utilizes LANDSAT and future remote sensing systems is projected for the 1975-1990 time frame.
Statistics 101 for Radiologists.
Anvari, Arash; Halpern, Elkan F; Samir, Anthony E
2015-10-01
Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced.
Energy Technology Data Exchange (ETDEWEB)
Ford, Eric B.; /Florida U.; Fabrycky, Daniel C.; /Lick Observ.; Steffen, Jason H.; /Fermilab; Carter, Joshua A.; /Harvard-Smithsonian Ctr. Astrophys.; Fressin, Francois; /Harvard-Smithsonian Ctr. Astrophys.; Holman, Matthew J.; /Harvard-Smithsonian Ctr. Astrophys.; Lissauer, Jack J.; /NASA, Ames; Moorhead, Althea V.; /Florida U.; Morehead, Robert C.; /Florida U.; Ragozzine, Darin; /Harvard-Smithsonian Ctr. Astrophys.; Rowe, Jason F.; /NASA, Ames /SETI Inst., Mtn. View /San Diego State U., Astron. Dept.
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:9 and 2:4:6:9 period commensurabilities. Our results demonstrate that TTVs provide a powerful tool for confirming transiting planets, including low-mass planets and planets around faint stars for which Doppler follow-up is not practical with existing facilities. Continued Kepler observations will dramatically improve the constraints on the planet masses and orbits and provide sensitivity for detecting additional non-transiting planets. If Kepler observations were extended to eight years, then a similar analysis could likely confirm systems with multiple closely spaced, small transiting planets in or near the habitable zone of solar-type stars.
Li, Ming; Gardiner, Joseph C; Breslau, Naomi; Anthony, James C; Lu, Qing
2014-07-01
Cox-regression-based methods have been commonly used for the analyses of survival outcomes, such as age-at-disease-onset. These methods generally assume the hazard functions are proportional among various risk groups. However, such an assumption may not be valid in genetic association studies, especially when complex interactions are involved. In addition, genetic association studies commonly adopt case-control designs. Direct use of Cox regression to case-control data may yield biased estimators and incorrect statistical inference. We propose a non-parametric approach, the weighted Nelson-Aalen (WNA) approach, for detecting genetic variants that are associated with age-dependent outcomes. The proposed approach can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Moreover, it does not rely on any assumptions of the disease inheritance models, and is able to capture high-order gene-gene interactions. Through simulations, we show the proposed approach outperforms Cox-regression-based methods in various scenarios. We also conduct an empirical study of progression of nicotine dependence by applying the WNA approach to three independent datasets from the Study of Addiction: Genetics and Environment. In the initial dataset, two SNPs, rs6570989 and rs2930357, located in genes GRIK2 and CSMD1, are found to be significantly associated with the progression of nicotine dependence (ND). The joint association is further replicated in two independent datasets. Further analysis suggests that these two genes may interact and be associated with the progression of ND. As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for detecting genetic interactions associated with age-at-onset outcomes.
Non-parametric star formation histories for 5 dwarf spheroidal galaxies of the local group
Hernández, X; Valls-Gabaud, D; Gilmore, Gerard; Valls-Gabaud, David
2000-01-01
We use recent HST colour-magnitude diagrams of the resolved stellar populations of a sample of local dSph galaxies (Carina, LeoI, LeoII, Ursa Minor and Draco) to infer the star formation histories of these systems, $SFR(t)$. Applying a new variational calculus maximum likelihood method which includes a full Bayesian analysis and allows a non-parametric estimate of the function one is solving for, we infer the star formation histories of the systems studied. This method has the advantage of yielding an objective answer, as one need not assume {\\it a priori} the form of the function one is trying to recover. The results are checked independently using Saha's $W$ statistic. The total luminosities of the systems are used to normalize the results into physical units and derive SN type II rates. We derive the luminosity weighted mean star formation history of this sample of galaxies.
Nonparametric test of consistency between cosmological models and multiband CMB measurements
Aghamousa, Amir
2015-01-01
We present a novel approach to test the consistency of the cosmological models with multiband CMB data using a nonparametric approach. In our analysis we calibrate the REACT (Risk Estimation and Adaptation after Coordinate Transformation) confidence levels associated with distances in function space (confidence distances) based on the Monte Carlo simulations in order to test the consistency of an assumed cosmological model with observation. To show the applicability of our algorithm, we confront Planck 2013 temperature data with concordance model of cosmology considering two different Planck spectra combination. In order to have an accurate quantitative statistical measure to compare between the data and the theoretical expectations, we calibrate REACT confidence distances and perform a bias control using many realizations of the data. Our results in this work using Planck 2013 temperature data put the best fit $\\Lambda$CDM model at $95\\% (\\sim 2\\sigma)$ confidence distance from the center of the nonparametri...
Assessing T cell clonal size distribution: a non-parametric approach.
Bolkhovskaya, Olesya V; Zorin, Daniil Yu; Ivanchenko, Mikhail V
2014-01-01
Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis, who undergone treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of decay exponent. It has proved to be much the same among healthy donors, significantly different for an autoimmune patient before the therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.
Assessing T cell clonal size distribution: a non-parametric approach.
Directory of Open Access Journals (Sweden)
Olesya V Bolkhovskaya
Full Text Available Clonal structure of the human peripheral T-cell repertoire is shaped by a number of homeostatic mechanisms, including antigen presentation, cytokine and cell regulation. Its accurate tuning leads to a remarkable ability to combat pathogens in all their variety, while systemic failures may lead to severe consequences like autoimmune diseases. Here we develop and make use of a non-parametric statistical approach to assess T cell clonal size distributions from recent next generation sequencing data. For 41 healthy individuals and a patient with ankylosing spondylitis, who undergone treatment, we invariably find power law scaling over several decades and for the first time calculate quantitatively meaningful values of decay exponent. It has proved to be much the same among healthy donors, significantly different for an autoimmune patient before the therapy, and converging towards a typical value afterwards. We discuss implications of the findings for theoretical understanding and mathematical modeling of adaptive immunity.
A non-parametric method for correction of global radiation observations
DEFF Research Database (Denmark)
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
This paper presents a method for correction and alignment of global radiation observations based on information obtained from calculated global radiation, in the present study one-hour forecast of global radiation from a numerical weather prediction (NWP) model is used. Systematical errors detected...... in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...... University. The method can be useful for optimized use of solar radiation observations for forecasting, monitoring, and modeling of energy production and load which are affected by solar radiation....
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access...... to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
Nonparametric analysis of the time structure of seismicity in a geographic region
Directory of Open Access Journals (Sweden)
A. Quintela-del-Río
2002-06-01
Full Text Available As an alternative to traditional parametric approaches, we suggest nonparametric methods for analyzing temporal data on earthquake occurrences. In particular, the kernel method for estimating the hazard function and the intensity function are presented. One novelty of our approaches is that we take into account the possible dependence of the data to estimate the distribution of time intervals between earthquakes, which has not been considered in most statistics studies on seismicity. Kernel estimation of hazard function has been used to study the occurrence process of cluster centers (main shocks. Kernel intensity estimation, on the other hand, has helped to describe the occurrence process of cluster members (aftershocks. Similar studies in two geographic areas of Spain (Granada and Galicia have been carried out to illustrate the estimation methods suggested.
Comparison of non-parametric methods for ungrouping coarsely aggregated data
DEFF Research Database (Denmark)
Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda
2016-01-01
Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age...... methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate...... composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized...
Non-parametric probabilistic forecasts of wind power: required properties and evaluation
DEFF Research Database (Denmark)
Pinson, Pierre; Nielsen, Henrik Aalborg; Møller, Jan Kloppenborg;
2007-01-01
of the conditional expectation of future generation for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form...... of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point......Predictions of wind power production for horizons up to 48-72 hour ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates...
STATISTICAL ANALYSES OF AE DATA FROM AIRCRAFT%机体结构AE信号的统计分析研究
Institute of Scientific and Technical Information of China (English)
张凤林; 韩维; 胡国才; 李子尚
2001-01-01
对机体结构声发射数据的表征参数进行分析，建立了数据处理的统计模型，并运用该模型分析了某型飞机全机疲劳试验和某现役飞机随机机载声发射监测数据，提出了结构损伤判据，可供实际应用。%AE technique can be used to monitor the formation and development of fatigue cracks in steel structure dynamically and continuously. In this paper, a mathematical statistical model is established by analyzing characteristic parameters of AE data from aircraft. By using it, the authors make a study of AE data from an aircraft during its fatigue test and a fighter structure during its flight. On this basis, a practical damage standard is advanced, which can be applied to judge whether cracks are being formed or developed in an aircraft under given probability. At the end of the paper, the research orientations are given to the establishment of a more general standard.
Kauweloa, Kevin I; Gutierrez, Alonso N; Stathakis, Sotirios; Papanikolaou, Niko; Mavroidis, Panayiotis
2016-07-01
A toolkit has been developed for calculating the 3-dimensional biological effective dose (BED) distributions in multi-phase, external beam radiotherapy treatments such as those applied in liver stereotactic body radiation therapy (SBRT) and in multi-prescription treatments. This toolkit also provides a wide range of statistical results related to dose and BED distributions. MATLAB 2010a, version 7.10 was used to create this GUI toolkit. The input data consist of the dose distribution matrices, organ contour coordinates, and treatment planning parameters from the treatment planning system (TPS). The toolkit has the capability of calculating the multi-phase BED distributions using different formulas (denoted as true and approximate). Following the calculations of the BED distributions, the dose and BED distributions can be viewed in different projections (e.g. coronal, sagittal and transverse). The different elements of this toolkit are presented and the important steps for the execution of its calculations are illustrated. The toolkit is applied on brain, head & neck and prostate cancer patients, who received primary and boost phases in order to demonstrate its capability in calculating BED distributions, as well as measuring the inaccuracy and imprecision of the approximate BED distributions. Finally, the clinical situations in which the use of the present toolkit would have a significant clinical impact are indicated.
Directory of Open Access Journals (Sweden)
Emmanuel Jaspard
Full Text Available Late Embryogenesis Abundant Proteins (LEAPs are ubiquitous proteins expected to play major roles in desiccation tolerance. Little is known about their structure - function relationships because of the scarcity of 3-D structures for LEAPs. The previous building of LEAPdb, a database dedicated to LEAPs from plants and other organisms, led to the classification of 710 LEAPs into 12 non-overlapping classes with distinct properties. Using this resource, numerous physico-chemical properties of LEAPs and amino acid usage by LEAPs have been computed and statistically analyzed, revealing distinctive features for each class. This unprecedented analysis allowed a rigorous characterization of the 12 LEAP classes, which differed also in multiple structural and physico-chemical features. Although most LEAPs can be predicted as intrinsically disordered proteins, the analysis indicates that LEAP class 7 (PF03168 and probably LEAP class 11 (PF04927 are natively folded proteins. This study thus provides a detailed description of the structural properties of this protein family opening the path toward further LEAP structure - function analysis. Finally, since each LEAP class can be clearly characterized by a unique set of physico-chemical properties, this will allow development of software to predict proteins as LEAPs.
Roth, Amanda L; Hanson, Nancy D
2013-01-01
In the United States, the production of the Klebsiella pneumoniae carbapenemase (KPC) is an important mechanism of carbapenem resistance in Gram-negative pathogens. Infections with KPC-producing organisms are associated with increased morbidity and mortality; therefore, the rapid detection of KPC-producing pathogens is critical in patient care and infection control. We developed a real-time PCR assay complemented with traditional high-resolution melting (HRM) analysis, as well as statistically based genotyping, using the Rotor-Gene ScreenClust HRM software to both detect the presence of bla(KPC) and differentiate between KPC-2-like and KPC-3-like alleles. A total of 166 clinical isolates of Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter baumannii with various β-lactamase susceptibility patterns were tested in the validation of this assay; 66 of these organisms were known to produce the KPC β-lactamase. The real-time PCR assay was able to detect the presence of bla(KPC) in all 66 of these clinical isolates (100% sensitivity and specificity). HRM analysis demonstrated that 26 had KPC-2-like melting peak temperatures, while 40 had KPC-3-like melting peak temperatures. Sequencing of 21 amplified products confirmed the melting peak results, with 9 isolates carrying bla(KPC-2) and 12 isolates carrying bla(KPC-3). This PCR/HRM assay can identify KPC-producing Gram-negative pathogens in as little as 3 h after isolation of pure colonies and does not require post-PCR sample manipulation for HRM analysis, and ScreenClust analysis easily distinguishes bla(KPC-2-like) and bla(KPC-3-like) alleles. Therefore, this assay is a rapid method to identify the presence of bla(KPC) enzymes in Gram-negative pathogens that can be easily integrated into busy clinical microbiology laboratories.
Kopecky, Kenneth J; Davis, Scott; Hamilton, Thomas E; Saporito, Mark S; Onstad, Lynn E
2004-07-01
Residents of eastern Washington, northeastern Oregon, and western Idaho were exposed to I released into the atmosphere from operations at the Hanford Nuclear Site from 1944 through 1972, especially in the late 1940's and early 1950's. This paper describes the estimated doses to the thyroid glands of the 3,440 evaluable participants in the Hanford Thyroid Disease Study, which investigated whether thyroid morbidity was increased in people exposed to radioactive iodine from Hanford during 1944-1957. The participants were born during 1940-1946 to mothers living in Benton, Franklin, Walla Walla, Adams, Okanogan, Ferry, or Stevens Counties in Washington State. Whenever possible someone with direct knowledge of the participant's early life (preferably the participant's mother) was interviewed about the participant's individual dose-determining characteristics (residence history, sources and quantities of food, milk, and milk products consumed, production and processing techniques for home-grown food and milk products). Default information was used if no interview respondent was available. Thyroid doses were estimated using the computer program Calculation of Individual Doses from Environmental Radionuclides (CIDER) developed by the Hanford Environmental Dose Reconstruction Project. CIDER provided 100 sets of doses to represent uncertainty of the estimates. These sets were not generated independently for each participant, but reflected the effects of uncertainties in characteristics shared by participants. Estimated doses (medians of each participant's 100 realizations) ranged from 0.0029 mGy to 2823 mGy, with mean and median of 174 and 97 mGy, respectively. The distribution of estimated doses provided the Hanford Thyroid Disease Study with sufficient statistical power to test for dose-response relationships between thyroid outcomes and exposure to Hanford's I.
Kowalski, Jeanne
2008-01-01
A timely and applied approach to the newly discovered methods and applications of U-statisticsBuilt on years of collaborative research and academic experience, Modern Applied U-Statistics successfully presents a thorough introduction to the theory of U-statistics using in-depth examples and applications that address contemporary areas of study including biomedical and psychosocial research. Utilizing a "learn by example" approach, this book provides an accessible, yet in-depth, treatment of U-statistics, as well as addresses key concepts in asymptotic theory by integrating translational and cross-disciplinary research.The authors begin with an introduction of the essential and theoretical foundations of U-statistics such as the notion of convergence in probability and distribution, basic convergence results, stochastic Os, inference theory, generalized estimating equations, as well as the definition and asymptotic properties of U-statistics. With an emphasis on nonparametric applications when and where applic...
Statistics and finance an introduction
Ruppert, David
2004-01-01
This textbook emphasizes the applications of statistics and probability to finance. Students are assumed to have had a prior course in statistics, but no background in finance or economics. The basics of probability and statistics are reviewed and more advanced topics in statistics, such as regression, ARMA and GARCH models, the bootstrap, and nonparametric regression using splines, are introduced as needed. The book covers the classical methods of finance such as portfolio theory, CAPM, and the Black-Scholes formula, and it introduces the somewhat newer area of behavioral finance. Applications and use of MATLAB and SAS software are stressed. The book will serve as a text in courses aimed at advanced undergraduates and masters students in statistics, engineering, and applied mathematics as well as quantitatively oriented MBA students. Those in the finance industry wishing to know more statistics could also use it for self-study. David Ruppert is the Andrew Schultz, Jr. Professor of Engineering, School of Oper...
Energy Technology Data Exchange (ETDEWEB)
Peterson, James T.
1999-12-01
Natural resource professionals are increasingly required to develop rigorous statistical models that relate environmental data to categorical responses data. Recent advances in the statistical and computing sciences have led to the development of sophisticated methods for parametric and nonparametric analysis of data with categorical responses. The statistical software package CATDAT was designed to make some of these relatively new and powerful techniques available to scientists. The CATDAT statistical package includes 4 analytical techniques: generalized logit modeling; binary classification tree; extended K-nearest neighbor classification; and modular neural network.
Directory of Open Access Journals (Sweden)
A. Francke
2013-01-01
Full Text Available Lake El'gygytgyn, located in the Far East Russian Arctic, was formed by a meteorite impact about 3.58 Ma ago. In 2009, the ICDP Lake El'gygytgyn Drilling Project obtained a continuous sediment sequence of the lacustrine deposits and the upper part of the impact breccia. Here, we present grain-size data of the past 2.6 Ma. General downcore grain-size variations yield coarser sediments during warm periods and finer ones during cold periods. According to Principal Component Analyses (PCA, the climate-dependent variations in grain-size distributions mainly occur in the coarse silt and very fine silt fraction. During interglacial periods, accumulation of coarser grain sizes in the lake center is supposed to be caused by redistribution of clastic material by a wind-induced current pattern during the ice-free period. Sediment supply to the lake is triggered by the thickness of the active layer in the catchment, and the availability of water as transport medium. During glacial periods, sedimentation at Lake El'gygytgyn is hampered by the occurrence of a perennial ice-cover with sedimentation being restricted to seasonal moats and vertical conducts through the ice. Thus, the summer temperature predominantly triggers transport of coarse material into the lake center. Time series analysis that was carried out to gain insight in the frequency of the grain-size data showed grain-size variations predominately on Milankovitch's eccentricity, obliquity and precession bands. Variations in the relative power of these three oscillation bands during the Quaternary imply that climate conditions at Lake El'gygytgyn are mainly triggered by global glacial/interglacial variations (eccentricity, obliquity and local insolation forcing (precession, respectively.
Crainiceanu, Ciprian M; Caffo, Brian S; Di, Chong-Zhi; Punjabi, Naresh M
2009-06-01
We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS.
Miller, Lisa D.; Long, Andrew J.
2006-01-01
In 2003 the U.S. Geological Survey, in cooperation with the San Antonio Water System, did a study using historical data to statistically analyze hydrologic system components in the San Antonio region of Texas and to develop transfer-function models to simulate water levels at selected sites (wells) in the Edwards aquifer on the basis of rainfall. Water levels for two wells in the confined zone in Medina County and one well in the confined zone in Bexar County were highly correlated and showed little or no lag time between water-level responses. Water levels in these wells also were highly correlated with springflow at Comal Springs. Water-level hydrographs for 35 storms showed that an individual well can respond differently to similar amounts of rainfall. Fourteen water-level-recession hydrographs for a Medina County well showed that recession rates were variable. Transfer-function models were developed to simulate water levels at one confined-zone well and two recharge-zone wells in response to rainfall. For the confined-zone well, 50 percent of the simulated water levels are within 10 feet of the measured water levels, and 80 percent of the simulated water levels are within 15 feet of the measured water levels. For one recharge-zone well, 50 percent of the simulated water levels are within 5 feet of the measured water levels, and 90 percent of the simulated water levels are within 14 feet of the measured water levels. For the other recharge-zone well, 50 percent of the simulated water levels are within 14 feet of the measured water levels, and 90 percent of the simulated water levels are within 27 feet of the measured water levels. The transfer-function models showed that (1) the Edwards aquifer in the San Antonio region responds differently to recharge (effective rainfall) at different wells; and (2) multiple flow components are present in the aquifer. If simulated long-term system response results from a change in the hydrologic budget, then water levels would
Ibáñez, J. M.; De Angelis, S.; Díaz-Moreno, A.; Hernández, P.; Alguacil, G.; Posadas, A.; Pérez, N.
2012-08-01
The purpose of this work is to gain insights into the 2011-2012 eruption of El Hierro (Canary Islands) by mapping the evolution of the seismic b-value. The El Hierro seismic sequence offers a rather unique opportunity to investigate the process of reawakening of an oceanic intraplate volcano after a long period of repose. The 2011-2012 eruption is a submarine volcanic event that took place about 2 km off of the southern coast of El Hierro. The eruption was accompanied by an intense seismic swarm and surface manifestations of activity. The earthquake catalogue during the period of unrest includes over 12 000 events, the largest with magnitude 4.6. The seismic sequence can be grouped into three distinct phases, which correspond to well-separated spatial clusters and distinct earthquake regimes. The estimated b-value is of 1.18 ± 0.03, and a magnitude of completeness of 1.3, for the entire catalogue. B is very close to 1.0, which indicates completeness of the earthquake catalogue with only minor departures from the linearity of Gutenberg-Richter frequency-magnitude distribution. The most straightforward interpretation of this result is that the seismic swarm reached its final stages, and no additional large magnitude events should be anticipated, similarly to what one would expect for non-volcanic earthquake sequences. The results, dividing the activity in different phases, illustrate remarkable differences in the estimate of b-value during the early and late stages of the eruption. The early pre-eruptive activity was characterized by a b-value of 2.25. In contrast, the b-value was 1.25 during the eruptive phase. Based on our analyses, and the results of other studies, we propose a scenario that may account for the observations reported in this work. We infer that the earthquakes that occurred in the first phase reflect magma migration from the upper mantle to crustal depths. The area where magma initially intruded into the crust, because of its transitional nature
Panel data nonparametric estimation of production risk and risk preferences
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We apply nonparametric panel data kernel regression to investigate production risk, out-put price uncertainty, and risk attitudes of Polish dairy farms based on a firm-level unbalanced panel data set that covers the period 2004–2010. We compare different model specifications and different...... approaches for obtaining firm-specific measures of risk attitudes. We found that Polish dairy farmers are risk averse regarding production risk and price uncertainty. According to our results, Polish dairy farmers perceive the production risk as being more significant than the risk related to output price...
Digital spectral analysis parametric, non-parametric and advanced methods
Castanié, Francis
2013-01-01
Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature.The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models.An entire chapter is devoted to the non-parametric methods most widely used in industry.High resolution methods a
Testing for a constant coefficient of variation in nonparametric regression
Dette, Holger; Marchlewski, Mareen; Wagener, Jens
2010-01-01
In the common nonparametric regression model Y_i=m(X_i)+sigma(X_i)epsilon_i we consider the problem of testing the hypothesis that the coefficient of the scale and location function is constant. The test is based on a comparison of the observations Y_i=\\hat{sigma}(X_i) with their mean by a smoothed empirical process, where \\hat{sigma} denotes the local linear estimate of the scale function. We show weak convergence of a centered version of this process to a Gaussian process under the null ...
Generative Temporal Modelling of Neuroimaging - Decomposition and Nonparametric Testing
DEFF Research Database (Denmark)
Hald, Ditte Høvenhoff
The goal of this thesis is to explore two improvements for functional magnetic resonance imaging (fMRI) analysis; namely our proposed decomposition method and an extension to the non-parametric testing framework. Analysis of fMRI allows researchers to investigate the functional processes...... of the brain, and provides insight into neuronal coupling during mental processes or tasks. The decomposition method is a Gaussian process-based independent components analysis (GPICA), which incorporates a temporal dependency in the sources. A hierarchical model specification is used, featuring both...
Statistical intervals a guide for practitioners
Hahn, Gerald J
2011-01-01
Presents a detailed exposition of statistical intervals and emphasizes applications in industry. The discussion differentiates at an elementary level among different kinds of statistical intervals and gives instruction with numerous examples and simple math on how to construct such intervals from sample data. This includes confidence intervals to contain a population percentile, confidence intervals on probability of meeting specified threshold value, and prediction intervals to include observation in a future sample. Also has an appendix containing computer subroutines for nonparametric stati
Jongjoo, Kim; Davis, Scott K; Taylor, Jeremy F
2002-06-01
Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus x Brahman reciprocal fullsib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one QTL per chromosome model. The CIs from the non-selective bootstrap method were largest (87 7 cM average or 79-2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM average). However, CI sizes from the Selective II procedure were more variable than those produced by the two LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.
Non-parametric three-way mixed ANOVA with aligned rank tests.
Oliver-Rodríguez, Juan C; Wang, X T
2015-02-01
Research problems that require a non-parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two-way interaction is proposed for the analysis of the typical sources of variation in a three-way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non-normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non-normal distributions and large sample sizes. Degrees-of-freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
Lee, Kyungbook; Song, Seok Goo
2016-10-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events (M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb-Douglas a......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...... to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
Computing Economies of Scope Using Robust Partial Frontier Nonparametric Methods
Directory of Open Access Journals (Sweden)
Pedro Carvalho
2016-03-01
Full Text Available This paper proposes a methodology to examine economies of scope using the recent order-α nonparametric method. It allows us to investigate economies of scope by comparing the efficient order-α frontiers of firms that produce two or more goods with the efficient order-α frontiers of firms that produce only one good. To accomplish this, and because the order-α frontiers are irregular, we suggest to linearize them by the DEA estimator. The proposed methodology uses partial frontier nonparametric methods that are more robust than the traditional full frontier methods. By using a sample of 67 Portuguese water utilities for the period 2002–2008 and, also, a simulated sample, we prove the usefulness of the approach adopted and show that if only the full frontier methods were used, they would lead to different results. We found evidence of economies of scope in the provision of water supply and wastewater services simultaneously by water utilities in Portugal.
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k -space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
1986-09-01
military police, etc.), and service support arms (finance, legal , mcdical. etc.) and to take a random sample of all the -1",0 - 67 ,0"o local enlistments...1 , ." . .- .- . ." . " - -" - -’ ’-- - . ’ ’. . ,,’ ,,., .- " -. ’ ’ Would you use marihuana ? (AFTER FILMS) (BEFORE FILMS) Yes No Yes 18 12 No 8
Bag, Soumen; Sen, Prithwiraj; Sanyal, Gautam
2011-01-01
Facial expressions convey non-verbal cues, which play an important role in interpersonal relations. Automatic recognition of human face based on facial expression can be an important component of natural human-machine interface. It may also be used in behavioural science. Although human can recognize the face practically without any effort, but reliable face recognition by machine is a challenge. This paper presents a new approach for recognizing the face of a person considering the expressions of the same human face at different instances of time. This methodology is developed combining Eigenface method for feature extraction and modified k-Means clustering for identification of the human face. This method endowed the face recognition without using the conventional distance measure classifiers. Simulation results show that proposed face recognition using perception of k-Means clustering is useful for face images with different facial expressions.
DEFF Research Database (Denmark)
Effraimidis, Georgios; Dahl, Christian Møller
In this paper, we develop a fully nonparametric approach for the estimation of the cumulative incidence function with Missing At Random right-censored competing risks data. We obtain results on the pointwise asymptotic normality as well as the uniform convergence rate of the proposed nonparametric...... estimator. A simulation study that serves two purposes is provided. First, it illustrates in details how to implement our proposed nonparametric estimator. Secondly, it facilitates a comparison of the nonparametric estimator to a parametric counterpart based on the estimator of Lu and Liang (2008...
Directory of Open Access Journals (Sweden)
Ben J. Callahan
2016-06-01
Full Text Available High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or microbial composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, including both parameteric and nonparametric methods. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests, partial least squares and linear models as well as nonparametric testing using community networks and the ggnetwork package.
Directory of Open Access Journals (Sweden)
Ben J. Callahan
2016-11-01
Full Text Available High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package.
Carroll, Raymond J.
2011-03-01
In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
Takara, K. T.
2015-12-01
This paper describes a non-parametric frequency analysis method for hydrological extreme-value samples with a size larger than 100, verifying the estimation accuracy with a computer intensive statistics (CIS) resampling such as the bootstrap. Probable maximum values are also incorporated into the analysis for extreme events larger than a design level of flood control. Traditional parametric frequency analysis methods of extreme values include the following steps: Step 1: Collecting and checking extreme-value data; Step 2: Enumerating probability distributions that would be fitted well to the data; Step 3: Parameter estimation; Step 4: Testing goodness of fit; Step 5: Checking the variability of quantile (T-year event) estimates by the jackknife resampling method; and Step_6: Selection of the best distribution (final model). The non-parametric method (NPM) proposed here can skip Steps 2, 3, 4 and 6. Comparing traditional parameter methods (PM) with the NPM, this paper shows that PM often underestimates 100-year quantiles for annual maximum rainfall samples with records of more than 100 years. Overestimation examples are also demonstrated. The bootstrap resampling can do bias correction for the NPM and can also give the estimation accuracy as the bootstrap standard error. This NPM has advantages to avoid various difficulties in above-mentioned steps in the traditional PM. Probable maximum events are also incorporated into the NPM as an upper bound of the hydrological variable. Probable maximum precipitation (PMP) and probable maximum flood (PMF) can be a new parameter value combined with the NPM. An idea how to incorporate these values into frequency analysis is proposed for better management of disasters that exceed the design level. The idea stimulates more integrated approach by geoscientists and statisticians as well as encourages practitioners to consider the worst cases of disasters in their disaster management planning and practices.
Bayesian Nonparametric Mixture Estimation for Time-Indexed Functional Data in R
Directory of Open Access Journals (Sweden)
Terrance D. Savitsky
2016-08-01
Full Text Available We present growfunctions for R that offers Bayesian nonparametric estimation models for analysis of dependent, noisy time series data indexed by a collection of domains. This data structure arises from combining periodically published government survey statistics, such as are reported in the Current Population Study (CPS. The CPS publishes monthly, by-state estimates of employment levels, where each state expresses a noisy time series. Published state-level estimates from the CPS are composed from household survey responses in a model-free manner and express high levels of volatility due to insufficient sample sizes. Existing software solutions borrow information over a modeled time-based dependence to extract a de-noised time series for each domain. These solutions, however, ignore the dependence among the domains that may be additionally leveraged to improve estimation efficiency. The growfunctions package offers two fully nonparametric mixture models that simultaneously estimate both a time and domain-indexed dependence structure for a collection of time series: (1 A Gaussian process (GP construction, which is parameterized through the covariance matrix, estimates a latent function for each domain. The covariance parameters of the latent functions are indexed by domain under a Dirichlet process prior that permits estimation of the dependence among functions across the domains: (2 An intrinsic Gaussian Markov random field prior construction provides an alternative to the GP that expresses different computation and estimation properties. In addition to performing denoised estimation of latent functions from published domain estimates, growfunctions allows estimation of collections of functions for observation units (e.g., households, rather than aggregated domains, by accounting for an informative sampling design under which the probabilities for inclusion of observation units are related to the response variable. growfunctions includes plot
Energy Technology Data Exchange (ETDEWEB)
Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)
1990-11-01
A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.
ALTERNATIVE METHODS FOR CONDUCTING COMPARATIVE ANALYSES OF CADASTRAL SYSTEMS
Institute of Scientific and Technical Information of China (English)
无
2002-01-01
The conception of an efficient cadastral system is an important element in the development of each coun-try. It is crucial for the efficient operation of the real estate market-the security and liberty of making transactions, register-ing a property, planning operations, the introduction of an ad valorem tax on property and more rational use of space. InEurope there are different types of cadastral systems, because the countries in Europe have different cultural back-grounds, different economical and social backgrounds. Through the centuries, many types of cadastral systems evolvedand their differences often depend upon local cultural heritage, physical geography, land use, technology, etc. Compara-tive analyses of cadastral systems have been the subjects of many publications and studies in world literature. It was as-sessed that the useful tools in conducting comparative analyses of various cadastral systems include the procedures of statisti-cal inference. This paper presents the results of a project to compare the performance of ten cadastral systems international-ly by creating appropriate integrated indicators of a cadastral system using statistical technique. Such indicators willmake it possible to compare different cadastral systems and present them hierarchically in relation to their quality, struc-ture, as well as legal, organizational and technological solutions. From a good number of methods available, techniquesoriginating from two spheres of statistic inference were selected: distribution free methods and multivariate analysis meth-ods. For analyses with the distribution free methods, FRIEDMAN's test (FRIENDMAN's non-parametric variance analy-sis) as well as KENDALL's test (KENDALL's compatibility ratio) were selected. For analyses with the multivariate analy-sis methods, factor analysis was selected.
Statistical methods for analysing complex genetic traits
El Galta, Rachid
2006-01-01
Complex traits are caused by multiple genetic and environmental factors, and are therefore difficult to study compared with simple Mendelian diseases. The modes of inheritance of Mendelian diseases are often known. Methods to dissect such diseases are well described in literature. For complex geneti
On the statistical analysis of trend in tropospheric ozone levels
Vaquera-Huerta, Humberto
1997-11-01
This paper is a study of methodology to investigate trends in ozone levels in urban areas. Three methods are studied; two parametric (NHPP, and GPD) and one nonparametric. The nonhomogeneous Poisson process (NHPP) approach: This method is based on the idea that the number of exceedances over a high threshold follows a Poisson distribution. In this method the detection of trend is approached by estimating the intensity function of the process. The intensity function is estimated using parametric and nonparametric methods. A general parametric function over time for the rate of a NHPP is proposed in order to test for nonexponential patterns. The Generalized Pareto Distribution approach (GPD): In this method the detection of trend in ozone is approached by considering that the magnitude of the exceedances over a high threshold follows a generalized Pareto distribution. The nonparametric statistical approach: A test for trend and a nonparametric estimator of a trend parameter were studied. The asymptotic distribution of the estimator is also provided. The nonparametric estimator is compared with the least squares estimator. The three methods are studied empirically using a Monte Carlo method. Some insight into how well these methods perform is obtained. Also, the use of each method is illustrated by examples with ozone data. The results of this study show that NHPP performs very well as a method to detect ozone trends. The GPD and the nonparametric approaches had low power for detecting trends in simulation experiments.
CRC standard probability and statistics tables and formulae Student ed.
Kokoska, Stephen
2000-01-01
Users of statistics in their professional lives and statistics students will welcome this concise, easy-to-use reference for basic statistics and probability. It contains all of the standardized statistical tables and formulas typically needed plus material on basic statistics topics, such as probability theory and distributions, regression, analysis of variance, nonparametric statistics, and statistical quality control.For each type of distribution the authors supply:?definitions?tables?relationships with other distributions, including limiting forms?statistical parameters, such as variance a
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Institute of Scientific and Technical Information of China (English)
Lu LIN
2005-01-01
In the nonpaxametric regression models, the original regression estimators including kernel estimator, Fourier series estimator and wavelet estimator are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, which attains very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P
2016-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal s...
Analyzing multiple spike trains with nonparametric Granger causality.
Nedungadi, Aatira G; Rangarajan, Govindan; Jain, Neeraj; Ding, Mingzhou
2009-08-01
Simultaneous recordings of spike trains from multiple single neurons are becoming commonplace. Understanding the interaction patterns among these spike trains remains a key research area. A question of interest is the evaluation of information flow between neurons through the analysis of whether one spike train exerts causal influence on another. For continuous-valued time series data, Granger causality has proven an effective method for this purpose. However, the basis for Granger causality estimation is autoregressive data modeling, which is not directly applicable to spike trains. Various filtering options distort the properties of spike trains as point processes. Here we propose a new nonparametric approach to estimate Granger causality directly from the Fourier transforms of spike train data. We validate the method on synthetic spike trains generated by model networks of neurons with known connectivity patterns and then apply it to neurons simultaneously recorded from the thalamus and the primary somatosensory cortex of a squirrel monkey undergoing tactile stimulation.
Prior processes and their applications nonparametric Bayesian estimation
Phadia, Eswar G
2016-01-01
This book presents a systematic and comprehensive treatment of various prior processes that have been developed over the past four decades for dealing with Bayesian approach to solving selected nonparametric inference problems. This revised edition has been substantially expanded to reflect the current interest in this area. After an overview of different prior processes, it examines the now pre-eminent Dirichlet process and its variants including hierarchical processes, then addresses new processes such as dependent Dirichlet, local Dirichlet, time-varying and spatial processes, all of which exploit the countable mixture representation of the Dirichlet process. It subsequently discusses various neutral to right type processes, including gamma and extended gamma, beta and beta-Stacy processes, and then describes the Chinese Restaurant, Indian Buffet and infinite gamma-Poisson processes, which prove to be very useful in areas such as machine learning, information retrieval and featural modeling. Tailfree and P...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb......-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the “true” relationship between the inputs and the output. This misspecification might result in biased estimation...... results—including measures that are of interest of applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Nonparametric forecasting of low-dimensional dynamical systems.
Berry, Tyrus; Giannakis, Dimitrios; Harlim, John
2015-03-01
This paper presents a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. The key idea is to represent the discrete shift maps on a smooth basis which can be obtained by the diffusion maps algorithm. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution to the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to quantify uncertainties (in fact, evolve the probability distribution) for nontrivial dynamical systems with equation-free modeling. We verify our approach on various examples, ranging from an inhomogeneous anisotropic stochastic differential equation on a torus, the chaotic Lorenz three-dimensional model, and the Niño-3.4 data set which is used as a proxy of the El Niño Southern Oscillation.
Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.
Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z
2017-03-01
A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes
García, Constantino A.; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G.
2017-08-01
The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
Indoor Positioning Using Nonparametric Belief Propagation Based on Spanning Trees
Directory of Open Access Journals (Sweden)
Savic Vladimir
2010-01-01
Full Text Available Nonparametric belief propagation (NBP is one of the best-known methods for cooperative localization in sensor networks. It is capable of providing information about location estimation with appropriate uncertainty and to accommodate non-Gaussian distance measurement errors. However, the accuracy of NBP is questionable in loopy networks. Therefore, in this paper, we propose a novel approach, NBP based on spanning trees (NBP-ST created by breadth first search (BFS method. In addition, we propose a reliable indoor model based on obtained measurements in our lab. According to our simulation results, NBP-ST performs better than NBP in terms of accuracy and communication cost in the networks with high connectivity (i.e., highly loopy networks. Furthermore, the computational and communication costs are nearly constant with respect to the transmission radius. However, the drawbacks of proposed method are a little bit higher computational cost and poor performance in low-connected networks.
Multi-Directional Non-Parametric Analysis of Agricultural Efficiency
DEFF Research Database (Denmark)
Balezentis, Tomas
This thesis seeks to develop methodologies for assessment of agricultural efficiency and employ them to Lithuanian family farms. In particular, we focus on three particular objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...... of stochasticity associated with Lithuanian family farm performance. The former technique showed that the farms differed in terms of the mean values and variance of the efficiency scores over time with some clear patterns prevailing throughout the whole research period. The fuzzy Free Disposal Hull showed...
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.
Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in Data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
Parametric or nonparametric? A parametricness index for model selection
Liu, Wei; 10.1214/11-AOS899
2012-01-01
In model selection literature, two classes of criteria perform well asymptotically in different situations: Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike's information criterion (AIC) performs well in an asymptotic efficiency when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses if it is possible and how to detect the situation that a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints on AIC or BIC is better suited for the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is si...
Nonparametric reconstruction of the Om diagnostic to test LCDM
Escamilla-Rivera, Celia
2015-01-01
Cosmic acceleration is usually related with the unknown dark energy, which equation of state, w(z), is constrained and numerically confronted with independent astrophysical data. In order to make a diagnostic of w(z), the introduction of a null test of dark energy can be done using a diagnostic function of redshift, Om. In this work we present a nonparametric reconstruction of this diagnostic using the so-called Loess-Simex factory to test the concordance model with the advantage that this approach offers an alternative way to relax the use of priors and find a possible 'w' that reliably describe the data with no previous knowledge of a cosmological model. Our results demonstrate that the method applied to the dynamical Om diagnostic finds a preference for a dark energy model with equation of state w =-2/3, which correspond to a static domain wall network.
Equity and efficiency in private and public education: a nonparametric comparison
L. Cherchye; K. de Witte; E. Ooghe; I. Nicaise
2007-01-01
We present a nonparametric approach for the equity and efficiency evaluation of (private and public) primary schools in Flanders. First, we use a nonparametric (Data Envelopment Analysis) model that is specially tailored to assess educational efficiency at the pupil level. The model accounts for the
Song, Dong; Wang, Zhuo; Marmarelis, Vasilis Z; Berger, Theodore W
2009-02-01
This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson-Volterra models are obtained herein from broadband experimental input-output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) process. The parametric model is then validated in terms of its input-output transformational properties using the non-parametric model since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally-derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD process in the parametric model makes it replicate better the characteristics of the experimentally-derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Non-parametric tests of productive efficiency with errors-in-variables
Kuosmanen, T.K.; Post, T.; Scholtes, S.
2007-01-01
We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian. [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans
Equity and efficiency in private and public education: a nonparametric comparison
Cherchye, L.; de Witte, K.; Ooghe, E.; Nicaise, I.
2007-01-01
We present a nonparametric approach for the equity and efficiency evaluation of (private and public) primary schools in Flanders. First, we use a nonparametric (Data Envelopment Analysis) model that is specially tailored to assess educational efficiency at the pupil level. The model accounts for the
Semi-parametric regression: Efficiency gains from modeling the nonparametric part
Yu, Kyusang; Park, Byeong U; 10.3150/10-BEJ296
2011-01-01
It is widely admitted that structured nonparametric modeling that circumvents the curse of dimensionality is important in nonparametric estimation. In this paper we show that the same holds for semi-parametric estimation. We argue that estimation of the parametric component of a semi-parametric model can be improved essentially when more structure is put into the nonparametric part of the model. We illustrate this for the partially linear model, and investigate efficiency gains when the nonparametric part of the model has an additive structure. We present the semi-parametric Fisher information bound for estimating the parametric part of the partially linear additive model and provide semi-parametric efficient estimators for which we use a smooth backfitting technique to deal with the additive nonparametric part. We also present the finite sample performances of the proposed estimators and analyze Boston housing data as an illustration.
Directory of Open Access Journals (Sweden)
Archer Kellie J
2008-02-01
Full Text Available Abstract Background With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN to those with normal functioning allograft. Results The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. Conclusion We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been
Kong, Xiangrong; Mas, Valeria; Archer, Kellie J
2008-02-26
With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the
Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.
Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:
Vale, A
2007-01-01
We revisit the longstanding question of whether first brightest cluster galaxies are statistically drawn from the same distribution as other cluster galaxies or are "special", using the new non-parametric, empirically based model presented in Vale&Ostriker (2006) for associating galaxy luminosity with halo/subhalo masses. We introduce scatter in galaxy luminosity at fixed halo mass into this model, building a conditional luminosity function (CLF) by considering two possible models: a simple lognormal and a model based on the distribution of concentration in haloes of a given mass. We show that this model naturally allows an identification of halo/subhalo systems with groups and clusters of galaxies, giving rise to a clear central/satellite galaxy distinction. We then use these results to build up the dependence of brightest cluster galaxy (BCG) magnitudes on cluster luminosity, focusing on two statistical indicators, the dispersion in BCG magnitude and the magnitude difference between first and second bri...
Van Steenbergen, N.; Willems, P.
2012-04-01
Reliable flood forecasts are the most important non-structural measures to reduce the impact of floods. However flood forecasting systems are subject to uncertainty originating from the input data, model structure and model parameters of the different hydraulic and hydrological submodels. To quantify this uncertainty a non-parametric data-based approach has been developed. This approach analyses the historical forecast residuals (differences between the predictions and the observations at river gauging stations) without using a predefined statistical error distribution. Because the residuals are correlated with the value of the forecasted water level and the lead time, the residuals are split up into discrete classes of simulated water levels and lead times. For each class, percentile values are calculated of the model residuals and stored in a 'three dimensional error' matrix. By 3D interpolation in this error matrix, the uncertainty in new forecasted water levels can be quantified. In addition to the quantification of the uncertainty, the communication of this uncertainty is equally important. The communication has to be done in a consistent way, reducing the chance of misinterpretation. Also, the communication needs to be adapted to the audience; the majority of the larger public is not interested in in-depth information on the uncertainty on the predicted water levels, but only is interested in information on the likelihood of exceedance of certain alarm levels. Water managers need more information, e.g. time dependent uncertainty information, because they rely on this information to undertake the appropriate flood mitigation action. There are various ways in presenting uncertainty information (numerical, linguistic, graphical, time (in)dependent, etc.) each with their advantages and disadvantages for a specific audience. A useful method to communicate uncertainty of flood forecasts is by probabilistic flood mapping. These maps give a representation of the
DEFF Research Database (Denmark)
Tan, Qihua; Zhao, J H; Iachine, I
2004-01-01
This report investigates the power issue in applying the non-parametric linkage analysis of affected sib-pairs (ASP) [Kruglyak and Lander, 1995: Am J Hum Genet 57:439-454] to localize genes that contribute to human longevity using long-lived sib-pairs. Data were simulated by introducing a recently...... developed statistical model for measuring marker-longevity associations [Yashin et al., 1999: Am J Hum Genet 65:1178-1193], enabling direct power comparison between linkage and association approaches. The non-parametric linkage (NPL) scores estimated in the region harboring the causal allele are evaluated...... in case of a dominant effect. Although the power issue may depend heavily on the true genetic nature in maintaining survival, our study suggests that results from small-scale sib-pair investigations should be referred with caution, given the complexity of human longevity....
... Certification Import Safety International Recall Guidance Civil and Criminal Penalties Federal Court Orders & Decisions Research & Statistics Research & Statistics Technical Reports Injury Statistics NEISS Injury ...
Cosmic Statistics of Statistics
Szapudi, I.; Colombi, S.; Bernardeau, F.
1999-01-01
The errors on statistics measured in finite galaxy catalogs are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi (1996) is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly nonlinear to weakly nonlinear scales. The final analytic formu...
A Unified Approach to Constructing Nonparametric Rank Tests.
1986-07-06
82177 - , % 2- W -T- . ,o.. .o I" Va - , . - ,.-- tic, the Kolmogorov-Smirnov statistic, and the Wald - Wolfowitz statistic for the two-sample...linear, such as the Kolmogorov-Smirnov test, the Wald - Wolfowitz (1940) "runs test", the so-called "tests based on exceeding observations" (H;jek 32...metric U induces the Kolmogorov-Smirnov test statistic, for equal sample sizes. The following result shows that the Wald - Wolfowitz test statistic is also
Directory of Open Access Journals (Sweden)
SANGCHAN KANTABUTRA
2009-04-01
Full Text Available This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA. Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have access to and practice different production technologies than rural schools, and rural institutions appear to operate less efficiently than their urban counterparts. In addition, a sensitivity analysis, conducted to ascertain the robustness of the analytical framework, revealed the stability of urban-rural effects on school efficiency. Policy to improve school eff iciency should thus take varying geographical area differences into account, viewing rural and urban schools as different from one another. Moreover, policymakers might consider shifting existing resources from urban schools to rural schools, provided that the increase in overall rural efficiency would be greater than the decrease, if any, in the city. Future research directions are discussed.
Harlander, Niklas; Rosenkranz, Tobias; Hohmann, Volker
2012-08-01
Single channel noise reduction has been well investigated and seems to have reached its limits in terms of speech intelligibility improvement, however, the quality of such schemes can still be advanced. This study tests to what extent novel model-based processing schemes might improve performance in particular for non-stationary noise conditions. Two prototype model-based algorithms, a speech-model-based, and a auditory-model-based algorithm were compared to a state-of-the-art non-parametric minimum statistics algorithm. A speech intelligibility test, preference rating, and listening effort scaling were performed. Additionally, three objective quality measures for the signal, background, and overall distortions were applied. For a better comparison of all algorithms, particular attention was given to the usage of the similar Wiener-based gain rule. The perceptual investigation was performed with fourteen hearing-impaired subjects. The results revealed that the non-parametric algorithm and the auditory model-based algorithm did not affect speech intelligibility, whereas the speech-model-based algorithm slightly decreased intelligibility. In terms of subjective quality, both model-based algorithms perform better than the unprocessed condition and the reference in particular for highly non-stationary noise environments. Data support the hypothesis that model-based algorithms are promising for improving performance in non-stationary noise conditions.
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Directory of Open Access Journals (Sweden)
Alejandro Jara
2011-04-01
Full Text Available Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
The Utility of Nonparametric Transformations for Imputation of Survey Data
Directory of Open Access Journals (Sweden)
Robbins Michael W.
2014-12-01
Full Text Available Missing values present a prevalent problem in the analysis of establishment survey data. Multivariate imputation algorithms (which are used to fill in missing observations tend to have the common limitation that imputations for continuous variables are sampled from Gaussian distributions. This limitation is addressed here through the use of robust marginal transformations. Specifically, kernel-density and empirical distribution-type transformations are discussed and are shown to have favorable properties when used for imputation of complex survey data. Although such techniques have wide applicability (i.e., they may be easily applied in conjunction with a wide array of imputation techniques, the proposed methodology is applied here with an algorithm for imputation in the USDA’s Agricultural Resource Management Survey. Data analysis and simulation results are used to illustrate the specific advantages of the robust methods when compared to the fully parametric techniques and to other relevant techniques such as predictive mean matching. To summarize, transformations based upon parametric densities are shown to distort several data characteristics in circumstances where the parametric model is ill fit; however, no circumstances are found in which the transformations based upon parametric models outperform the nonparametric transformations. As a result, the transformation based upon the empirical distribution (which is the most computationally efficient is recommended over the other transformation procedures in practice.
Nonparametric identification of structural modifications in Laplace domain
Suwała, G.; Jankowski, Ł.
2017-02-01
This paper proposes and experimentally verifies a Laplace-domain method for identification of structural modifications, which (1) unlike time-domain formulations, allows the identification to be focused on these parts of the frequency spectrum that have a high signal-to-noise ratio, and (2) unlike frequency-domain formulations, decreases the influence of numerical artifacts related to the particular choice of the FFT exponential window decay. In comparison to the time-domain approach proposed earlier, advantages of the proposed method are smaller computational cost and higher accuracy, which leads to reliable performance in more difficult identification cases. Analytical formulas for the first- and second-order sensitivity analysis are derived. The approach is based on a reduced nonparametric model, which has the form of a set of selected structural impulse responses. Such a model can be collected purely experimentally, which obviates the need for design and laborious updating of a parametric model, such as a finite element model. The approach is verified experimentally using a 26-node lab 3D truss structure and 30 identification cases of a single mass modification or two concurrent mass modifications.
A New Non-Parametric Approach to Galaxy Morphological Classification
Lotz, J M; Madau, P; Lotz, Jennifer M.; Primack, Joel; Madau, Piero
2003-01-01
We present two new non-parametric methods for quantifying galaxy morphology: the relative distribution of the galaxy pixel flux values (the Gini coefficient or G) and the second-order moment of the brightest 20% of the galaxy's flux (M20). We test the robustness of G and M20 to decreasing signal-to-noise and spatial resolution, and find that both measures are reliable to within 10% at average signal-to-noise per pixel greater than 3 and resolutions better than 1000 pc and 500 pc, respectively. We have measured G and M20, as well as concentration (C), asymmetry (A), and clumpiness (S) in the rest-frame near-ultraviolet/optical wavelengths for 150 bright local "normal" Hubble type galaxies (E-Sd) galaxies and 104 0.05 < z < 0.25 ultra-luminous infrared galaxies (ULIRGs).We find that most local galaxies follow a tight sequence in G-M20-C, where early-types have high G and C and low M20 and late-type spirals have lower G and C and higher M20. The majority of ULIRGs lie above the normal galaxy G-M20 sequence...
Adaptive Neural Network Nonparametric Identifier With Normalized Learning Laws.
Chairez, Isaac
2016-04-05
This paper addresses the design of a normalized convergent learning law for neural networks (NNs) with continuous dynamics. The NN is used here to obtain a nonparametric model for uncertain systems described by a set of ordinary differential equations. The source of uncertainties is the presence of some external perturbations and poor knowledge of the nonlinear function describing the system dynamics. A new adaptive algorithm based on normalized algorithms was used to adjust the weights of the NN. The adaptive algorithm was derived by means of a nonstandard logarithmic Lyapunov function (LLF). Two identifiers were designed using two variations of LLFs leading to a normalized learning law for the first identifier and a variable gain normalized learning law. In the case of the second identifier, the inclusion of normalized learning laws yields to reduce the size of the convergence region obtained as solution of the practical stability analysis. On the other hand, the velocity of convergence for the learning laws depends on the norm of errors in inverse form. This fact avoids the peaking transient behavior in the time evolution of weights that accelerates the convergence of identification error. A numerical example demonstrates the improvements achieved by the algorithm introduced in this paper compared with classical schemes with no-normalized continuous learning methods. A comparison of the identification performance achieved by the no-normalized identifier and the ones developed in this paper shows the benefits of the learning law proposed in this paper.
Nonparametric estimation of quantum states, processes and measurements
Lougovski, Pavel; Bennink, Ryan
Quantum state, process, and measurement estimation methods traditionally use parametric models, in which the number and role of relevant parameters is assumed to be known. When such an assumption cannot be justified, a common approach in many disciplines is to fit the experimental data to multiple models with different sets of parameters and utilize an information criterion to select the best fitting model. However, it is not always possible to assume a model with a finite (countable) number of parameters. This typically happens when there are unobserved variables that stem from hidden correlations that can only be unveiled after collecting experimental data. How does one perform quantum characterization in this situation? We present a novel nonparametric method of experimental quantum system characterization based on the Dirichlet Process (DP) that addresses this problem. Using DP as a prior in conjunction with Bayesian estimation methods allows us to increase model complexity (number of parameters) adaptively as the number of experimental observations grows. We illustrate our approach for the one-qubit case and show how a probability density function for an unknown quantum process can be estimated.
Non-parametric and least squares Langley plot methods
Directory of Open Access Journals (Sweden)
P. W. Kiedron
2015-04-01
Full Text Available Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law V=V>/i>0e−τ ·m, where a plot of ln (V voltage vs. m air mass yields a straight line with intercept ln (V0. This ln (V0 subsequently can be used to solve for τ for any measurement of V and calculation of m. This calibration works well on some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of ln (V0's are smoothed and interpolated with median and mean moving window filters.
Pivotal Estimation of Nonparametric Functions via Square-root Lasso
Belloni, Alexandre; Wang, Lie
2011-01-01
In a nonparametric linear regression model we study a variant of LASSO, called square-root LASSO, which does not require the knowledge of the scaling parameter $\\sigma$ of the noise or bounds for it. This work derives new finite sample upper bounds for prediction norm rate of convergence, $\\ell_1$-rate of converge, $\\ell_\\infty$-rate of convergence, and sparsity of the square-root LASSO estimator. A lower bound for the prediction norm rate of convergence is also established. In many non-Gaussian noise cases, we rely on moderate deviation theory for self-normalized sums and on new data-dependent empirical process inequalities to achieve Gaussian-like results provided log p = o(n^{1/3}) improving upon results derived in the parametric case that required log p = O(log n). In addition, we derive finite sample bounds on the performance of ordinary least square (OLS) applied tom the model selected by square-root LASSO accounting for possible misspecification of the selected model. In particular, we provide mild con...
Trend Analysis of Golestan's Rivers Discharges Using Parametric and Non-parametric Methods
Mosaedi, Abolfazl; Kouhestani, Nasrin
2010-05-01
One of the major problems in human life is climate changes and its problems. Climate changes will cause changes in rivers discharges. The aim of this research is to investigate the trend analysis of seasonal and yearly rivers discharges of Golestan province (Iran). In this research four trend analysis method including, conjunction point, linear regression, Wald-Wolfowitz and Mann-Kendall, for analyzing of river discharges in seasonal and annual periods in significant level of 95% and 99% were applied. First, daily discharge data of 12 hydrometrics stations with a length of 42 years (1965-2007) were selected, after some common statistical tests such as, homogeneity test (by applying G-B and M-W tests), the four mentioned trends analysis tests were applied. Results show that in all stations, for summer data time series, there are decreasing trends with a significant level of 99% according to Mann-Kendall (M-K) test. For autumn time series data, all four methods have similar results. For other periods, the results of these four tests were more or less similar together. While, for some stations the results of tests were different. Keywords: Trend Analysis, Discharge, Non-parametric methods, Wald-Wolfowitz, The Mann-Kendall test, Golestan Province.
Nonparametric Comparison of Two Dynamic Parameter Setting Methods in a Meta-Heuristic Approach
Directory of Open Access Journals (Sweden)
Seyhun HEPDOGAN
2007-10-01
Full Text Available Meta-heuristics are commonly used to solve combinatorial problems in practice. Many approaches provide very good quality solutions in a short amount of computational time; however most meta-heuristics use parameters to tune the performance of the meta-heuristic for particular problems and the selection of these parameters before solving the problem can require much time. This paper investigates the problem of setting parameters using a typical meta-heuristic called Meta-RaPS (Metaheuristic for Randomized Priority Search.. Meta-RaPS is a promising meta-heuristic optimization method that has been applied to different types of combinatorial optimization problems and achieved very good performance compared to other meta-heuristic techniques. To solve a combinatorial problem, Meta-RaPS uses two well-defined stages at each iteration: construction and local search. After a number of iterations, the best solution is reported. Meta-RaPS performance depends on the fine tuning of two main parameters, priority percentage and restriction percentage, which are used during the construction stage. This paper presents two different dynamic parameter setting methods for Meta-RaPS. These dynamic parameter setting approaches tune the parameters while a solution is being found. To compare these two approaches, nonparametric statistic approaches are utilized since the solutions are not normally distributed. Results from both these dynamic parameter setting methods are reported.
Comparison of non-parametric methods for ungrouping coarsely aggregated data
Directory of Open Access Journals (Sweden)
Silvia Rizzi
2016-05-01
Full Text Available Abstract Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions; can handle unequal interval length; and allow stretches of 0 counts. Results The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-years age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Comparison of three nonparametric kriging methods for delineating heavy-metal contaminated soils
Energy Technology Data Exchange (ETDEWEB)
Juang, K.W.; Lee, D.Y
2000-02-01
The probability of pollutant concentrations greater than a cutoff value is useful for delineating hazardous areas in contaminated soils. It is essential for risk assessment and reclamation. In this study, three nonparametric kriging methods [indicator kriging, probability kriging, and kriging with the cumulative distribution function (CDF) of order statistics (CDF kriging)] were used to estimate the probability of heavy-metal concentrations lower than a cutoff value. In terms of methodology, the probability kriging estimator and CDF kriging estimator take into account the information of the order relation, which is not considered in indicator kriging. Since probability kriging has been shown to be better than indicator kriging for delineating contaminated soils, the performance of CDF kriging, which the authors propose, was compared with that of probability kriging in this study. A data set of soil Cd and Pb concentrations obtained from a 10-ha heavy-metal contaminated site in Taoyuan, Taiwan, was used. The results demonstrated that the probability kriging and CDF kriging estimations were more accurate than the indicator kriging estimation. On the other hand, because the probability kriging was based on the cokriging estimator, some unreliable estimates occurred in the probability kriging estimation. This indicated that probability kriging was not as robust as CDF kriging. Therefore, CDF kriging is more suitable than probability kriging for estimating the probability of heavy-metal concentrations lower than a cutoff value.
Nonparametric signal processing validation in T-wave alternans detection and estimation.
Goya-Esteban, R; Barquero-Pérez, O; Blanco-Velasco, M; Caamaño-Fernández, A J; García-Alberola, A; Rojo-Álvarez, J L
2014-04-01
Although a number of methods have been proposed for T-Wave Alternans (TWA) detection and estimation, their performance strongly depends on their signal processing stages and on their free parameters tuning. The dependence of the system quality with respect to the main signal processing stages in TWA algorithms has not yet been studied. This study seeks to optimize the final performance of the system by successive comparisons of pairs of TWA analysis systems, with one single processing difference between them. For this purpose, a set of decision statistics are proposed to evaluate the performance, and a nonparametric hypothesis test (from Bootstrap resampling) is used to make systematic decisions. Both the temporal method (TM) and the spectral method (SM) are analyzed in this study. The experiments were carried out in two datasets: first, in semisynthetic signals with artificial alternant waves and added noise; second, in two public Holter databases with different documented risk of sudden cardiac death. For semisynthetic signals (SNR = 15 dB), after the optimization procedure, a reduction of 34.0% (TM) and 5.2% (SM) of the power of TWA amplitude estimation errors was achieved, and the power of error probability was reduced by 74.7% (SM). For Holter databases, appropriate tuning of several processing blocks, led to a larger intergroup separation between the two populations for TWA amplitude estimation. Our proposal can be used as a systematic procedure for signal processing block optimization in TWA algorithmic implementations.
Nonparametric Estimates of Gene × Environment Interaction Using Local Structural Equation Modeling
Briley, Daniel A.; Harden, K. Paige; Bates, Timothy C.; Tucker-Drob, Elliot M.
2017-01-01
Gene × Environment (G×E) interaction studies test the hypothesis that the strength of genetic influence varies across environmental contexts. Existing latent variable methods for estimating G×E interactions in twin and family data specify parametric (typically linear) functions for the interaction effect. An improper functional form may obscure the underlying shape of the interaction effect and may lead to failures to detect a significant interaction. In this article, we introduce a novel approach to the behavior genetic toolkit, local structural equation modeling (LOSEM). LOSEM is a highly flexible nonparametric approach for estimating latent interaction effects across the range of a measured moderator. This approach opens up the ability to detect and visualize new forms of G×E interaction. We illustrate the approach by using LOSEM to estimate gene × socioeconomic status (SES) interactions for six cognitive phenotypes. Rather than continuously and monotonically varying effects as has been assumed in conventional parametric approaches, LOSEM indicated substantial nonlinear shifts in genetic variance for several phenotypes. The operating characteristics of LOSEM were interrogated through simulation studies where the functional form of the interaction effect was known. LOSEM provides a conservative estimate of G×E interaction with sufficient power to detect statistically significant G×E signal with moderate sample size. We offer recommendations for the application of LOSEM and provide scripts for implementing these biometric models in Mplus and in OpenMx under R. PMID:26318287
Non-Parametric Evolutionary Algorithm for Estimating Root Zone Soil Moisture
Mohanty, B.; Shin, Y.; Ines, A. M.
2013-12-01
Prediction of root zone soil moisture is critical for water resources management. In this study, we explored a non-parametric evolutionary algorithm for estimating root zone soil moisture from a time series of spatially-distributed rainfall across multiple weather locations under two different hydro-climatic regions. A new genetic algorithm-based hidden Markov model (HMMGA) was developed to estimate long-term root zone soil moisture dynamics at different soil depths. Also, we analyzed rainfall occurrence probabilities and dry/wet spell lengths reproduced by this approach. The HMMGA was used to estimate the optimal state sequences (weather states) based on the precipitation history. Historical root zone soil moisture statistics were then determined based on the weather state conditions. To test the new approach, we selected two different soil moisture fields, Oklahoma (130 km x 130 km) and Illinois (300 km x 500 km), during 1995 to 2009 and 1994 to 2010, respectively. We found that the newly developed framework performed well in predicting root zone soil moisture dynamics at both the spatial scales. Also, the reproduced rainfall occurrence probabilities and dry/wet spell lengths matched well with the observations at the spatio-temporal scales. Since the proposed algorithm requires only precipitation and historical soil moisture data from existing, established weather stations, it can serve an attractive alternative for predicting root zone soil moisture in the future using climate change scenarios and root zone soil moisture history.
A Non-parametric Approach to Constrain the Transfer Function in Reverberation Mapping
Li, Yan-Rong; Wang, Jian-Min; Bai, Jin-Ming
2016-11-01
Broad emission lines of active galactic nuclei stem from a spatially extended region (broad-line region, BLR) that is composed of discrete clouds and photoionized by the central ionizing continuum. The temporal behaviors of these emission lines are blurred echoes of continuum variations (i.e., reverberation mapping, RM) and directly reflect the structures and kinematic information of BLRs through the so-called transfer function (also known as the velocity-delay map). Based on the previous works of Rybicki and Press and Zu et al., we develop an extended, non-parametric approach to determine the transfer function for RM data, in which the transfer function is expressed as a sum of a family of relatively displaced Gaussian response functions. Therefore, arbitrary shapes of transfer functions associated with complicated BLR geometry can be seamlessly included, enabling us to relax the presumption of a specified transfer function frequently adopted in previous studies and to let it be determined by observation data. We formulate our approach in a previously well-established framework that incorporates the statistical modeling of continuum variations as a damped random walk process and takes into account long-term secular variations which are irrelevant to RM signals. The application to RM data shows the fidelity of our approach.
Directory of Open Access Journals (Sweden)
Campbell Michael J
2004-12-01
Full Text Available Abstract Health-Related Quality of Life (HRQoL measures are becoming increasingly used in clinical trials as primary outcome measures. Investigators are now asking statisticians for advice on how to analyse studies that have used HRQoL outcomes. HRQoL outcomes, like the SF-36, are usually measured on an ordinal scale. However, most investigators assume that there exists an underlying continuous latent variable that measures HRQoL, and that the actual measured outcomes (the ordered categories, reflect contiguous intervals along this continuum. The ordinal scaling of HRQoL measures means they tend to generate data that have discrete, bounded and skewed distributions. Thus, standard methods of analysis such as the t-test and linear regression that assume Normality and constant variance may not be appropriate. For this reason, conventional statistical advice would suggest that non-parametric methods be used to analyse HRQoL data. The bootstrap is one such computer intensive non-parametric method for analysing data. We used the bootstrap for hypothesis testing and the estimation of standard errors and confidence intervals for parameters, in four datasets (which illustrate the different aspects of study design. We then compared and contrasted the bootstrap with standard methods of analysing HRQoL outcomes. The standard methods included t-tests, linear regression, summary measures and General Linear Models. Overall, in the datasets we studied, using the SF-36 outcome, bootstrap methods produce results similar to conventional statistical methods. This is likely because the t-test and linear regression are robust to the violations of assumptions that HRQoL data are likely to cause (i.e. non-Normality. While particular to our datasets, these findings are likely to generalise to other HRQoL outcomes, which have discrete, bounded and skewed distributions. Future research with other HRQoL outcome measures, interventions and populations, is required to
Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using
Tips and Tricks for Successful Application of Statistical Methods to Biological Data.
Schlenker, Evelyn
2016-01-01
This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
Ramajo, Julián; Cordero, José Manuel; Márquez, Miguel Ángel
2017-10-01
This paper analyses region-level technical efficiency in nine European countries over the 1995-2007 period. We propose the application of a nonparametric conditional frontier approach to account for the presence of heterogeneous conditions in the form of geographical externalities. Such environmental factors are beyond the control of regional authorities, but may affect the production function. Therefore, they need to be considered in the frontier estimation. Specifically, a spatial autoregressive term is included as an external conditioning factor in a robust order- m model. Thus we can test the hypothesis of non-separability (the external factor impacts both the input-output space and the distribution of efficiencies), demonstrating the existence of significant global interregional spillovers into the production process. Our findings show that geographical externalities affect both the frontier level and the probability of being more or less efficient. Specifically, the results support the fact that the spatial lag variable has an inverted U-shaped non-linear impact on the performance of regions. This finding can be interpreted as a differential effect of interregional spillovers depending on the size of the neighboring economies: positive externalities for small values, possibly related to agglomeration economies, and negative externalities for high values, indicating the possibility of production congestion. Additionally, evidence of the existence of a strong geographic pattern of European regional efficiency is reported and the levels of technical efficiency are acknowledged to have converged during the period under analysis.
Institute of Scientific and Technical Information of China (English)
LINGNeng-xiang; DUXue-qiao
2005-01-01
In this paper, we study the strong consistency for partitioning estimation of regression function under samples that axe φ-mixing sequences with identically distribution.Key words: nonparametric regression function; partitioning estimation; strong convergence;φ-mixing sequences.
Kernel bandwidth estimation for non-parametric density estimation: a comparative study
CSIR Research Space (South Africa)
Van der Walt, CM
2013-12-01
Full Text Available We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...
Bayesian nonparametric estimation and consistency of mixed multinomial logit choice models
De Blasi, Pierpaolo; Lau, John W; 10.3150/09-BEJ233
2011-01-01
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide an important theoretical support for the use of the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different te...
Nonparametric Monitoring for Geotechnical Structures Subject to Long-Term Environmental Change
Directory of Open Access Journals (Sweden)
Hae-Bum Yun
2011-01-01
Full Text Available A nonparametric, data-driven methodology of monitoring for geotechnical structures subject to long-term environmental change is discussed. Avoiding physical assumptions or excessive simplification of the monitored structures, the nonparametric monitoring methodology presented in this paper provides reliable performance-related information particularly when the collection of sensor data is limited. For the validation of the nonparametric methodology, a field case study was performed using a full-scale retaining wall, which had been monitored for three years using three tilt gauges. Using the very limited sensor data, it is demonstrated that important performance-related information, such as drainage performance and sensor damage, could be disentangled from significant daily, seasonal and multiyear environmental variations. Extensive literature review on recent developments of parametric and nonparametric data processing techniques for geotechnical applications is also presented.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
Fan, Jianqing; Song, Rui
2011-01-01
A variable screening procedure via correlation learning was proposed Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under the nonparametric additive models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data a...
Liu, Yingchun; Sun, Guoxiang; Wang, Yan; Yang, Lanping; Yang, Fangliang
2015-06-01
Micellar electrokinetic chromatography fingerprinting combined with quantification was successfully developed and applied to monitor the quality consistency of Weibizhi tablets, which is a classical compound preparation used to treat gastric ulcers. A background electrolyte composed of 57 mmol/L sodium borate, 21 mmol/L sodium dodecylsulfate and 100 mmol/L sodium hydroxide was used to separate compounds. To optimize capillary electrophoresis conditions, multivariate statistical analyses were applied. First, the most important factors influencing sample electrophoretic behavior were identified as background electrolyte concentrations. Then, a Box-Benhnken design response surface strategy using resolution index RF as an integrated response was set up to correlate factors with response. RF reflects the effective signal amount, resolution, and signal homogenization in an electropherogram, thus, it was regarded as an excellent indicator. In fingerprint assessments, simple quantified ratio fingerprint method was established for comprehensive quality discrimination of traditional Chinese medicines/herbal medicines from qualitative and quantitative perspectives, by which the quality of 27 samples from the same manufacturer were well differentiated. In addition, the fingerprint-efficacy relationship between fingerprints and antioxidant activities was established using partial least squares regression, which provided important medicinal efficacy information for quality control. The present study offered an efficient means for monitoring Weibizhi tablet quality consistency.
A non-parametric peak calling algorithm for DamID-Seq.
Directory of Open Access Journals (Sweden)
Renhua Li
Full Text Available Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS of double sex (DSX-an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq. One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peaking calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only. After data quality check and mapping reads to a reference genome, the peaking calling procedure compromises the following steps: 1 reads resampling; 2 reads scaling (normalization and computing signal-to-noise fold changes; 3 filtering; 4 Calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC. We also used irreproducible discovery rate (IDR analysis, as well as ChIP-Seq data to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peaks width.