WorldWideScience

Sample records for nonparametric statistical methods

  1. Nonparametric statistical methods

    CERN Document Server

    Hollander, Myles; Chicken, Eric

    2013-01-01

    Praise for the Second Edition: "This book should be an essential part of the personal library of every practicing statistician." -Technometrics. Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit

  2. Nonparametric statistical methods using R

    CERN Document Server

    Kloke, John

    2014-01-01

    A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c

  3. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2010-01-01

    Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. -Eugenia Stoimenova, Journal of Applied Statistics, June 2012. … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. -Biometrics, 67, September 2011. This excellently presente

  4. CURRENT STATUS OF NONPARAMETRIC STATISTICS

    Directory of Open Access Journals (Sweden)

    Orlov A. I.

    2015-02-01

    Nonparametric statistics is one of the five points of growth of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into areas, based on how nonparametric statistics is defined in existing scientific practice, and to classify investigations on nonparametric statistical methods. Nonparametric statistics makes it possible to draw statistical inferences, in particular to estimate characteristics of the distribution and to test statistical hypotheses, without the usually weakly justified assumption that the distribution function of the sample belongs to a particular parametric family. One example is the widespread belief that statistical data often follow the normal distribution. Meanwhile, analysis of observation results, in particular of measurement errors, always leads to the same conclusion: in most cases the actual distribution differs significantly from normal. Uncritical use of the normality hypothesis often leads to significant errors in areas such as the rejection of outlying observations (outliers), statistical quality control, and other cases. Therefore, it is advisable to use nonparametric methods, which impose only weak requirements on the distribution functions of the observations; usually only their continuity is assumed. Generalizing from numerous studies, it can be stated that nonparametric methods can now solve almost the same range of problems previously solved with parametric methods. Claims in the literature that nonparametric methods have less power, or require larger sample sizes, than parametric methods are incorrect. Note that in nonparametric statistics, as in mathematical statistics in general, there remain a number of unresolved problems.

  5. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2014-01-01

    Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.

  6. Nonparametric statistics for social and behavioral sciences

    CERN Document Server

    Kraska-Miller, M

    2013-01-01

    Introduction to Research in Social and Behavioral Sciences; Basic Principles of Research; Planning for Research; Types of Research Designs; Sampling Procedures; Validity and Reliability of Measurement Instruments; Steps of the Research Process; Introduction to Nonparametric Statistics; Data Analysis; Overview of Nonparametric Statistics and Parametric Statistics; Overview of Parametric Statistics; Overview of Nonparametric Statistics; Importance of Nonparametric Methods; Measurement Instruments; Analysis of Data to Determine Association and Agreement; Pearson Chi-Square Test of Association and Independence; Contingency

  7. Statistic Non-Parametric Methods of Measurement and Interpretation of Existing Statistic Connections within Seaside Hydro Tourism

    OpenAIRE

    MIRELA SECARĂ

    2008-01-01

    Tourism represents an important field of economic and social life in our country, and the main sector of the economy of Constanta County is the balneary tourism capitalization of the Romanian seaside. In order to statistically analyze hydro tourism on the Romanian seaside, we applied non-parametric methods for measuring and interpreting the existing statistical connections within seaside hydro tourism. The major objective of this research is the re-establishment of hydro tourism on the Romanian ...

  8. Statistical analysis using the Bayesian nonparametric method for irradiation embrittlement of reactor pressure vessels

    Science.gov (United States)

    Takamizawa, Hisashi; Itoh, Hiroto; Nishiyama, Yutaka

    2016-10-01

    In order to understand neutron irradiation embrittlement in high fluence regions, statistical analysis using the Bayesian nonparametric (BNP) method was performed for the Japanese surveillance and material test reactor irradiation database. The BNP method is essentially expressed as an infinite summation of normal distributions, with input data being subdivided into clusters with identical statistical parameters, such as mean and standard deviation, for each cluster to estimate shifts in ductile-to-brittle transition temperature (DBTT). The clusters typically depend on chemical compositions, irradiation conditions, and the irradiation embrittlement. Specific variables contributing to the irradiation embrittlement include the content of Cu, Ni, P, Si, and Mn in the pressure vessel steels, neutron flux, neutron fluence, and irradiation temperatures. It was found that the measured shifts of DBTT correlated well with the calculated ones. Data associated with the same materials were subdivided into the same clusters even if neutron fluences were increased.

  9. A nonparametric statistical method for image segmentation using information theory and curve evolution.

    Science.gov (United States)

    Kim, Junmo; Fisher, John W; Yezzi, Anthony; Cetin, Müjdat; Willsky, Alan S

    2005-10-01

    In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Furthermore, our method, which does not require any training, performs as well as methods based on training.

  10. Astronomical Methods for Nonparametric Regression

    Science.gov (United States)

    Steinhardt, Charles L.; Jermyn, Adam

    2017-01-01

    I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less commonly used techniques such as Multivariate Adaptive Regression Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
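
    As an illustration of the simplest technique named in this abstract, the following is a minimal sketch of a running median. It is not the authors' code: the function name `running_median`, the `half_width` parameter, the choice to truncate the window at the edges, and the sample data are all assumptions made for this example.

```python
from statistics import median

def running_median(x, y, half_width):
    """Running median of y, ordered by x, over a window of +/- half_width points.

    Hypothetical helper for illustration; windows are truncated at the edges.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    smoothed = []
    for k in range(len(ys)):
        lo = max(0, k - half_width)
        hi = min(len(ys), k + half_width + 1)
        smoothed.append(median(ys[lo:hi]))
    return xs, smoothed

# Made-up data with one outlier (y = 100 at x = 2).
xs, ys = running_median([3, 1, 2, 5, 4], [30, 10, 100, 50, 40], half_width=1)
print(list(zip(xs, ys)))
```

    Interior points use a full three-point window, so the outlier at x = 2 is replaced by the median of its neighbours, while the two end points only see a shorter window. How the edges are handled is a design choice of this sketch, not something the abstract specifies.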

  11. Recent Advances and Trends in Nonparametric Statistics

    CERN Document Server

    Akritas, MG

    2003-01-01

    The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o

  12. Introduction to nonparametric statistics for the biological sciences using R

    CERN Document Server

    MacFarland, Thomas W

    2016-01-01

    This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: to introduce when nonparametric approaches to data analysis are appropriate; to introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test; and to introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set. The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...

  13. 2nd Conference of the International Society for Nonparametric Statistics

    CERN Document Server

    Manteiga, Wenceslao; Romo, Juan

    2016-01-01

    This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...

  14. Characterizing Ipomopsis rubra (Polemoniaceae) germination under various thermal scenarios with non-parametric and semi-parametric statistical methods.

    Science.gov (United States)

    Pérez, Hector E; Kettner, Keith

    2013-10-01

    Time-to-event analysis represents a collection of relatively new, flexible, and robust statistical techniques for investigating the incidence and timing of transitions from one discrete condition to another. Plant biology is replete with examples of such transitions occurring from the cellular to population levels. However, application of these statistical methods has been rare in botanical research. Here, we demonstrate the use of non- and semi-parametric time-to-event and categorical data analyses to address questions regarding seed to seedling transitions of Ipomopsis rubra propagules exposed to various doses of constant or simulated seasonal diel temperatures. Seeds were capable of germinating rapidly to >90 % at 15-25 or 22/11-29/19 °C. Optimum temperatures for germination occurred at 25 or 29/19 °C. Germination was inhibited and seed viability decreased at temperatures ≥30 or 33/24 °C. Kaplan-Meier estimates of survivor functions indicated highly significant differences in temporal germination patterns for seeds exposed to fluctuating or constant temperatures. Extended Cox regression models specified an inverse relationship between temperature and the hazard of germination. Moreover, temperature and the temperature × day interaction had significant effects on germination response. Comparisons to reference temperatures and linear contrasts suggest that summer temperatures (33/24 °C) play a significant role in differential germination responses. Similarly, simple and complex comparisons revealed that the effects of elevated temperatures predominate in terms of components of seed viability. In summary, the application of non- and semi-parametric analyses provides appropriate, powerful data analysis procedures to address various topics in seed biology and more widespread use is encouraged.
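
    The Kaplan-Meier survivor function mentioned in this abstract can be sketched in a few lines. This is a generic product-limit estimator, not the authors' analysis: the function name `kaplan_meier` and the sample data are hypothetical, and "event" is read here as germination, so S(t) is the estimated fraction of seeds not yet germinated at time t.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times:  observation time for each seed (hypothetical units: days)
    events: 1 if germination was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # germinations observed exactly at time t
        n_leaving = 0  # all seeds leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve

# Made-up example: five seeds, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(curve)
```

    For the made-up data the estimate steps down to roughly 0.8, 0.6 and 0.3 at days 2, 3 and 5; the censored seed at day 8 leaves the curve unchanged.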

  15. Nonparametric statistical tests for the continuous data: the basic concept and the practical use.

    Science.gov (United States)

    Nahm, Francis Sahngun

    2016-02-01

    Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption, the assumption of normality, which means that the distribution of sample means is normally distributed. However, parametric tests can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
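
    To make "based on signs" concrete, here is a minimal sketch of the two-sided exact sign test for paired data. It is an illustration under stated assumptions, not code from the article: the function name `sign_test_p_value` and the before/after values are made up, and ties are simply dropped, which is one common convention.

```python
from math import comb

def sign_test_p_value(before, after):
    """Two-sided exact sign test for paired observations.

    Only the sign of each difference is used; zero differences are dropped.
    Under the null hypothesis the number of positive signs is Binomial(n, 1/2).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    k = min(n_pos, n - n_pos)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Made-up paired measurements: 6 increases, 1 decrease, 1 tie.
before = [10, 12, 9, 11, 14, 13, 10, 12]
after = [12, 15, 9, 14, 13, 16, 13, 15]
p = sign_test_p_value(before, after)
print(p)
```

    No normality assumption enters the calculation: the p-value comes from a binomial count of signs, which is why the test stays valid for skewed or heavy-tailed differences.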

  16. Why preferring parametric forecasting to nonparametric methods?

    Science.gov (United States)

    Jabot, Franck

    2015-05-07

    A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise for two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It is argued here that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the parametric model fitted to the data. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Methodology in robust and nonparametric statistics

    CERN Document Server

    Jurecková, Jana; Picek, Jan

    2012-01-01

    Introduction and Synopsis; Introduction; Synopsis; Preliminaries; Introduction; Inference in Linear Models; Robustness Concepts; Robust and Minimax Estimation of Location; Clippings from Probability and Asymptotic Theory; Problems; Robust Estimation of Location and Regression; Introduction; M-Estimators; L-Estimators; R-Estimators; Minimum Distance and Pitman Estimators; Differentiable Statistical Functions; Problems; Asymptotic Representations for L-Estimators

  18. Nonparametric statistics a step-by-step approach

    CERN Document Server

    Corder, Gregory W

    2014-01-01

    "…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory.  It also deserves a place in libraries of all institutions where introductory statistics courses are taught."" -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: New coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical powerSPSS® (Version 21) software and updated screen ca

  19. Portfolio optimization based on nonparametric estimation methods

    Directory of Open Access Journals (Sweden)

    Mahsa Ghandehari

    2017-03-01

    One of the major issues investors face in capital markets is deciding on an appropriate stock exchange for investing and selecting an optimal portfolio. This process is done through the assessment of risk and expected return. In the portfolio selection problem, if the assets' expected returns are normally distributed, variance and standard deviation are used as risk measures. However, the expected returns on assets are not necessarily normal and sometimes differ dramatically from the normal distribution. This paper introduces conditional value at risk (CVaR) as a measure of risk in a nonparametric framework, offers the optimal portfolio for a given expected return, and compares this method with the linear programming method. The data used in this study consist of monthly returns of 15 companies selected from the top 50 companies in Tehran Stock Exchange during the winter of 1392, considered from April of 1388 to June of 1393. The results of this study show the superiority of the nonparametric method over the linear programming method, and the nonparametric method is much faster than the linear programming method.
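
    The risk measure named in this abstract can be estimated directly from a sample of returns without assuming normality. The sketch below shows only that empirical estimate, not the paper's portfolio optimization: the function name `empirical_var_cvar`, the quantile convention, and the return series are assumptions made for this example.

```python
import math

def empirical_var_cvar(returns, alpha=0.95):
    """Empirical value at risk and conditional value at risk of losses.

    Losses are the negated returns. VaR is taken as the smallest observed loss
    whose empirical CDF is at least alpha, and CVaR as the mean of the losses
    at or beyond VaR. Other quantile conventions exist; this is one simple choice.
    """
    losses = sorted(-r for r in returns)
    n = len(losses)
    k = max(1, math.ceil(alpha * n - 1e-9))  # small offset guards float round-off
    var = losses[k - 1]
    tail = losses[k - 1:]
    cvar = sum(tail) / len(tail)
    return var, cvar

# Made-up monthly returns for a single asset.
returns = [0.02, -0.01, 0.03, -0.05, 0.01, -0.02, 0.04, 0.00, -0.08, 0.01]
var, cvar = empirical_var_cvar(returns, alpha=0.8)
print(var, cvar)
```

    Because CVaR averages the whole tail rather than reading off a single quantile, it reacts to how bad the worst months are, which is the property that makes it attractive when returns are far from normal.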

  20. Using Mathematica to build Non-parametric Statistical Tables

    Directory of Open Access Journals (Sweden)

    Gloria Perez Sainz de Rozas

    2003-01-01

    In this paper, I present computational procedures to obtain statistical tables: the tables of the asymptotic distribution and the exact distribution of the Kolmogorov-Smirnov statistic Dn for one population, the table of the distribution of the runs R, the table of the distribution of the Wilcoxon signed-rank statistic W+, and the table of the distribution of the Mann-Whitney statistic Ux, using Mathematica, Version 3.9 under Windows 98. I think that it is an interesting question because many statistical packages give the asymptotic significance level in statistical tests, and with these procedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities for non-parametric distributions. I have used Mathematica to make these calculations because one can use symbolic language to solve recursion relations. It is very easy to generate the format of the tables, and it is possible to obtain any table of the mentioned non-parametric distributions with any precision, not only for the standard parameters most used in statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, χ² (Chi-Square), Student's t, Snedecor's F, Geometric, Gamma and Beta.
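
    The recursion-based approach described here can be illustrated for the Wilcoxon signed-rank statistic W+. The paper works in Mathematica; the sketch below is an independent Python version, with hypothetical function names, of the standard recursion c_n(w) = c_{n-1}(w) + c_{n-1}(w - n), where c_n(w) counts the subsets of the ranks {1, ..., n} that sum to w, so that P(W+ = w) = c_n(w) / 2^n under the null hypothesis with no ties.

```python
from fractions import Fraction

def signed_rank_counts(n):
    """counts[w] = number of subsets of {1, ..., n} whose elements sum to w."""
    max_w = n * (n + 1) // 2
    counts = [1] + [0] * max_w
    for k in range(1, n + 1):
        # Go downwards so each rank k is used at most once per subset.
        for w in range(max_w, k - 1, -1):
            counts[w] += counts[w - k]
    return counts

def signed_rank_cdf(n, w):
    """Exact left-tail probability P(W+ <= w) as a fraction."""
    counts = signed_rank_counts(n)
    return Fraction(sum(counts[: w + 1]), 2 ** n)

# Exact left-tail table for n = 5.
for w in range(0, 16):
    print(w, signed_rank_cdf(5, w))
```

    Using `Fraction` keeps every table entry exact, which mirrors the abstract's point that exact significance levels can be generated to any precision and without transcription mistakes.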

  1. Categorical and nonparametric data analysis choosing the best statistical technique

    CERN Document Server

    Nussbaum, E Michael

    2014-01-01

    Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain

  2. Nonparametric statistical structuring of knowledge systems using binary feature matches

    DEFF Research Database (Denmark)

    Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue

    2014-01-01

    Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary...

  3. 1st Conference of the International Society for Nonparametric Statistics

    CERN Document Server

    Lahiri, S; Politis, Dimitris

    2014-01-01

    This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R.Beran, P.Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. SubbaRao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world.   The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...

  4. A Bayesian nonparametric method for prediction in EST analysis

    Directory of Open Access Journals (Sweden)

    Prünster Igor

    2007-09-01

    Background: Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: (a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; (b) the number of new unique genes to be observed in a future sample; (c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.

  5. Statistical Methods for Astronomy

    CERN Document Server

    Feigelson, Eric D

    2012-01-01

    This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when distribution functions of statistics are not known. Several approaches to model selection and goodness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain time series analysis for light curves; and spatial statistics to interpret the spati...

  6. Nonparametric methods in actigraphy: An update

    Directory of Open Access Journals (Sweden)

    Bruno S.B. Gonçalves

    2014-09-01

    Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest-activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization.
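
    The two variables discussed in this abstract are usually computed from binned activity counts. The sketch below uses the commonly cited definitions of interdaily stability (IS) and intradaily variability (IV); the abstract itself does not state the formulas, so treating these as the ones the authors start from is an assumption, as are the function names and the toy data. The averaging over several bin sizes that yields IVm and ISm is not shown.

```python
def interdaily_stability(x, period):
    """IS = n * sum_h (mean_h - mean)^2 / (period * sum_i (x_i - mean)^2).

    x: activity counts in equal time bins; len(x) is assumed to be a multiple
    of `period`, the number of bins per 24 hours.
    """
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x)
    # Average 24-hour profile: mean of all bins sharing the same time of day.
    profile = [sum(x[h::period]) / len(x[h::period]) for h in range(period)]
    return n * sum((m - mean) ** 2 for m in profile) / (period * total)

def intradaily_variability(x):
    """IV = n * sum_i (x_i - x_{i-1})^2 / ((n - 1) * sum_i (x_i - mean)^2)."""
    n = len(x)
    mean = sum(x) / n
    total = sum((v - mean) ** 2 for v in x)
    diffs = sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    return n * diffs / ((n - 1) * total)

# Toy series: three identical "days" of four bins each.
activity = [0, 0, 5, 5] * 3
is_value = interdaily_stability(activity, period=4)
iv_value = intradaily_variability(activity)
print(is_value, iv_value)
```

    A series that repeats exactly from day to day gives IS = 1, while IV grows with the number of abrupt rest-to-activity transitions relative to the overall variance.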

  7. Nonparametric methods in actigraphy: An update

    Science.gov (United States)

    Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.

    2014-01-01

    Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest-activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921

  8. Biological parametric mapping with robust and non-parametric statistics.

    Science.gov (United States)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A

    2011-07-15

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. Yield Stability of Maize Hybrids Evaluated in Maize Regional Trials in Southwestern China Using Nonparametric Methods

    Institute of Scientific and Technical Information of China (English)

    LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi

    2010-01-01

    Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed that both crossover and noncrossover GEI were present and that genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated that the nonparametric statistics grouped into four distinct classes that corresponded to different agronomic and biological concepts of stability. Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only the rank-sum and TOP methods would be useful for simultaneous selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.

  10. Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja

    CERN Document Server

    Taskinen, Sara

    2015-01-01

    Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.

  11. Statistical methods

    CERN Document Server

    Szulc, Stefan

    1965-01-01

    Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field. Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then

  12. Non-parametric Estimation approach in statistical investigation of nuclear spectra

    CERN Document Server

    Jafarizadeh, M A; Sabri, H; Maleki, B Rashidian

    2011-01-01

    In this paper, Kernel Density Estimation (KDE), a non-parametric estimation method, is used to investigate the statistical properties of nuclear spectra. The deviation toward regular or chaotic dynamics is exhibited by closer distances to the Poisson or Wigner limits, respectively, as evaluated by the Kullback-Leibler Divergence (KLD) measure. Spectral statistics of different sequences prepared from nuclei corresponding to the three dynamical symmetry limits of the Interacting Boson Model (IBM), oblate and prolate nuclei, and the pairing effect on nuclear level statistics are analyzed (with pure experimental data). The KDE-based estimated density function confirms previous predictions with minimum uncertainty (evaluated with the Integrated Absolute Error (IAE)) in comparison to the Maximum Likelihood (ML)-based method. Also, an increase in the regularity of spectra due to the pairing effect is revealed.
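
    The comparison described above can be sketched in a few lines. The example below is an illustration only, under the assumption that the quantity being compared is the nearest-neighbour level-spacing distribution, with the Poisson limit P(s) = exp(-s) and the Wigner limit P(s) = (pi/2) s exp(-pi s^2 / 4). The Gaussian kernel, the fixed bandwidth, the grid and all function names are assumptions, not the authors' choices.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(s):
        return norm * sum(
            math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for x in samples
        )

    return density


def poisson_spacing(s):
    """Spacing distribution in the regular (Poisson) limit."""
    return math.exp(-s)


def wigner_spacing(s):
    """Spacing distribution in the chaotic (Wigner) limit."""
    return 0.5 * math.pi * s * math.exp(-0.25 * math.pi * s * s)


def kl_divergence(p, q, grid):
    """Riemann-sum approximation of KL(p || q) on an evenly spaced grid."""
    step = grid[1] - grid[0]
    total = 0.0
    for s in grid:
        ps, qs = p(s), q(s)
        if ps > 1e-12 and qs > 0.0:
            total += ps * math.log(ps / qs) * step
    return total
```

    A sequence of spacings would then be labelled as closer to regular or to chaotic dynamics according to which of the two divergences is smaller.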

  13. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    Science.gov (United States)

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left-truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is a computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence intervals, goodness-of-fit tests, and confidence bands, to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  14. Comparing parametric and nonparametric regression methods for panel data

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs... rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...

  15. Examples of the Application of Nonparametric Information Geometry to Statistical Physics

    Directory of Open Access Journals (Sweden)

    Giovanni Pistone

    2013-09-01

    Full Text Available We review a nonparametric version of Amari’s information geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.

  16. Statistical methods

    CERN Document Server

    Freund, Rudolf J; Wilson, William J

    2010-01-01

    Statistical Methods, 3e provides students with a working introduction to statistical methods offering a wide range of applications that emphasize the quantitative skills useful across many academic disciplines. This text takes a classic approach emphasizing concepts and techniques for working out problems and interpreting results. The book includes research projects, real-world case studies, numerous examples and data exercises organized by level of difficulty. This text requires that a student be familiar with algebra. New to this edition: NEW expansion of exercises a

  17. Non-Parametric Statistical Methods and Data Transformations in Agricultural Pest Population Studies Métodos Estadísticos no Paramétricos y Transformaciones de Datos en Estudios de Poblaciones de Plagas Agrícolas

    Directory of Open Access Journals (Sweden)

    Alcides Cabrera Campos

    2012-09-01

    Full Text Available Analyses of data from agricultural pest populations regularly find that the data do not fulfill the theoretical requirements for classical ANOVA. Box-Cox transformations and nonparametric statistical methods are commonly used as alternatives to solve this problem. In this paper, we describe the results of applying these techniques to data from Thrips palmi Karny sampled in potato (Solanum tuberosum L.) plantations. The χ² test was used to assess goodness of fit to the negative binomial distribution and as a test of independence to investigate the relationship between plant strata and insect stages. Seven data transformations were also applied to meet the requirements of classical ANOVA, but they failed to eliminate the relationship between mean and variance. Given this negative result, comparisons between insect population densities were made using the nonparametric Kruskal-Wallis ANOVA test. Results from this analysis allowed the insect larval stage and the plant middle stratum to be selected as keys for designing pest sampling plans.
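
    To make the Kruskal-Wallis comparison concrete, here is a minimal sketch of the H statistic computed from midranks. It omits the tie correction and the chi-square p-value, the function names are illustrative, and the numbers in any usage are made up rather than taken from the Thrips palmi data.

```python
def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    total_n = len(pooled)
    h = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (total_n * (total_n + 1)) * h - 3.0 * (total_n + 1)
```

    Because the statistic depends only on ranks, it is unaffected by the monotone transformations that failed to stabilise the variance for the classical ANOVA.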

  18. Comparing parametric and nonparametric regression methods for panel data

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs. ... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
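
    The linear relationship mentioned above can be seen from a short derivation. The notation below (output y, inputs x_i, coefficients alpha and beta with beta_ij = beta_ji, scaling factor lambda) is an assumption made for illustration and is not taken from the paper.

```latex
% Translog production function (assumed notation, \beta_{ij} = \beta_{ji})
\ln y = \alpha_0 + \sum_i \alpha_i \ln x_i
        + \tfrac{1}{2} \sum_i \sum_j \beta_{ij} \ln x_i \ln x_j

% Elasticity of scale: sum of the output elasticities of all inputs
\varepsilon = \sum_i \frac{\partial \ln y}{\partial \ln x_i}
            = \sum_i \alpha_i + \sum_j \Big( \sum_i \beta_{ij} \Big) \ln x_j

% Scaling every input by \lambda shifts each \ln x_j by \ln \lambda
\varepsilon(\lambda x) = \varepsilon(x)
            + \Big( \sum_i \sum_j \beta_{ij} \Big) \ln \lambda
```

    The elasticity of scale is therefore linear in the logarithm of firm size, with a slope given by the sum of the second-order coefficients, which are the same coefficients that govern substitutability between inputs. In the Cobb-Douglas case all beta_ij are zero, so the elasticity of scale is constant.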

  19. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt;

    2013-01-01

    in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...

  20. A robust nonparametric method for quantifying undetected extinctions.

    Science.gov (United States)

    Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E

    2016-06-01

    How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
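
    The headline percentages can be checked with simple arithmetic. The sketch below assumes that the projected undetected extinctions are added to both the number of extinct species and the total species pool; that reading reproduces the reported point estimate, but the abstract does not spell out the calculation, and the confidence limits are not recomputed here.

```python
recorded_species = 195        # resident native bird species recorded in Singapore
recorded_extinct = 58         # recorded species that have gone extinct
undetected_extinct = 9.6      # projected extinctions among never-recorded species

# Extinction rate among recorded species only.
observed_rate = recorded_extinct / recorded_species

# Assumed adjustment: undetected extinctions enter numerator and denominator.
adjusted_rate = (recorded_extinct + undetected_extinct) / (
    recorded_species + undetected_extinct
)

print(f"observed: {100 * observed_rate:.1f}%")
print(f"adjusted: {100 * adjusted_rate:.1f}%")
```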

  1. THE GROWTH POINTS OF STATISTICAL METHODS

    Directory of Open Access Journals (Sweden)

    Orlov A. I.

    2014-11-01

    Full Text Available On the basis of a new paradigm of applied mathematical statistics, data analysis and economic-mathematical methods, we identify and discuss five topical areas in which modern applied statistics, as well as other statistical methods, is developing, i.e. five "growth points": nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, and statistics of non-numeric data.

  2. Non-parametric versus parametric methods in environmental sciences

    Directory of Open Access Journals (Sweden)

    Muhammad Riaz

    2016-01-01

    Full Text Available This report intends to highlight the importance of considering the background assumptions required for the analysis of real datasets in different disciplines. We provide a comparative discussion of parametric methods (which depend on distributional assumptions, such as normality) relative to non-parametric methods (which are free from many distributional assumptions). We have chosen a real dataset from environmental sciences (one of the application areas). The findings may be extended to other disciplines in the same spirit.

  3. International Conference on Robust Rank-Based and Nonparametric Methods

    CERN Document Server

    McKean, Joseph

    2016-01-01

    The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...

  4. Statistical methods for ranking data

    CERN Document Server

    Alvo, Mayer

    2014-01-01

    This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.

  5. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    ... -Douglas function nor the Translog function are consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric ... Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb ... results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...

  6. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    ... by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371 specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function is consistent with the "true ... Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb ... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...

  7. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Full Text Available Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large scale climatic state and the regional or local features. A transfer function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP) for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

  8. Digital spectral analysis parametric, non-parametric and advanced methods

    CERN Document Server

    Castanié, Francis

    2013-01-01

    Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature. The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models. An entire chapter is devoted to the non-parametric methods most widely used in industry. High resolution methods a

  9. COLOR IMAGE RETRIEVAL BASED ON NON-PARAMETRIC STATISTICAL TESTS OF HYPOTHESIS

    Directory of Open Access Journals (Sweden)

    R. Shekhar

    2016-09-01

    Full Text Available A novel method for color image retrieval, based on statistical non-parametric tests such as the two-sample Wald test for equality of variance and the Mann-Whitney U test, is proposed in this paper. The proposed method first tests the deviation, i.e. the distance in terms of variance between the query and target images; if the images pass this test, the method proceeds to test the spectrum of energy, i.e. the distance between the mean values of the two images; otherwise, the test is dropped. If the query and target images pass both tests, it is inferred that the two images belong to the same class, i.e. the images are the same; otherwise, it is assumed that the images belong to different classes, i.e. the images are different. The proposed method is robust to scaling and rotation, since it adjusts itself and treats either the query image or the target image as a sample of the other.
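
    As an illustration of the second of the two tests named above, the following sketch computes the Mann-Whitney U statistics for two samples by direct pair counting. It covers only the statistic itself; the image features, the variance test and the decision thresholds used in the paper are not reproduced, and the function name is an assumption.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) by counting pairs; ties contribute 0.5 to each side."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    # The two statistics always sum to the number of pairs.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

    Values of U_x near half the number of pairs indicate that neither sample tends to exceed the other, which is the situation in which two images would be treated as belonging to the same class.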

  10. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb-Douglas ... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...

  11. Computing Economies of Scope Using Robust Partial Frontier Nonparametric Methods

    Directory of Open Access Journals (Sweden)

    Pedro Carvalho

    2016-03-01

    Full Text Available This paper proposes a methodology to examine economies of scope using the recent order-α nonparametric method. It allows us to investigate economies of scope by comparing the efficient order-α frontiers of firms that produce two or more goods with the efficient order-α frontiers of firms that produce only one good. To accomplish this, and because the order-α frontiers are irregular, we suggest linearizing them with the DEA estimator. The proposed methodology uses partial frontier nonparametric methods that are more robust than the traditional full frontier methods. Using a sample of 67 Portuguese water utilities for the period 2002–2008 and a simulated sample, we demonstrate the usefulness of the approach adopted and show that the full frontier methods alone would lead to different results. We found evidence of economies of scope in the simultaneous provision of water supply and wastewater services by water utilities in Portugal.

  12. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method.

    Science.gov (United States)

    López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F

    2004-02-01

    The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
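
    A minimal sketch of local polynomial regression of degree one (local linear regression) is given below to show the kind of estimator involved. The Gaussian kernel, the fixed bandwidth and the function name are assumptions, and the abstract does not state which criterion the authors apply to the fitted curve to locate the cmc.

```python
import math


def local_linear_fit(xs, ys, x0, bandwidth):
    """Kernel-weighted least-squares line around x0; returns (value, slope) at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("not enough distinct points near x0")
    value = (s2 * t0 - s1 * t1) / det   # estimated regression function at x0
    slope = (s0 * t1 - s1 * t0) / det   # estimated first derivative at x0
    return value, slope
```

    Evaluating the fit on a grid of concentrations gives a smooth estimate of the measured property and of its derivative, so an abrupt change in slope becomes visible without imposing a parametric model on either side of it.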

  13. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)

    2004-02-01

    The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found. (orig.)

  14. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the “true” relationship between the inputs and the output. This misspecification might result in biased estimation results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...

  15. Applications of non-parametric statistics and analysis of variance on sample variances

    Science.gov (United States)

    Myers, R. H.

    1981-01-01

    Nonparametric methods that are available for NASA-type applications are discussed. An attempt is made here to survey what can be used, to offer recommendations as to when each would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are available for the Gaussian analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and the limitations of the nonparametric procedures. The appropriateness of doing analysis of variance on sample variances is also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be a reasonably sound procedure. However, the difficulties involved center around the normality problem and the basic homogeneous variance assumption that is made in usual analysis of variance problems. These difficulties are discussed and guidelines are given for using the methods.

  16. Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat

    Directory of Open Access Journals (Sweden)

    Luca Scrucca

    2001-06-01

    Full Text Available In this paper we describe the Xlisp-Stat version of the sm library, software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus, and it is documented in their book Applied Smoothing Techniques for Data Analysis (1997). This is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve inferential purposes, for instance when a nonparametric estimate is used for checking a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to easily implement their own methods and new research ideas in the built-in prototypes.

  17. t-tests, non-parametric tests, and large studies—a paradox of statistical practice?

    Directory of Open Access Journals (Sweden)

    Fagerland Morten W

    2012-06-01

    Background: During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. Methods: A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. Results: The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p...... Conclusions: Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
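
    The simulation design described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the shifted lognormal family, its parameter values, and the function names are assumptions chosen only so that both groups share the same mean and median while differing slightly in spread, and both p-values use large-sample normal approximations (the WMW version assumes no ties).

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()


def shifted_lognormal(rng, mean_, median, sigma):
    """Draw from shift + scale * exp(sigma * Z), tuned to a given mean and median.

    Hypothetical family: the abstract does not say which skewed
    distributions were simulated.
    """
    scale = (mean_ - median) / (math.exp(sigma ** 2 / 2) - 1)
    shift = median - scale
    return shift + scale * math.exp(sigma * rng.gauss(0, 1))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def t_test_p(x, y):
    """Unequal-variance two-sample statistic with a normal approximation."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return two_sided_p((mean(x) - mean(y)) / se)


def wmw_p(x, y):
    """Wilcoxon-Mann-Whitney test via the rank sum (normal approximation, no ties)."""
    n1, n2 = len(x), len(y)
    ordered = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(ordered, start=1) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return two_sided_p(z)


def rejection_rates(n, reps=200, alpha=0.05, seed=1):
    """Share of replications in which each test rejects at level alpha."""
    rng = random.Random(seed)
    t_rej = w_rej = 0
    for _ in range(reps):
        # Same mean (2.0) and median (1.5); sigma differs, so spread and skewness differ.
        x = [shifted_lognormal(rng, 2.0, 1.5, 1.0) for _ in range(n)]
        y = [shifted_lognormal(rng, 2.0, 1.5, 1.1) for _ in range(n)]
        t_rej += t_test_p(x, y) < alpha
        w_rej += wmw_p(x, y) < alpha
    return t_rej / reps, w_rej / reps
```

    Calling rejection_rates(n) for a sequence of increasing n gives a rough picture of how the two rejection rates evolve with sample size under these assumed distributions.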

  18. Out-of-Sample Extensions for Non-Parametric Kernel Methods.

    Science.gov (United States)

    Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang

    2017-02-01

    Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.

  19. Non-parametric and least squares Langley plot methods

    Directory of Open Access Journals (Sweden)

    P. W. Kiedron

    2015-04-01

    Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs) incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law $V = V_0 e^{-\tau m}$, where a plot of $\ln V$ (voltage) vs. $m$ (air mass) yields a straight line with intercept $\ln V_0$. This $\ln V_0$ subsequently can be used to solve for $\tau$ for any measurement of $V$ and calculation of $m$. This calibration works well on some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of $\ln V_0$ values are smoothed and interpolated with median and mean moving window filters.
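
    A Langley calibration of the kind described in this abstract amounts to fitting a straight line to the logarithm of the voltage against air mass. The sketch below shows an ordinary least squares fit next to one possible non-parametric alternative, the Theil-Sen median-of-slopes estimator. The abstract does not name the eleven techniques it compares, so the choice of Theil-Sen and the function names are assumptions made purely for illustration.

```python
import math
from statistics import median


def langley_least_squares(air_mass, voltage):
    """Least squares fit of ln V = ln V0 - tau * m; returns (V0, tau)."""
    y = [math.log(v) for v in voltage]
    n = len(air_mass)
    m_bar = sum(air_mass) / n
    y_bar = sum(y) / n
    sxy = sum((m - m_bar) * (yi - y_bar) for m, yi in zip(air_mass, y))
    sxx = sum((m - m_bar) ** 2 for m in air_mass)
    slope = sxy / sxx
    return math.exp(y_bar - slope * m_bar), -slope


def langley_theil_sen(air_mass, voltage):
    """Theil-Sen fit: median of pairwise slopes, then median intercept."""
    points = [(m, math.log(v)) for m, v in zip(air_mass, voltage)]
    slopes = [(y2 - y1) / (m2 - m1)
              for i, (m1, y1) in enumerate(points)
              for (m2, y2) in points[i + 1:]
              if m2 != m1]
    slope = median(slopes)
    intercept = median(y - slope * m for m, y in points)
    return math.exp(intercept), -slope
```

    On clean data the two fits agree. When a single reading is depressed, for instance by a passing cloud, the median-based fit is affected far less than the least squares fit, which is one reason non-parametric fits are attractive at difficult sites.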

  20. Non-parametric change-point method for differential gene expression detection.

    Directory of Open Access Journals (Sweden)

    Yao Wang

    BACKGROUND: We proposed a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS for short), that uses a single equation for detecting differential gene expression (DGE) in microarray data. NPCPS is based on change point theory to provide effective DGE detecting ability. METHODOLOGY: NPCPS uses the data distribution of the normal samples as input, and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and ROC study were applied to examine the detecting accuracy of NPCPS, and an experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. CONCLUSIONS: The simulation study indicated that NPCPS was more effective for detecting DGE in a cancer subset compared with five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experiment results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by using NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detecting result of NPCPS and the compared methods were less than 0.05, while between the other methods the values were from 0.20 to 0.84. This indicates that NPCPS is working on different features and thus provides DGE identification from a distinct perspective compared with the other mean or median based methods.

  1. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda

    2016-01-01

    Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age...... methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate...... composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized...

  2. The application of non-parametric statistical techniques to an ALARA programme.

    Science.gov (United States)

    Moon, J H; Cho, Y H; Kang, C S

    2001-01-01

    For the cost-effective reduction of occupational radiation dose (ORD) at nuclear power plants, it is necessary to identify what are the processes of repetitive high ORD during maintenance and repair operations. To identify the processes, the point values such as mean and median are generally used, but they sometimes lead to misjudgment since they cannot show other important characteristics such as dose distributions and frequencies of radiation jobs. As an alternative, the non-parametric analysis method is proposed, which effectively identifies the processes of repetitive high ORD. As a case study, the method is applied to ORD data of maintenance and repair processes at Kori Units 3 and 4 that are pressurised water reactors with 950 MWe capacity and have been operating since 1986 and 1987 respectively, in Korea and the method is demonstrated to be an efficient way of analysing the data.

  3. Patterns of trunk muscle activation during walking and pole walking using statistical non-parametric mapping.

    Science.gov (United States)

    Zoffoli, Luca; Ditroilo, Massimiliano; Federici, Ario; Lucertini, Francesco

    2017-09-09

    This study used surface electromyography (EMG) to investigate the regions and patterns of activity of the external oblique (EO), erector spinae longissimus (ES), multifidus (MU) and rectus abdominis (RA) muscles during walking (W) and pole walking (PW) performed at different speeds and grades. Eighteen healthy adults undertook W and PW on a motorized treadmill at 60% and 100% of their walk-to-run preferred transition speed at 0% and 7% treadmill grade. The Teager-Kaiser energy operator was employed to improve the muscle activity detection and statistical non-parametric mapping based on paired t-tests was used to highlight statistical differences in the EMG patterns corresponding to different trials. The activation amplitude of all trunk muscles increased at high speed, while no differences were recorded at 7% treadmill grade. ES and MU appeared to support the upper body at the heel-strike during both W and PW, with the latter resulting in elevated recruitment of EO and RA as required to control for the longer stride and the push of the pole. Accordingly, the greater activity of the abdominal muscles and the comparable intervention of the spine extensors supports the use of poles by walkers seeking higher engagement of the lower trunk region. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt;

    2013-01-01

    This paper presents a method for correction and alignment of global radiation observations based on information obtained from calculated global radiation, in the present study one-hour forecast of global radiation from a numerical weather prediction (NWP) model is used. Systematical errors detected...... in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...... University. The method can be useful for optimized use of solar radiation observations for forecasting, monitoring, and modeling of energy production and load which are affected by solar radiation....

  5. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    Science.gov (United States)

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
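
    To illustrate why K-M methods carry over to left-censored data, the sketch below applies the product-limit estimator after "flipping" the data about a constant larger than the maximum, which turns values reported as "less than a detection limit" into right-censored ones. Flipping is a commonly used device and is assumed here; the abstract does not say how the S-language software handles this internally, and the function names are hypothetical.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate for right-censored data.

    times    -- event or censoring times
    observed -- True where the event was observed, False where censored
    Returns [(t, S(t))] at each distinct event time, with S(t) = P(T > t).
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all records tied at time t; censored ties still count as at risk.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= events + censored
    return curve


def left_censored_cdf(values, detected):
    """Estimate P(X < x) at each detected value x for left-censored data.

    values   -- measured value, or the detection limit for a non-detect
    detected -- True for a quantified value, False for "< detection limit"
    """
    flip = max(values) + 1.0          # any constant above the maximum works
    flipped = [flip - v for v in values]
    curve = kaplan_meier(flipped, detected)
    return sorted((flip - t, s) for t, s in curve)
```

    Because the estimate is a step function defined only at the detected values, it neither interpolates between them nor extrapolates beyond them, which matches the limitation noted at the end of the abstract.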

  6. A non-parametric statistical test to compare clusters with applications in functional magnetic resonance imaging data.

    Science.gov (United States)

    Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R

    2014-12-10

    Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or not of a psychiatric disorder can be explained by the differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions previously not described to be involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest. Copyright © 2014 John Wiley & Sons, Ltd.

  7. Nonparametric methods for drought severity estimation at ungauged sites

    Science.gov (United States)

    Sadri, S.; Burn, D. H.

    2012-12-01

    The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.

  8. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    Directory of Open Access Journals (Sweden)

    Silvia Rizzi

    2016-05-01

    Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions; can handle unequal interval length; and allow stretches of 0 counts. Results: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion: We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.

  9. The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length

    NARCIS (Netherlands)

    Tendeiro, Jorge N.; Meijer, Rob R.

    2013-01-01

    To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test
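
    The definition in this abstract, the sum of the probabilities of all response vectors that are at most as likely as the observed one, can be illustrated by brute-force enumeration. The sketch below is deliberately simplified: it assumes independent dichotomous items with known success probabilities (a hypothetical input), and it leaves out the conditioning that the record mentions, because the text is cut off before that condition is fully stated.

```python
from itertools import product


def pattern_probability(pattern, p_correct):
    """Probability of a 0/1 response pattern under independent items."""
    prob = 1.0
    for score, p in zip(pattern, p_correct):
        prob *= p if score == 1 else 1 - p
    return prob


def probability_of_exceedance(observed, p_correct):
    """Sum the probabilities of all patterns at most as likely as `observed`.

    Unconditional illustration only; enumerates all 2**n patterns, so it is
    practical only for short to moderate test lengths.
    """
    p_obs = pattern_probability(observed, p_correct)
    total = 0.0
    for pattern in product((0, 1), repeat=len(observed)):
        p = pattern_probability(pattern, p_correct)
        if p <= p_obs * (1 + 1e-12):   # small tolerance for floating-point ties
            total += p
    return total
```

    A small value flags the observed pattern as unusual under the assumed probabilities, while the most likely pattern always receives a value of 1.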

  10. Comparison of reliability techniques of parametric and non-parametric method

    Directory of Open Access Journals (Sweden)

    C. Kalaiselvan

    2016-06-01

    Reliability of a product or system is the probability that the product adequately performs its intended function for the stated period of time under stated operating conditions. It is a function of time. The most widely used nano ceramic capacitors, C0G and X7R, are used in this reliability study to generate the time-to-failure (TTF) data. The time-to-failure data are identified by Accelerated Life Testing (ALT) and Highly Accelerated Life Testing (HALT). The tests are conducted at high stress levels to generate more failures within a short interval of time. The reliability methods used to convert accelerated conditions to actual conditions are the parametric method and the non-parametric method. In this paper, a comparative study has been done of parametric and non-parametric methods for identifying the failure data. The Weibull distribution is identified for the parametric method; the Kaplan–Meier and Simple Actuarial methods are identified for the non-parametric method. The time taken to identify the mean time to failure (MTTF) under accelerated conditions is the same for the parametric and non-parametric methods with relative deviation.

  11. Statistical methods in astronomy

    OpenAIRE

    Long, James P.; de Souza, Rafael S.

    2017-01-01

    We present a review of data types and statistical methods often encountered in astronomy. The aim is to provide an introduction to statistical applications in astronomy for statisticians and computer scientists. We highlight the complex, often hierarchical, nature of many astronomy inference problems and advocate for cross-disciplinary collaborations to address these challenges.

  12. Trend Analysis of Golestan's Rivers Discharges Using Parametric and Non-parametric Methods

    Science.gov (United States)

    Mosaedi, Abolfazl; Kouhestani, Nasrin

    2010-05-01

    One of the major problems in human life is climate change and its consequences. Climate change will cause changes in river discharges. The aim of this research is to investigate trends in the seasonal and yearly river discharges of Golestan province (Iran). In this research, four trend analysis methods (conjunction point, linear regression, Wald-Wolfowitz, and Mann-Kendall) were applied to river discharges in seasonal and annual periods at significance levels of 95% and 99%. First, daily discharge data from 12 hydrometric stations with a length of 42 years (1965-2007) were selected; after some common statistical tests, such as homogeneity tests (by applying G-B and M-W tests), the four trend analysis tests were applied. Results show that at all stations, for the summer time series, there are decreasing trends at a significance level of 99% according to the Mann-Kendall (M-K) test. For the autumn time series, all four methods give similar results. For other periods, the results of the four tests were more or less similar, while for some stations the results of the tests differed. Keywords: Trend Analysis, Discharge, Non-parametric methods, Wald-Wolfowitz, The Mann-Kendall test, Golestan Province.
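
    Of the four trend tests named in this abstract, the Mann-Kendall test is simple enough to sketch in a few lines. The version below is the textbook form with a tie correction and a continuity-corrected normal approximation; it is an illustration only and makes no claim about the exact variant or software used in the study.

```python
import math
from statistics import NormalDist


def mann_kendall(series):
    """Mann-Kendall trend test; returns (S, z, two-sided p-value)."""
    n = len(series)
    # S counts later-minus-earlier sign differences over all pairs.
    s = sum((series[j] > series[i]) - (series[j] < series[i])
            for i in range(n - 1)
            for j in range(i + 1, n))
    # Variance of S, reduced for groups of tied values.
    counts = {}
    for value in series:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value
```

    A negative S with a small p-value corresponds to a decreasing trend of the kind reported for the summer series; rejecting at p < 0.01 matches what the abstract calls the 99% level.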

  13. Nonparametric Comparison of Two Dynamic Parameter Setting Methods in a Meta-Heuristic Approach

    Directory of Open Access Journals (Sweden)

    Seyhun HEPDOGAN

    2007-10-01

    Meta-heuristics are commonly used to solve combinatorial problems in practice. Many approaches provide very good quality solutions in a short amount of computational time; however, most meta-heuristics use parameters to tune the performance of the meta-heuristic for particular problems, and the selection of these parameters before solving the problem can require much time. This paper investigates the problem of setting parameters using a typical meta-heuristic called Meta-RaPS (Metaheuristic for Randomized Priority Search). Meta-RaPS is a promising meta-heuristic optimization method that has been applied to different types of combinatorial optimization problems and achieved very good performance compared to other meta-heuristic techniques. To solve a combinatorial problem, Meta-RaPS uses two well-defined stages at each iteration: construction and local search. After a number of iterations, the best solution is reported. Meta-RaPS performance depends on the fine tuning of two main parameters, priority percentage and restriction percentage, which are used during the construction stage. This paper presents two different dynamic parameter setting methods for Meta-RaPS. These dynamic parameter setting approaches tune the parameters while a solution is being found. To compare these two approaches, nonparametric statistical approaches are utilized since the solutions are not normally distributed. Results from both these dynamic parameter setting methods are reported.

  14. STATISTICAL METHODS IN HISTORY

    Directory of Open Access Journals (Sweden)

    Orlov A. I.

    2016-01-01

    We have given a critical analysis of statistical models and methods for processing text information in historical records to establish the times when certain events occurred, i.e., to build a science-based chronology. There are three main kinds of sources of knowledge of ancient history: ancient texts, the remains of material culture, and traditions. The specific date of objects extracted by archaeologists in most cases cannot be found. The group of Academician A.T. Fomenko has developed and applied new statistical methods for the analysis of historical texts (chronicles), based on the intensive use of computer technology. Two major scientific results were: the majority of historical records that we know now are duplicated (in particular, chronicles describing the so-called "Ancient Rome" and "Middle Ages" talk about the same events); the known historical chronicles tell us about real events separated from the present time by not more than 1000 years. It was found that chronicles describing the history of "ancient times" and "Middle Ages" and the chronicles of Chinese history and the history of various European countries do not talk about different events, but about the same events. We have an attempt at a new dating of historical events and a restoration of the true history of human society based on new data. From the standpoint of statistical methods, historical records and images of their fragments are special cases of objects of non-numeric nature. Therefore, the computer-statistical methods developed by the group of A.T. Fomenko are a part of non-numerical statistics. We have considered some methods of statistical analysis of chronicles applied by the group of A.T. Fomenko: the correlation method of maximums; the dynasties method; the method of attenuation frequency; the questionnaire method codes. New chronology allows us to understand much of the battle of ideas in modern science and mass consciousness. It becomes clear the root cause of cautious

  15. Comparison of three nonparametric kriging methods for delineating heavy-metal contaminated soils

    Energy Technology Data Exchange (ETDEWEB)

    Juang, K.W.; Lee, D.Y

    2000-02-01

    The probability of pollutant concentrations greater than a cutoff value is useful for delineating hazardous areas in contaminated soils. It is essential for risk assessment and reclamation. In this study, three nonparametric kriging methods [indicator kriging, probability kriging, and kriging with the cumulative distribution function (CDF) of order statistics (CDF kriging)] were used to estimate the probability of heavy-metal concentrations lower than a cutoff value. In terms of methodology, the probability kriging estimator and CDF kriging estimator take into account the information of the order relation, which is not considered in indicator kriging. Since probability kriging has been shown to be better than indicator kriging for delineating contaminated soils, the performance of CDF kriging, which the authors propose, was compared with that of probability kriging in this study. A data set of soil Cd and Pb concentrations obtained from a 10-ha heavy-metal contaminated site in Taoyuan, Taiwan, was used. The results demonstrated that the probability kriging and CDF kriging estimations were more accurate than the indicator kriging estimation. On the other hand, because the probability kriging was based on the cokriging estimator, some unreliable estimates occurred in the probability kriging estimation. This indicated that probability kriging was not as robust as CDF kriging. Therefore, CDF kriging is more suitable than probability kriging for estimating the probability of heavy-metal concentrations lower than a cutoff value.

  16. Using continuous time stochastic modelling and nonparametric statistics to improve the quality of first principles models

    DEFF Research Database (Denmark)

    A methodology is presented that combines modelling based on first principles and data based modelling into a modelling cycle that facilitates fast decision-making based on statistical methods. A strong feature of this methodology is that given a first principles model along with process data, the......, the corresponding modelling cycle model of the given system for a given purpose. A computer-aided tool, which integrates the elements of the modelling cycle, is also presented, and an example is given of modelling a fed-batch bioreactor....

  17. A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.

    Science.gov (United States)

    Mircioiu, Constantin; Atkinson, Jeffrey

    2017-05-10

    A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
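
    The transformation of ordinal Likert responses into binary scores mentioned in this abstract might look like the sketch below. The abstract does not spell out the exact scoring rule, so the cut-off used here (ranks 3 and 4 count as 1, ranks 1 and 2 as 0) and the function names are assumptions for illustration.

```python
def binary_scores(responses, threshold=3):
    """Map 4-point Likert ranks to 1 (rank >= threshold) or 0 (below it)."""
    return [1 if rank >= threshold else 0 for rank in responses]


def proportion_high(responses, threshold=3):
    """Share of responses falling in the upper part of the scale."""
    scores = binary_scores(responses, threshold)
    return sum(scores) / len(scores)
```

    Comparing proportion_high across subgroups collapses each distribution to a single proportion, which makes differences easier to see in a plot, although the distinction between the individual ranks is no longer visible.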

  18. Statistical methods for environmental pollution monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, R.O.

    1987-01-01

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.

  19. Statistical Methods for Environmental Pollution Monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, Richard O. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)]

    1987-01-01

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.

  20. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...

  1. Longitudinal data analysis a handbook of modern statistical methods

    CERN Document Server

    Fitzmaurice, Garrett; Verbeke, Geert; Molenberghs, Geert

    2008-01-01

    Although many books currently available describe statistical models and methods for analyzing longitudinal data, they do not highlight connections between various research threads in the statistical literature. Responding to this void, Longitudinal Data Analysis provides a clear, comprehensive, and unified overview of state-of-the-art theory and applications. It also focuses on the assorted challenges that arise in analyzing longitudinal data. After discussing historical aspects, leading researchers explore four broad themes: parametric modeling, nonparametric and semiparametric methods, joint

  2. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...

  3. Applied Mathematics in the Humanities: Review of Nonparametric Statistics for the Behavioral Sciences by Sidney Siegel and N. John Castellan, Jr. (2nd ed., 1988)

    Directory of Open Access Journals (Sweden)

    Paul H. Grawe

    2016-01-01

    Sidney Siegel and N. John Castellan, Jr. Nonparametric Statistics for the Behavioral Sciences, Second Edition (New York, NY: McGraw Hill, 1988). 399 pp. ISBN: 9780070573574. Almost 60 years ago, Sidney Siegel wrote a stellar book helping anyone in academe to use nonparametric statistics, but ironically, 60 years after that achievement, American higher education confesses itself to be in the worst Quantitative Teaching Crisis of all time. The key clue to solving that crisis may be in Siegel and Castellan’s title, Nonparametric Statistics for the Behavioral Sciences, which quietly and perhaps unconsciously excludes the Humanities. Yet it is in humanistic realities that students read, write, and think. This book review considers what could be done if the Humanities were made aware of the enormous power of nonparametric statistics for advancing both their disciplines and their students’ ability to think quantitatively. A potentially revolutionary, humanistic, nonparametric finding is considered in detail along with a brief account of tens of humanistic discoveries deriving from Siegel and Castellan’s impetus.

  4. Nonparametric Information Geometry: From Divergence Function to Referential-Representational Biduality on Statistical Manifolds

    Directory of Open Access Journals (Sweden)

    Jun Zhang

    2013-12-01

    Divergence functions are the non-symmetric “distance” on the manifold, M_θ, of parametric probability density functions over a measure space, (X, μ). Classical information geometry prescribes, on M_θ: (i) a Riemannian metric given by the Fisher information; (ii) a pair of dual connections (giving rise to the family of α-connections) that preserve the metric under parallel transport by their joint actions; and (iii) a family of divergence functions (α-divergence) defined on M_θ × M_θ, which induce the metric and the dual connections. Here, we construct an extension of this differential geometric structure from M_θ (that of parametric probability density functions) to the manifold, M, of non-parametric functions on X, removing the positivity and normalization constraints. The generalized Fisher information and α-connections on M are induced by an α-parameterized family of divergence functions, reflecting the fundamental convex inequality associated with any smooth and strictly convex function. The infinite-dimensional manifold, M, has zero curvature for all these α-connections; hence, the generally non-zero curvature of M_θ can be interpreted as arising from an embedding of M_θ into M. Furthermore, when a parametric model (after a monotonic scaling) forms an affine submanifold, its natural and expectation parameters form biorthogonal coordinates, and such a submanifold is dually flat for α = ±1, generalizing the results of Amari’s α-embedding. The present analysis illuminates two different types of duality in information geometry, one concerning the referential status of a point (measurable function) expressed in the divergence function (“referential duality”) and the other concerning its representation under an arbitrary monotone scaling (“representational duality”).

  5. Inferential, non-parametric statistics to assess the quality of probabilistic forecast systems

    NARCIS (Netherlands)

    Maia, A.H.N.; Meinke, H.B.; Lennox, S.; Stone, R.C.

    2007-01-01

    Many statistical forecast systems are available to interested users. To be useful for decision making, these systems must be based on evidence of underlying mechanisms. Once causal connections between the mechanism and its statistical manifestation have been firmly established, the forecasts must al

  7. Statistical methods for forecasting

    CERN Document Server

    Abraham, Bovas

    2009-01-01

    The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "This book, it must be said, lives up to the words on its advertising cover: 'Bridging the gap between introductory, descriptive approaches and highly advanced theoretical treatises, it provides a practical, intermediate level discussion of a variety of forecasting tools, and explains how they relate to one another, both in theory and practice.' It does just that!" -Journal of the Royal Statistical Society "A well-written work that deals with statistical methods and models that can be used to produce short-term forecasts, this book has wide-ranging applications. It could be used in the context of a study of regression, forecasting, and time series ...

  8. Incorporating Nonparametric Statistics into Delphi Studies in Library and Information Science

    Science.gov (United States)

    Ju, Boryung; Jin, Tao

    2013-01-01

    Introduction: The Delphi technique is widely used in library and information science research. However, many researchers in the field fail to employ standard statistical tests when using this technique. This makes the technique vulnerable to criticisms of its reliability and validity. The general goal of this article is to explore how…

  9. An adaptive nonparametric method in benchmark analysis for bioassay and environmental studies.

    Science.gov (United States)

    Bhattacharya, Rabi; Lin, Lizhen

    2010-12-01

    We present a novel nonparametric method for bioassay and benchmark analysis in risk assessment, which averages isotonic MLEs based on disjoint subgroups of dosages. The asymptotic theory for the methodology is derived, showing that the MISEs (mean integrated squared errors) of the estimates of both the dose-response curve F and its inverse F^(-1) achieve the optimal rate O(N^(-4/5)). Also, we compute the asymptotic distribution of the estimate ζ̃_p of the effective dosage ζ_p = F^(-1)(p), which is shown to have an optimally small asymptotic variance.
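
    The building block named in this abstract, the isotonic MLE of a dose-response curve, can be illustrated with the pool-adjacent-violators algorithm. The sketch below covers only that single step for binomial data; it does not reproduce the authors' averaging over disjoint dose subgroups, and the function name `isotonic_mle` and the example counts are hypothetical.

```python
def isotonic_mle(responders, totals):
    """Pool-adjacent-violators estimate of a nondecreasing response curve.

    responders[i] / totals[i] is the observed response rate at the i-th
    dose, with doses sorted in increasing order and totals[i] > 0.
    """
    # Each block holds [sum of responders, sum of totals, number of doses].
    blocks = []
    for r, n in zip(responders, totals):
        blocks.append([r, n, 1])
        # Merge backwards while the previous block's rate exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            r2, n2, k2 = blocks.pop()
            blocks[-1][0] += r2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    # Expand each pooled block back to one fitted value per dose.
    fitted = []
    for r, n, k in blocks:
        fitted.extend([r / n] * k)
    return fitted
```

    With hypothetical counts of 1, 3, 2, and 6 responders out of 10 subjects per dose, the second and third doses violate monotonicity and are pooled to a common rate of 5/20.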

  10. Non-parametric group-level statistics for source-resolved ERP analysis.

    Science.gov (United States)

    Lee, Clement; Miyakoshi, Makoto; Delorme, Arnaud; Cauwenberghs, Gert; Makeig, Scott

    2015-01-01

    We have developed a new statistical framework for group-level event-related potential (ERP) analysis in EEGLAB. The framework calculates the variance of scalp channel signals accounted for by the activity of homogeneous clusters of sources found by independent component analysis (ICA). When ICA data decomposition is performed on each subject's data separately, functionally equivalent ICs can be grouped into EEGLAB clusters. Here, we report a new addition (statPvaf) to the EEGLAB plug-in std_envtopo to enable inferential statistics on main effects and interactions in event related potentials (ERPs) of independent component (IC) processes at the group level. We demonstrate the use of the updated plug-in on simulated and actual EEG data.

  11. A Java program for non-parametric statistic comparison of community structure

    Directory of Open Access Journals (Sweden)

    WenJun Zhang

    2011-09-01

    A Java algorithm for statistically comparing the structural difference between two communities is presented in this study. Euclidean distance, Manhattan distance, Pearson correlation, point correlation, quadratic correlation, and the Jaccard coefficient are included in the algorithm. The algorithm was used to compare rice arthropod communities in the Pearl River Delta, China, and the results showed that the family compositions of arthropods in Guangzhou, Zhongshan, Zhuhai, and Dongguan are not significantly different.

  12. A web application for evaluating Phase I methods using a non-parametric optimal benchmark.

    Science.gov (United States)

    Wages, Nolan A; Varhegyi, Nikole

    2017-06-01

    In evaluating the performance of Phase I dose-finding designs, simulation studies are typically conducted to assess how often a method correctly selects the true maximum tolerated dose under a set of assumed dose-toxicity curves. A necessary component of the evaluation process is to have some concept for how well a design can possibly perform. The notion of an upper bound on the accuracy of maximum tolerated dose selection is often omitted from the simulation study, and the aim of this work is to provide researchers with accessible software to quickly evaluate the operating characteristics of Phase I methods using a benchmark. The non-parametric optimal benchmark is a useful theoretical tool for simulations that can serve as an upper limit for the accuracy of maximum tolerated dose identification based on a binary toxicity endpoint. It offers researchers a sense of the plausibility of a Phase I method's operating characteristics in simulation. We have developed an R shiny web application for simulating the benchmark. The web application has the ability to quickly provide simulation results for the benchmark and requires no programming knowledge. The application is free to access and use on any device with an Internet browser. The application provides the percentage of correct selection of the maximum tolerated dose and an accuracy index, operating characteristics typically used in evaluating the accuracy of dose-finding designs. We hope this software will facilitate the use of the non-parametric optimal benchmark as an evaluation tool in dose-finding simulation.
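
    As a rough illustration of how such a benchmark can be simulated, the sketch below follows the complete-information construction commonly described for the non-parametric optimal benchmark: each simulated patient receives a latent uniform tolerance, which fixes that patient's toxicity outcome at every dose, and the dose whose estimated toxicity rate is closest to the target is selected. This is an assumption-laden sketch, not the authors' R shiny application; the function name `benchmark_pcs`, its parameters, and the tie-breaking rule (lowest dose wins) are hypothetical.

```python
import random

def benchmark_pcs(true_tox, target, n_patients, n_trials, seed=0):
    """Simulate the benchmark and return (PCS, selection proportions).

    true_tox  : assumed true toxicity probability at each dose
    target    : target toxicity rate defining the maximum tolerated dose
    """
    rng = random.Random(seed)
    n_doses = len(true_tox)
    # The true MTD is the dose whose true toxicity rate is closest to target.
    true_mtd = min(range(n_doses), key=lambda j: abs(true_tox[j] - target))
    selections = [0] * n_doses
    for _ in range(n_trials):
        # One latent tolerance per patient gives outcomes at all doses.
        tolerances = [rng.random() for _ in range(n_patients)]
        estimates = [
            sum(u <= p for u in tolerances) / n_patients for p in true_tox
        ]
        chosen = min(range(n_doses), key=lambda j: abs(estimates[j] - target))
        selections[chosen] += 1
    proportions = [count / n_trials for count in selections]
    return proportions[true_mtd], proportions
```

    A call such as `benchmark_pcs([0.05, 0.12, 0.25, 0.40, 0.55], 0.25, 30, 2000)` returns the percentage of correct selection as a fraction, together with the proportion of simulated trials that selected each dose.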

  13. Comparison of Parametric and Nonparametric Methods for Analyzing the Bias of a Numerical Model

    Directory of Open Access Journals (Sweden)

    Isaac Mugume

    2016-01-01

    Numerical models are presently applied in many fields for simulation and prediction, operation, or research. The output from these models normally has both systematic and random errors. The study compared January 2015 temperature data for Uganda as simulated using the Weather Research and Forecast model with actual observed station temperature data to analyze the bias using parametric (the root mean square error (RMSE), the mean absolute error (MAE), the mean error (ME), skewness, and the bias easy estimate (BES)) and nonparametric (the sign test, STM) methods. The RMSE normally overestimates the error compared to the MAE. The RMSE and MAE are not sensitive to the direction of bias. The ME gives both the direction and magnitude of bias but can be distorted by extreme values, while the BES is insensitive to extreme values. The STM is robust for giving the direction of bias; it is not sensitive to extreme values, but it does not give the magnitude of bias. The graphical tools (such as time series and cumulative curves) show the performance of the model with time. It is recommended to integrate parametric and nonparametric methods along with graphical methods for a comprehensive analysis of the bias of a numerical model.
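
    The measures compared in this abstract can be sketched in a few lines. The code below computes the RMSE, MAE, and ME for paired simulated and observed values and runs an exact two-sided sign test on the direction of the errors. The function names are hypothetical, dropping tied pairs is an assumed convention, and the BES is omitted because the abstract does not define it.

```python
import math

def bias_metrics(simulated, observed):
    """Return (RMSE, MAE, ME) for paired model and observation values."""
    errors = [s - o for s, o in zip(simulated, observed)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    me = sum(errors) / n  # signed, so it keeps the direction of bias
    return rmse, mae, me

def sign_test(simulated, observed):
    """Exact two-sided sign test on the sign of simulated minus observed.

    Pairs with zero error are dropped (an assumed convention).
    Returns (n_positive, n_negative, p_value).
    """
    errors = [s - o for s, o in zip(simulated, observed) if s != o]
    n = len(errors)
    positive = sum(1 for e in errors if e > 0)
    negative = n - positive
    k = min(positive, negative)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return positive, negative, min(1.0, 2 * tail)
```

    For hypothetical simulated values (3, 5, 2, 8) and observed values (1, 4, 4, 4), the errors are 2, 1, -2, and 4, giving RMSE 2.5, MAE 2.25, and ME 1.25, while the sign test sees three positive and one negative error and returns p = 0.625.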

  14. A NEW DE-NOISING METHOD BASED ON 3-BAND WAVELET AND NONPARAMETRIC ADAPTIVE ESTIMATION

    Institute of Scientific and Technical Information of China (English)

    Li Li; Peng Yuhua; Yang Mingqiang; Xue Peijun

    2007-01-01

    Wavelet de-noising is well known as an important method of signal de-noising. Recently, most research efforts on wavelet de-noising have focused on how to select the threshold, where the Donoho method is widely applied. Compared with the traditional 2-band wavelet, the 3-band wavelet has advantages in many aspects. According to this theory, an adaptive signal de-noising method in the 3-band wavelet domain based on nonparametric adaptive estimation is proposed. The experimental results show that, in the 3-band wavelet domain, the proposed method performs better than the Donoho method in protecting detail and improving the signal-to-noise ratio of the reconstructed signal.

  15. Hadron energy reconstruction for the ATLAS calorimetry in the framework of the nonparametrical method

    CERN Document Server

    Akhmadaliev, S Z; Ambrosini, G; Amorim, A; Anderson, K; Andrieux, M L; Aubert, Bernard; Augé, E; Badaud, F; Baisin, L; Barreiro, F; Battistoni, G; Bazan, A; Bazizi, K; Belymam, A; Benchekroun, D; Berglund, S R; Berset, J C; Blanchot, G; Bogush, A A; Bohm, C; Boldea, V; Bonivento, W; Bosman, M; Bouhemaid, N; Breton, D; Brette, P; Bromberg, C; Budagov, Yu A; Burdin, S V; Calôba, L P; Camarena, F; Camin, D V; Canton, B; Caprini, M; Carvalho, J; Casado, M P; Castillo, M V; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Chadelas, R; Chalifour, M; Chekhtman, A; Chevalley, J L; Chirikov-Zorin, I E; Chlachidze, G; Citterio, M; Cleland, W E; Clément, C; Cobal, M; Cogswell, F; Colas, Jacques; Collot, J; Cologna, S; Constantinescu, S; Costa, G; Costanzo, D; Crouau, M; Daudon, F; David, J; David, M; Davidek, T; Dawson, J; De, K; de La Taille, C; Del Peso, J; Del Prete, T; de Saintignon, P; Di Girolamo, B; Dinkespiler, B; Dita, S; Dodd, J; Dolejsi, J; Dolezal, Z; Downing, R; Dugne, J J; Dzahini, D; Efthymiopoulos, I; Errede, D; Errede, S; Evans, H; Eynard, G; Fassi, F; Fassnacht, P; Ferrari, A; Ferrer, A; Flaminio, Vincenzo; Fournier, D; Fumagalli, G; Gallas, E; Gaspar, M; Giakoumopoulou, V; Gianotti, F; Gildemeister, O; Giokaris, N; Glagolev, V; Glebov, V Yu; Gomes, A; González, V; González de la Hoz, S; Grabskii, V; Graugès-Pous, E; Grenier, P; Hakopian, H H; Haney, M; Hébrard, C; Henriques, A; Hervás, L; Higón, E; Holmgren, Sven Olof; Hostachy, J Y; Hoummada, A; Huston, J; Imbault, D; Ivanyushenkov, Yu M; Jézéquel, S; Johansson, E K; Jon-And, K; Jones, R; Juste, A; Kakurin, S; Karyukhin, A N; Khokhlov, Yu A; Khubua, J I; Klioukhine, V I; Kolachev, G M; Kopikov, S V; Kostrikov, M E; Kozlov, V; Krivkova, P; Kukhtin, V V; Kulagin, M; Kulchitskii, Yu A; Kuzmin, M V; Labarga, L; Laborie, G; Lacour, D; Laforge, B; Lami, S; Lapin, V; Le Dortz, O; Lefebvre, M; Le Flour, T; Leitner, R; Leltchouk, M; Li, J; Liablin, M V; Linossier, O; Lissauer, D; Lobkowicz, F; Lokajícek, M; 
Lomakin, Yu F; López-Amengual, J M; Lund-Jensen, B; Maio, A; Makowiecki, D S; Malyukov, S N; Mandelli, L; Mansoulié, B; Mapelli, Livio P; Marin, C P; Marrocchesi, P S; Marroquim, F; Martin, P; Maslennikov, A L; Massol, N; Mataix, L; Mazzanti, M; Mazzoni, E; Merritt, F S; Michel, B; Miller, R; Minashvili, I A; Miralles, L; Mnatzakanian, E A; Monnier, E; Montarou, G; Mornacchi, Giuseppe; Moynot, M; Muanza, G S; Nayman, P; Némécek, S; Nessi, Marzio; Nicoleau, S; Niculescu, M; Noppe, J M; Onofre, A; Pallin, D; Pantea, D; Paoletti, R; Park, I C; Parrour, G; Parsons, J; Pereira, A; Perini, L; Perlas, J A; Perrodo, P; Pilcher, J E; Pinhão, J; Plothow-Besch, Hartmute; Poggioli, Luc; Poirot, S; Price, L; Protopopov, Yu; Proudfoot, J; Puzo, P; Radeka, V; Rahm, David Charles; Reinmuth, G; Renzoni, G; Rescia, S; Resconi, S; Richards, R; Richer, J P; Roda, C; Rodier, S; Roldán, J; Romance, J B; Romanov, V; Romero, P; Rossel, F; Rusakovitch, N A; Sala, P; Sanchis, E; Sanders, H; Santoni, C; Santos, J; Sauvage, D; Sauvage, G; Sawyer, L; Says, L P; Schaffer, A C; Schwemling, P; Schwindling, J; Seguin-Moreau, N; Seidl, W; Seixas, J M; Selldén, B; Seman, M; Semenov, A; Serin, L; Shaldaev, E; Shochet, M J; Sidorov, V; Silva, J; Simaitis, V J; Simion, S; Sissakian, A N; Snopkov, R; Söderqvist, J; Solodkov, A A; Soloviev, A; Soloviev, I V; Sonderegger, P; Soustruznik, K; Spanó, F; Spiwoks, R; Stanek, R; Starchenko, E A; Stavina, P; Stephens, R; Suk, M; Surkov, A; Sykora, I; Takai, H; Tang, F; Tardell, S; Tartarelli, F; Tas, P; Teiger, J; Thaler, J; Thion, J; Tikhonov, Yu A; Tisserant, S; Tokar, S; Topilin, N D; Trka, Z; Turcotte, M; Valkár, S; Varanda, M J; Vartapetian, A H; Vazeille, F; Vichou, I; Vinogradov, V; Vorozhtsov, S B; Vuillemin, V; White, A; Wielers, M; Wingerter-Seez, I; Wolters, H; Yamdagni, N; Yosef, C; Zaitsev, A; Zitoun, R; Zolnierowski, Y

    2002-01-01

    This paper discusses hadron energy reconstruction for the ATLAS barrel prototype combined calorimeter (consisting of a lead-liquid argon electromagnetic part and an iron-scintillator hadronic part) in the framework of the nonparametrical method. The nonparametrical method utilizes only the known e/h ratios and the electron calibration constants and does not require the determination of any parameters by a minimization technique. Thus, this technique lends itself to an easy use in a first level trigger. The reconstructed mean values of the hadron energies are within ±1% of the true values and the fractional energy resolution is [(58±3)%/√E + (2.5±0.3)%] ⊕ (1.7±0.2)/E. The value of the e/h ratio obtained for the electromagnetic compartment of the combined calorimeter is 1.74±0.04 and agrees with the prediction that e/h > 1.66 for this electromagnetic calorimeter. Results of a study of the longitudinal hadronic shower development are also presented. The data have been taken in the H8 beam...
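
    To give a feel for the quoted resolution, the sketch below evaluates it at a given beam energy using the central values from the abstract. It reads the symbol joining the bracketed term and the 1/E term as a sum in quadrature, the usual convention for calorimeter resolutions, and it assumes E is expressed in GeV; neither point is stated in the abstract. The function name `fractional_resolution` is hypothetical.

```python
import math

def fractional_resolution(energy, a=0.58, b=0.025, c=1.7):
    """Fractional resolution sigma/E = [a/sqrt(E) + b] (+) c/E.

    energy : hadron energy (assumed to be in GeV)
    a, b   : sampling and constant terms, added linearly
    c      : noise term, combined in quadrature with the bracket
    """
    bracket = a / math.sqrt(energy) + b
    noise = c / energy
    return math.hypot(bracket, noise)
```

    Under these assumptions, `fractional_resolution(100.0)` is about 0.085, that is, roughly 8.5% at 100 GeV, and the value falls as the energy increases.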

  16. Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.

    Science.gov (United States)

    Zhang, Tingting; Kou, S C

    2010-01-01

    Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.

  17. Detecting correlation changes in multivariate time series: A comparison of four non-parametric change point detection methods.

    Science.gov (United States)

    Cabrieto, Jedelyn; Tuerlinckx, Francis; Kuppens, Peter; Grassmann, Mariel; Ceulemans, Eva

    2017-06-01

    Change point detection in multivariate time series is a complex task since next to the mean, the correlation structure of the monitored variables may also alter when change occurs. DeCon was recently developed to detect such changes in mean and/or correlation by combining a moving windows approach and robust PCA. However, in the literature, several other methods have been proposed that employ other non-parametric tools: E-divisive, Multirank, and KCP. Since these methods use different statistical approaches, two issues need to be tackled. First, applied researchers may find it hard to appraise the differences between the methods. Second, a direct comparison of the relative performance of all these methods for capturing change points signaling correlation changes is still lacking. Therefore, we present the basic principles behind DeCon, E-divisive, Multirank, and KCP and the corresponding algorithms, to make them more accessible to readers. We further compared their performance through extensive simulations using the settings of Bulteel et al. (Biological Psychology, 98 (1), 29-42, 2014) implying changes in mean and in correlation structure and those of Matteson and James (Journal of the American Statistical Association, 109 (505), 334-345, 2014) implying different numbers of (noise) variables. KCP emerged as the best method in almost all settings. However, in case of more than two noise variables, only DeCon performed adequately in detecting correlation changes.

  18. Spatial Modeling of Rainfall Patterns over the Ebro River Basin Using Multifractality and Non-Parametric Statistical Techniques

    Directory of Open Access Journals (Sweden)

    José L. Valencia

    2015-11-01

    Rainfall, one of the most important climate variables, is commonly studied due to its great heterogeneity, which occasionally causes negative economic, social, and environmental consequences. Modeling the spatial distributions of rainfall patterns over watersheds has become a major challenge for water resources management. Multifractal analysis can be used to reproduce the scale invariance and intermittency of rainfall processes. To identify which factors are the most influential on the variability of multifractal parameters and, consequently, on the spatial distribution of rainfall patterns for different time scales in this study, universal multifractal (UM) analysis, using the C1, α, and γs UM parameters, was combined with non-parametric statistical techniques that allow spatial-temporal comparisons of distributions by gradients. The proposed combined approach was applied to a daily rainfall dataset of 132 time-series from 1931 to 2009, homogeneously spatially-distributed across a 25 km × 25 km grid covering the Ebro River Basin. A homogeneous increase in C1 over the watershed and a decrease in α, mainly in the western regions, were detected, suggesting an increase in the frequency of dry periods at different scales and an increase in the occurrence of rainfall process variability over the last decades.

  19. A Statistical Nonparametric Approach of Face Recognition: Combination of Eigenface & Modified k-Means Clustering

    CERN Document Server

    Bag, Soumen; Sen, Prithwiraj; Sanyal, Gautam

    2011-01-01

    Facial expressions convey non-verbal cues, which play an important role in interpersonal relations. Automatic recognition of a human face based on facial expression can be an important component of a natural human-machine interface. It may also be used in behavioural science. Although humans can recognize a face practically without any effort, reliable face recognition by machine is a challenge. This paper presents a new approach for recognizing the face of a person considering the expressions of the same human face at different instances of time. This methodology is developed by combining the Eigenface method for feature extraction with modified k-Means clustering for identification of the human face. The method achieves face recognition without using the conventional distance measure classifiers. Simulation results show that the proposed face recognition using the perception of k-Means clustering is useful for face images with different facial expressions.

  20. Non-parametric method for separating domestic hot water heating spikes and space heating

    DEFF Research Database (Denmark)

    Bacher, Peder; de Saint-Aubain, Philip Anton; Christiansen, Lasse Engbo;

    2016-01-01

    In this paper a method for separating spikes from a noisy data series, where the data change and evolve over time, is presented. The method is applied to measurements of the total heat load for a single family house. It relies on the fact that the domestic hot water heating is a process generating...... short-lived spikes in the time series, while the space heating changes in slower patterns during the day dependent on the climate and user behavior. The challenge is to separate the domestic hot water heating spikes from the space heating without affecting the natural noise in the space heating...... measurements. The assumption behind the developed method is that the space heating can be estimated by a non-parametric kernel smoother, such that every value significantly above this kernel smoother estimate is identified as a domestic hot water heating spike. First, it is shown how a basic kernel smoothing...
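
    The idea stated in this abstract, estimating the slowly varying load with a kernel smoother and flagging values well above it, can be sketched as a single pass. The abstract is truncated before the authors' actual procedure is described, so the details below are assumptions: a Gaussian kernel over the time index, a median-absolute-deviation scale for the residuals, and a fixed multiple of that scale as the cut-off. The function names and default parameters are hypothetical.

```python
import math
import statistics

def kernel_smooth(values, bandwidth):
    """Nadaraya-Watson smoother over the time index with a Gaussian kernel."""
    n = len(values)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed

def flag_spikes(values, bandwidth=3.0, threshold=3.0):
    """Return indices whose value lies well above the kernel estimate."""
    smoothed = kernel_smooth(values, bandwidth)
    residuals = [v - m for v, m in zip(values, smoothed)]
    centre = statistics.median(residuals)
    # Median absolute deviation: a scale estimate that spikes barely affect.
    scale = statistics.median(abs(r - centre) for r in residuals)
    if scale == 0:
        return []  # no spread in the residuals, so nothing is flagged
    return [t for t, r in enumerate(residuals) if r - centre > threshold * scale]
```

    Because the spikes themselves pull this one-pass estimate upwards, a practical version would probably need a robust or iterated smoother; the sketch only shows the thresholding principle.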

  1. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang

    2017-02-16

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  2. [Nonparametric method of estimating survival functions containing right-censored and interval-censored data].

    Science.gov (United States)

    Xu, Yonghong; Gao, Xiaohuan; Wang, Zhengxi

    2014-04-01

    Missing data represent a general problem in many scientific fields, especially in medical survival analysis. When dealing with censored data, interpolation is one of the important methods. However, most interpolation methods replace the censored data with exact data, which distorts the real distribution of the censored data and reduces the probability of the real data falling into the interpolated data. To solve this problem, in this paper we propose a nonparametric method of estimating the survival function of right-censored and interval-censored data and compare its performance to the SC (self-consistent) algorithm. Compared with the average interpolation and the nearest neighbor interpolation methods, the proposed method replaces the right-censored data with interval-censored data and greatly improves the probability of the real data falling into the imputation interval. It then uses empirical distribution theory to estimate the survival function of right-censored and interval-censored data. The results of numerical examples and a real breast cancer data set demonstrated that the proposed method had higher accuracy and better robustness for different proportions of censored data. This paper provides a good method for comparing the performance of clinical treatments by estimating the survival data of the patients. This provides some help to medical survival data analysis.

  3. Statistical methods in nonlinear dynamics

    Indian Academy of Sciences (India)

    K P N Murthy; R Harish; S V M Satyanarayana

    2005-03-01

    Sensitivity to initial conditions in nonlinear dynamical systems leads to exponential divergence of trajectories that are initially arbitrarily close, and hence to unpredictability. Statistical methods have been found to be helpful in extracting useful information about such systems. In this paper, we review briefly some statistical methods employed in the study of deterministic and stochastic dynamical systems. These include power spectral analysis and aliasing, extreme value statistics and order statistics, recurrence time statistics, the characterization of intermittency in the Sinai disorder problem, random walk analysis of diffusion in the chaotic pendulum, and long-range correlations in stochastic sequences of symbols.

  4. Methods of statistical model estimation

    CERN Document Server

    Hilbe, Joseph

    2013-01-01

    Methods of Statistical Model Estimation examines the most important and popular methods used to estimate parameters for statistical models and provide informative model summary statistics. Designed for R users, the book is also ideal for anyone wanting to better understand the algorithms used for statistical model fitting. The text presents algorithms for the estimation of a variety of regression procedures using maximum likelihood estimation, iteratively reweighted least squares regression, the EM algorithm, and MCMC sampling. Fully developed, working R code is constructed for each method. Th

  5. A nonparametric Bayesian method of translating machine learning scores to probabilities in clinical decision support.

    Science.gov (United States)

    Connolly, Brian; Cohen, K Bretonnel; Santel, Daniel; Bayram, Ulya; Pestian, John

    2017-08-07

    Probabilistic assessments of clinical care are essential for quality care. Yet machine learning, which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate the ML output with a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. This novel non-parametric Bayesian approach is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients. The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well understood scores. The method produces calibrations that are comparable to the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate classes decreases, our approach yields more granular and dynamic calibrated probabilities compared to the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier. The method allows both clinicians and patients to have a more nuanced view of the output of an ML model, allowing better decision making. The method is demonstrated on simulated data, various biomedical data sets and a clinical data set, to which diverse ML methods are applied. Trivially extending the method to (non-ML) clinical scores is also discussed.

  6. Non-parametric method for measuring gas inhomogeneities from X-ray observations of galaxy clusters

    CERN Document Server

    Morandi, Andrea; Cui, Wei

    2013-01-01

    We present a non-parametric method to measure inhomogeneities in the intracluster medium (ICM) from X-ray observations of galaxy clusters. Analyzing mock Chandra X-ray observations of simulated clusters, we show that our new method enables the accurate recovery of the 3D gas density and gas clumping factor profiles out to large radii of galaxy clusters. We then apply this method to Chandra X-ray observations of Abell 1835 and present the first determination of the gas clumping factor from the X-ray cluster data. We find that the gas clumping factor in Abell 1835 increases with radius and reaches ~2-3 at r=R_{200}. This is in good agreement with the predictions of hydrodynamical simulations, but it is significantly below the values inferred from recent Suzaku observations. We further show that the radially increasing gas clumping factor causes flattening of the derived entropy profile of the ICM and affects physical interpretation of the cluster gas structure, especially at the large cluster-centric radii. Our...

  7. A simple 2D non-parametric resampling statistical approach to assess confidence in species identification in DNA barcoding--an alternative to likelihood and bayesian approaches.

    Science.gov (United States)

    Jin, Qian; He, Li-Jun; Zhang, Ai-Bing

    2012-01-01

    In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species in DNA barcoding has not been established so far, although the likelihood ratio test and the Bayesian approach have been proposed to address this issue from a statistical point of view. The TDR (Two Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that the TDR measure is very robust in assessing species membership in DNA barcoding. In contrast to the likelihood ratio test and the Bayesian approach, the TDR method stands out due to simplicity in both concepts and calculations, with little in the way of restrictive population genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.

  8. Statistical Methods in Integrative Genomics

    OpenAIRE

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and f...

  9. Bayesian Methods for Statistical Analysis

    OpenAIRE

    Puza, Borek

    2015-01-01

    Bayesian methods for statistical analysis is a book on statistical methods for analysing a wide variety of data. The book consists of 12 chapters, starting with basic concepts and covering numerous topics, including Bayesian estimation, decision theory, prediction, hypothesis testing, hierarchical models, Markov chain Monte Carlo methods, finite population inference, biased sampling and nonignorable nonresponse. The book contains many exercises, all with worked solutions, including complete c...

  10. When the Single Matters more than the Group (II): Addressing the Problem of High False Positive Rates in Single Case Voxel Based Morphometry Using Non-parametric Statistics.

    Science.gov (United States)

    Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea

    2016-01-01

    In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that the family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM is much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used

  11. Statistical Methods in Psychology Journals.

    Science.gov (United States)

    Wilkinson, Leland

    1999-01-01

    Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)

  12. Statistical Methods in Psychology Journals.

    Science.gov (United States)

    Wilkinson, Leland

    1999-01-01

    Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)

  13. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
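
    The cross-validation idea in this abstract can be illustrated with a minimal sketch. It is a simplified reading of "leave-out-one-cycle" CV, not the authors' estimator: for a candidate period q, each observation is predicted by the mean of the other observations at the same phase, and the candidate with the smallest mean squared prediction error is chosen. The function names `cv_score` and `estimate_period` are hypothetical.

```python
def cv_score(y, q):
    """Mean squared prediction error for candidate period q.

    Each y[t] is predicted from observations at the same phase (t mod q)
    that lie in other cycles. Because a cycle holds one observation per
    phase, leaving out the cycle is the same as leaving out y[t] itself.
    """
    n = len(y)
    total, count = 0.0, 0
    for t in range(n):
        cycle = t // q
        others = [y[s] for s in range(t % q, n, q) if s // q != cycle]
        if not others:  # phase observed only once: no prediction possible
            continue
        pred = sum(others) / len(others)
        total += (y[t] - pred) ** 2
        count += 1
    return total / count if count else float("inf")


def estimate_period(y, max_period):
    """Return the candidate period in 1..max_period with the smallest CV score.

    Ties are resolved in favour of the smallest period, since min() keeps
    the first minimiser it encounters.
    """
    return min(range(1, max_period + 1), key=lambda q: cv_score(y, q))


if __name__ == "__main__":
    series = [1.0, 5.0, 2.0, 8.0] * 30  # noiseless sequence with period 4
    print(estimate_period(series, 10))
```

    With noisy data, a multiple of the true period averages fewer observations per phase, so its predictions are more variable; that is one way to read the implicit penalty on multiples mentioned in the abstract.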

  14. Spatial and Spectral Nonparametric Linear Feature Extraction Method for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Jinn-Min Yang

    2016-11-01

    Feature extraction (FE), or dimensionality reduction (DR), plays quite an important role in the field of pattern recognition. Feature extraction aims to reduce the dimensionality of a high-dimensional dataset to enhance classification accuracy and speed up classification, particularly when the training sample size is small, namely the small sample size (SSS) problem. Remotely sensed hyperspectral images (HSIs) often have hundreds of measured features (bands), which potentially provide more accurate and detailed information for classification, but more samples are generally needed to estimate parameters and achieve a satisfactory result. Collecting ground truth for a remotely sensed hyperspectral scene can be considerably difficult and expensive. Therefore, FE techniques have been an important part of hyperspectral image classification. Unlike many feature extraction methods that are based only on the spectral (band) information of the training samples, some feature extraction methods that integrate both spatial and spectral information of training samples have shown more effective results in recent years. Spatial contextual information has been proven useful for improving the HSI data representation and increasing classification accuracy. In this paper, we propose a spatial and spectral nonparametric linear feature extraction method for hyperspectral image classification. The spatial and spectral information is extracted for each training sample and used to design the within-class and between-class scatter matrices for constructing the feature extraction model. The experimental results on one benchmark hyperspectral image demonstrate that the proposed method obtains more stable and satisfactory results than some existing spectral-based feature extraction methods.

  15. Rapid Statistical Methods: Part 1.

    Science.gov (United States)

    Lyon, A. J.

    1980-01-01

    Discusses some rapid statistical methods which are intended for use by physics teachers. Part one of this article gives some of the simplest and most commonly useful rapid methods. Part two gives references to the relevant theory together with some alternative and additional methods. (HM)

  16. Nonparametric confidence intervals for monotone functions

    NARCIS (Netherlands)

    Groeneboom, P.; Jongbloed, G.

    2015-01-01

    We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea

  17. Nonparametric confidence intervals for monotone functions

    NARCIS (Netherlands)

    Groeneboom, P.; Jongbloed, G.

    2015-01-01

    We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the

  18. Assessment of water quality trends in the Minnesota River using non-parametric and parametric methods

    Science.gov (United States)

    Johnson, H.O.; Gupta, S.C.; Vecchia, A.V.; Zvomuya, F.

    2009-01-01

    Excessive loading of sediment and nutrients to rivers is a major problem in many parts of the United States. In this study, we tested the non-parametric Seasonal Kendall (SEAKEN) trend model and the parametric USGS Quality of Water trend program (QWTREND) to quantify trends in water quality of the Minnesota River at Fort Snelling from 1976 to 2003. Both methods indicated decreasing trends in flow-adjusted concentrations of total suspended solids (TSS), total phosphorus (TP), and orthophosphorus (OP) and a generally increasing trend in flow-adjusted nitrate plus nitrite-nitrogen (NO3-N) concentration. The SEAKEN results were strongly influenced by the length of the record as well as extreme years (dry or wet) earlier in the record. The QWTREND results, though influenced somewhat by the same factors, were more stable. The magnitudes of trends between the two methods were somewhat different and appeared to be associated with conceptual differences between the flow-adjustment processes used and with data processing methods. The decreasing trends in TSS, TP, and OP concentrations are likely related to conservation measures implemented in the basin. However, dilution effects from wet climate or additional tile drainage cannot be ruled out. The increasing trend in NO3-N concentrations was likely due to increased drainage in the basin. Since the Minnesota River is the main source of sediments to the Mississippi River, this study also addressed the rapid filling of Lake Pepin on the Mississippi River and found the likely cause to be increased flow due to recent wet climate in the region. Copyright © 2009 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
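
    For readers unfamiliar with the Seasonal Kendall test named in this abstract, the following is a minimal sketch of the textbook statistic, not the SEAKEN program used in the study. It assumes no tied values, no serial correlation and no flow adjustment; the function name `seasonal_kendall` and the example data are hypothetical.

```python
import math


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def seasonal_kendall(values_by_season):
    """Seasonal Kendall S statistic and its normal-approximation Z score.

    values_by_season maps a season label to that season's observations in
    time order. The Mann-Kendall S statistic is computed within each season
    and summed; the variance uses the no-ties formula n(n-1)(2n+5)/18.
    """
    s_total = 0
    var_total = 0.0
    for series in values_by_season.values():
        n = len(series)
        for i in range(n - 1):
            for j in range(i + 1, n):
                s_total += sign(series[j] - series[i])
        var_total += n * (n - 1) * (2 * n + 5) / 18.0
    if var_total == 0.0 or s_total == 0:
        return s_total, 0.0
    # Continuity correction: shrink |S| by one before standardising.
    z = (s_total - sign(s_total)) / math.sqrt(var_total)
    return s_total, z


if __name__ == "__main__":
    # Hypothetical concentrations, one value per year for each season.
    data = {"spring": [5.0, 4.0, 3.0, 2.0, 1.0],
            "fall": [10.0, 8.0, 6.0, 4.0, 2.0]}
    s, z = seasonal_kendall(data)
    print(s, round(z, 3))
```

    A negative S (and Z) points to a decreasing trend. Comparing only within seasons is what keeps seasonal cycles from masquerading as trend.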

  19. Statistical methods in language processing.

    Science.gov (United States)

    Abney, Steven

    2011-05-01

    The term statistical methods here refers to a methodology that has been dominant in computational linguistics since about 1990. It is characterized by the use of stochastic models, substantial data sets, machine learning, and rigorous experimental evaluation. The shift to statistical methods in computational linguistics parallels a movement in artificial intelligence more broadly. Statistical methods have so thoroughly permeated computational linguistics that almost all work in the field draws on them in some way. There has, however, been little penetration of the methods into general linguistics. The methods themselves are largely borrowed from machine learning and information theory. We limit attention to that which has direct applicability to language processing, though the methods are quite general and have many nonlinguistic applications. Not every use of statistics in language processing falls under statistical methods as we use the term. Standard hypothesis testing and experimental design, for example, are not covered in this article. WIREs Cogni Sci 2011 2 315-322 DOI: 10.1002/wcs.111

  20. APPLICATION OF PARAMETRIC AND NON-PARAMETRIC BENCHMARKING METHODS IN COST EFFICIENCY ANALYSIS OF THE ELECTRICITY DISTRIBUTION SECTOR

    Directory of Open Access Journals (Sweden)

    Andrea Furková

    2007-06-01

    This paper explores the application of parametric and non-parametric benchmarking methods in measuring the cost efficiency of Slovak and Czech electricity distribution companies. We compare the relative cost efficiency of Slovak and Czech distribution companies using two benchmarking methods: the non-parametric Data Envelopment Analysis (DEA) and the Stochastic Frontier Analysis (SFA) as the parametric approach. The first part of the analysis was based on DEA models. Traditional cross-section CCR and BCC models were modified for cost efficiency estimation. In further analysis we focus on two versions of the stochastic frontier cost function using panel data: the MLE model and the GLS model. These models have been applied to an unbalanced panel of 11 regional electricity distribution utilities (3 in Slovakia and 8 in the Czech Republic) over the period from 2000 to 2004. The differences in estimated scores, parameters and ranking of utilities were analyzed. We observed significant differences between the parametric methods and the DEA approach.

  1. Nonparametric tests for censored data

    CERN Document Server

    Bagdonavicius, Vilijandas; Nikulin, Mikhail

    2013-01-01

    This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests applying most statistical software is highlighted and discussed.

  2. Statistical methods for physical science

    CERN Document Server

    Stanford, John L

    1994-01-01

    This volume of Methods of Experimental Physics provides an extensive introduction to probability and statistics in many areas of the physical sciences, with an emphasis on the emerging area of spatial statistics. The scope of topics covered is wide-ranging: the text discusses a variety of the most commonly used classical methods and addresses newer methods that are applicable or potentially important. The chapter authors motivate readers with their insightful discussions, augmenting their material with... Key Features: examines basic probability, including coverage of standard distributions, time s

  3. Application of Statistical Software R in the Teaching of Non-Parametric Statistics

    Institute of Scientific and Technical Information of China (English)

    王志刚; 冯利英; 刘勇

    2012-01-01

    This paper introduces the application of the statistical software R in the teaching of non-parametric statistics, an important branch of statistics. In particular, it describes in detail the use of R in exploratory data analysis, inferential statistics, and stochastic simulation. The flexible, open-source character of R makes data processing more efficient. The software can implement all the methods covered in the teaching process, and makes it convenient for learners to optimize and improve methods on the basis of previous work. R is therefore well suited to the teaching of non-parametric statistics.

  4. Statistical Methods for Evolutionary Trees

    OpenAIRE

    Edwards, A. W. F.

    2009-01-01

    In 1963 and 1964, L. L. Cavalli-Sforza and A. W. F. Edwards introduced novel methods for computing evolutionary trees from genetical data, initially for human populations from blood-group gene frequencies. The most important development was their introduction of statistical methods of estimation applied to stochastic models of evolution.

  5. Statistical methods for evolutionary trees.

    Science.gov (United States)

    Edwards, A W F

    2009-09-01

    In 1963 and 1964, L. L. Cavalli-Sforza and A. W. F. Edwards introduced novel methods for computing evolutionary trees from genetical data, initially for human populations from blood-group gene frequencies. The most important development was their introduction of statistical methods of estimation applied to stochastic models of evolution.

  6. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  7. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  8. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  9. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...

  10. Beyond Statistical Methods – Compendium of Statistical Methods for Researchers

    Directory of Open Access Journals (Sweden)

    Ondřej Vozár

    2014-12-01

    Book Review: HENDL, J. Přehled statistických metod: Analýza a metaanalýza dat (Overview of Statistical Methods: Data Analysis and Metaanalysis). 4th extended edition. Prague: Portál, 2012. ISBN 978-80-262-0200-4.

  11. Revisiting the Distance Duality Relation using a non-parametric regression method

    Science.gov (United States)

    Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha

    2016-07-01

    The interdependence of the luminosity distance, D_L, and the angular diameter distance, D_A, given by the distance duality relation (DDR), is very significant in observational cosmology. It is very closely tied to the temperature-redshift relation of the Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ D_L/[D_A (1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within a 1σ region in the entire redshift range used in this analysis (0 < z <= 2.418).
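
    The two η(z) expressions in this abstract can be written out as a small sketch. It only evaluates the formulas as stated, with no data and no LOESS or SIMEX step; the function names and the example numbers are hypothetical.

```python
def eta_observed(z, d_l, d_a):
    """Distance duality ratio eta(z) = D_L / (D_A * (1 + z)**2).

    d_l and d_a must be in the same units. A value of 1 means the
    distance duality relation holds at redshift z.
    """
    return d_l / (d_a * (1.0 + z) ** 2)


def eta_model(z, epsilon):
    """Phenomenological model eta(z) = (1 + z)**epsilon.

    epsilon = 0 recovers eta = 1, i.e. no deviation from the relation.
    """
    return (1.0 + z) ** epsilon


if __name__ == "__main__":
    # Made-up distances chosen so that the relation holds exactly at z = 1.
    print(eta_observed(1.0, 4.0, 1.0))
    print(eta_model(2.418, 0.0))
```

    Under this parametrisation, testing the relation amounts to asking whether ε is consistent with zero over the redshift range covered by the data.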

  12. Robust statistical methods with R

    CERN Document Server

    Jureckova, Jana

    2005-01-01

    Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameter, large sample properties, and goodness-of-fit tests. It...

  13. Statistical methods for bioimpedance analysis

    Directory of Open Access Journals (Sweden)

    Christian Tronstad

    2014-04-01

    This paper gives a basic overview of relevant statistical methods for the analysis of bioimpedance measurements, with an aim to answer questions such as: How do I begin with planning an experiment? How many measurements do I need to take? How do I deal with large amounts of frequency sweep data? Which statistical test should I use, and how do I validate my results? Beginning with the hypothesis and the research design, the methodological framework for making inferences based on measurements and statistical analysis is explained. This is followed by a brief discussion on correlated measurements and data reduction before an overview is given of statistical methods for comparison of groups, factor analysis, association, regression and prediction, explained in the context of bioimpedance research. The last chapter is dedicated to the validation of a new method by different measures of performance. A flowchart is presented for selection of statistical method, and a table is given for an overview of the most important terms of performance when evaluating new measurement technology.

  14. Statistical Methods for Fuzzy Data

    CERN Document Server

    Viertl, Reinhard

    2011-01-01

    Statistical data are not always precise numbers, or vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data, environmental, biological, medical, sociological and economics data. Also the results of measurements can be best described by using fuzzy numbers and fuzzy vectors respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy m

  15. Non-parametric methods – Tree and P-CFA – for the ecological evaluation and assessment of suitable aquatic habitats: A contribution to fish psychology

    Directory of Open Access Journals (Sweden)

    Andreas H. Melcher

    2012-09-01

    This study analyses the multidimensional spawning habitat suitability of the fish species “Nase” (Latin: Chondrostoma nasus). This is the first time non-parametric methods have been used to better understand biotic habitat use in theory and practice. In particular, we tested (1) the Decision Tree technique, Chi-squared Automatic Interaction Detectors (CHAID), to identify specific habitat types and (2) Prediction-Configural Frequency Analysis (P-CFA) to test for statistical significance. The combination of both non-parametric methods, CHAID and P-CFA, enabled the identification, prediction and interpretation of the most typical significant spawning habitats, and we were also able to determine non-typical habitat types, e.g., types in contrast to antitypes. The gradual combination of these two methods underlined three significant habitat types: shaded habitat, and fine and coarse substrate habitat depending on high flow velocity. The study affirmed the importance for fish species of shading and riparian vegetation along river banks. In addition, this method provides a weighting of interactions between specific habitat characteristics. The results demonstrate that efficient river restoration requires re-establishing riparian vegetation as well as the open river continuum and hydro-morphological improvements to habitats.

  16. Statistical methods in spatial genetics

    DEFF Research Database (Denmark)

    Guillot, Gilles; Leblois, Raphael; Coulon, Aurelie

    2009-01-01

    The joint analysis of spatial and genetic data is rapidly becoming the norm in population genetics. More and more studies explicitly describe and quantify the spatial organization of genetic variation and try to relate it to underlying ecological processes. As it has become increasingly difficult to keep abreast with the latest methodological developments, we review the statistical toolbox available to analyse population genetic data in a spatially explicit framework. We mostly focus on statistical concepts but also discuss practical aspects of the analytical methods, highlighting not only...

  17. Bayesian nonparametric data analysis

    CERN Document Server

    Müller, Peter; Jara, Alejandro; Hanson, Tim

    2015-01-01

    This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.

  18. Spectral Methods in Spatial Statistics

    Directory of Open Access Journals (Sweden)

    Kun Chen

    2014-01-01

    Full Text Available When the spatial domain becomes extremely large, it is very difficult, if not impossible, to evaluate the covariance matrix determined by the set of location distances, even for a gridded stationary Gaussian process. To alleviate the numerical challenges, we construct a nonparametric estimator, the spatial periodogram, to represent the sample properties in the frequency domain, because the periodogram requires fewer computational operations when computed with the fast Fourier transform algorithm. Under some regularity conditions on the process, we investigate the asymptotic unbiasedness of the periodogram as an estimator of the spectral density function and obtain its convergence rate.
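
    A minimal sketch of a periodogram on a rectangular grid may help make the idea concrete. It is not the paper's estimator: the function name `periodogram_2d`, the mean-centring step, and the normalisation by the number of grid points (with no factor of (2π)²) are assumptions made for illustration. A direct discrete Fourier transform is used so the block is self-contained; a fast Fourier transform returns the same values with far fewer operations.

```python
import cmath
import math


def periodogram_2d(grid):
    """Periodogram of a rectangular grid of observations.

    Assumed convention (not taken from the paper): the data are
    mean-centred and the squared DFT modulus is divided by n1 * n2.
    """
    n1, n2 = len(grid), len(grid[0])
    mean = sum(sum(row) for row in grid) / (n1 * n2)
    result = [[0.0] * n2 for _ in range(n1)]
    for j in range(n1):          # frequency index along the first axis
        for k in range(n2):      # frequency index along the second axis
            total = 0j
            for s in range(n1):
                for t in range(n2):
                    angle = -2.0 * math.pi * (j * s / n1 + k * t / n2)
                    total += (grid[s][t] - mean) * cmath.exp(1j * angle)
            result[j][k] = abs(total) ** 2 / (n1 * n2)
    return result


# Hypothetical example: a pure cosine wave along the first axis of a 4 x 4 grid.
wave = [[math.cos(2.0 * math.pi * s / 4)] * 4 for s in range(4)]
spectrum = periodogram_2d(wave)
```

    In this example all of the power sits at the two frequencies matching the wave, (1, 0) and its mirror (3, 0), which is the behaviour one would expect from a spectral density estimate.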

  19. Nonparametric identification of copula structures

    KAUST Repository

    Li, Bo

    2013-06-01

    We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process. We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.

  20. Order statistics & inference estimation methods

    CERN Document Server

    Balakrishnan, N

    1991-01-01

    The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is the consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co

  1. Nonparametric Predictive Regression

    OpenAIRE

    Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.

    2012-01-01

    A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...

  2. Non-parametric asymptotic statistics for the Palm mark distribution of β-mixing marked point processes

    CERN Document Server

    Heinrich, Lothar; Schmidt, Volker

    2012-01-01

    We consider spatially homogeneous marked point patterns in an unboundedly expanding convex sampling window. Our main objective is to identify the distribution of the typical mark by constructing an asymptotic χ²-goodness-of-fit test. The corresponding test statistic is based on a natural empirical version of the Palm mark distribution and a smoothed covariance estimator which turns out to be mean-square consistent. Our approach does not require independent marks and allows dependences between the mark field and the point pattern. Instead we impose a suitable β-mixing condition on the underlying stationary marked point process which can be checked for a number of Poisson-based models and, in particular, in the case of geostatistical marking. Our method needs a central limit theorem for β-mixing random fields which is proved by extending Bernstein's blocking technique to non-cubic index sets and seems to be of interest in its own right. By large-scale model-based simulations the performance of our t...

  3. Bayes linear statistics, theory & methods

    CERN Document Server

    Goldstein, Michael

    2007-01-01

    Bayesian methods combine information available from data with any prior information available from expert knowledge. The Bayes linear approach follows this path, offering a quantitative structure for expressing beliefs, and systematic methods for adjusting these beliefs, given observational data. The methodology differs from the full Bayesian methodology in that it establishes simpler approaches to belief specification and analysis based around expectation judgements. Bayes Linear Statistics presents an authoritative account of this approach, explaining the foundations, theory, methodology, and practicalities of this important field. The text provides a thorough coverage of Bayes linear analysis, from the development of the basic language to the collection of algebraic results needed for efficient implementation, with detailed practical examples. The book covers:The importance of partial prior specifications for complex problems where it is difficult to supply a meaningful full prior probability specification...

  4. Using and comparing two nonparametric methods (CART and RF) and SPOT-HRG satellite data to predictive tree diversity distribution

    Directory of Open Access Journals (Sweden)

    SIAVASH KALBI

    2014-05-01

    Full Text Available Kalbi S, Fallah A, Hojjati SM. 2014. Using and comparing two nonparametric methods (CART and RF) and SPOT-HRG satellite data to predictive tree diversity distribution. Nusantara Bioscience 6: 57-62. The prediction of spatial distributions of tree species by means of survey data has recently been used for conservation planning. Numerous methods have been developed for building species habitat suitability models. The present study was carried out to find possible relationships between tree species diversity indices and SPOT-HRG reflectance values in the Hyrcanian forests, North of Iran. Two different modeling techniques, Classification and Regression Trees (CART) and Random Forest (RF), were fitted to the data in order to find the most successful model. The Simpson, Shannon and reciprocal of Simpson indices were used for estimating tree diversity. After collecting terrestrial information on trees in the 100 samples, the tree diversity indices were calculated in each plot. RF, with coefficients of determination from 56.3 to 63.9 and RMSE from 0.15 to 0.84, gave better results than CART, with coefficients of determination from 42.3 to 63.3 and RMSE from 0.188 to 0.88. Overall the results showed that the SPOT-HRG satellite data and nonparametric regression could be useful for estimating tree diversity in the Hyrcanian forests, North of Iran.
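
    The three diversity indices named in the abstract can be sketched for a single plot as follows. The abstract does not give the formulas, so the textbook definitions are assumed here: Shannon H = -Σ p ln p, Simpson reported as 1 - Σ p² (the Gini-Simpson form), and reciprocal Simpson 1 / Σ p². The function name and the species labels are hypothetical.

```python
import math
from collections import Counter


def diversity_indices(species):
    """Return Shannon, Simpson (1 - sum p^2) and reciprocal Simpson indices.

    `species` is a list with one species label per tree in the plot.
    The exact index variants are an assumption; the abstract only names them.
    """
    counts = Counter(species)
    n = sum(counts.values())
    props = [c / n for c in counts.values()]
    dominance = sum(p * p for p in props)          # Simpson's dominance, sum p^2
    return {
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - dominance,
        "reciprocal_simpson": 1.0 / dominance,
    }


# Hypothetical plot with four equally abundant species.
plot = ["beech", "hornbeam", "oak", "maple"] * 5
indices = diversity_indices(plot)
```

    For a plot with S equally abundant species the reciprocal Simpson index equals S and the Shannon index equals ln S, which gives a quick sanity check on any implementation.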

  5. Structuring feature space: a non-parametric method for volumetric transfer function generation.

    Science.gov (United States)

    Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S

    2009-01-01

    The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial

  6. Multiatlas segmentation as nonparametric regression.

    Science.gov (United States)

    Awate, Suyash P; Whitaker, Ross T

    2014-09-01

    This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.

  7. Statistical methods in radiation physics

    CERN Document Server

    Turner, James E; Bogard, James S

    2012-01-01

    This statistics textbook, with particular emphasis on radiation protection and dosimetry, deals with statistical solutions to problems inherent in health physics measurements and decision making. The authors begin with a description of our current understanding of the statistical nature of physical processes at the atomic level, including radioactive decay and interactions of radiation with matter. Examples are taken from problems encountered in health physics, and the material is presented such that health physicists and most other nuclear professionals will more readily understand the application of statistical principles in the familiar context of the examples. Problems are presented at the end of each chapter, with solutions to selected problems provided online. In addition, numerous worked examples are included throughout the text.

  8. Statistical inference via fiducial methods

    NARCIS (Netherlands)

    Salomé, Diemer

    1998-01-01

    In this thesis the attention is restricted to inductive reasoning using a mathematical probability model. A statistical procedure prescribes, for every theoretically possible set of data, the inference about the unknown of interest. ... See: Summary

  9. Statistical methods in translational medicine.

    Science.gov (United States)

    Chow, Shein-Chung; Tse, Siu-Keung; Lin, Min

    2008-12-01

    This study focuses on strategies and statistical considerations for assessment of translation in language (e.g. translation of case report forms in multinational clinical trials), information (e.g. translation of basic discoveries to the clinic) and technology (e.g. translation of Chinese diagnostic techniques to well-established clinical study endpoints) in pharmaceutical/clinical research and development. However, most of our efforts will be directed to statistical considerations for translation in information. Translational medicine has been defined as bench-to-bedside research, where a basic laboratory discovery becomes applicable to the diagnosis, treatment or prevention of a specific disease, and is brought forth by either a physician-scientist who works at the interface between the research laboratory and patient care, or by a team of basic and clinical science investigators. Statistics plays an important role in translational medicine to ensure that the translational process is accurate and reliable with certain statistical assurance. Statistical inference for the applicability of an animal model to a human model is also discussed. Strategies for selection of clinical study endpoints (e.g. absolute changes, relative changes, or responder-defined, based on either absolute or relative change) are reviewed.

  10. Statistical Methods in Translational Medicine

    Directory of Open Access Journals (Sweden)

    Shein-Chung Chow

    2008-12-01

    Full Text Available This study focuses on strategies and statistical considerations for assessment of translation in language (e.g. translation of case report forms in multinational clinical trials), information (e.g. translation of basic discoveries to the clinic) and technology (e.g. translation of Chinese diagnostic techniques to well-established clinical study endpoints) in pharmaceutical/clinical research and development. However, most of our efforts will be directed to statistical considerations for translation in information. Translational medicine has been defined as bench-to-bedside research, where a basic laboratory discovery becomes applicable to the diagnosis, treatment or prevention of a specific disease, and is brought forth by either a physician-scientist who works at the interface between the research laboratory and patient care, or by a team of basic and clinical science investigators. Statistics plays an important role in translational medicine to ensure that the translational process is accurate and reliable with certain statistical assurance. Statistical inference for the applicability of an animal model to a human model is also discussed. Strategies for selection of clinical study endpoints (e.g. absolute changes, relative changes, or responder-defined, based on either absolute or relative change) are reviewed.

  11. Register-based statistics statistical methods for administrative data

    CERN Document Server

    Wallgren, Anders

    2014-01-01

    This book provides a comprehensive and up to date treatment of  theory and practical implementation in Register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi

  12. Permutation statistical methods an integrated approach

    CERN Document Server

    Berry, Kenneth J; Johnston, Janis E

    2016-01-01

    This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...
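
    The claim that permutation methods "depend only on the data at hand" can be illustrated with a small exact two-sample test. This is a generic sketch, not a procedure taken from the monograph: the test statistic (absolute difference in means), the function name, and the tolerance used when comparing floating-point values are all assumptions.

```python
from itertools import combinations


def exact_permutation_test(x, y):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of splitting the pooled data into groups of the original
    sizes is enumerated, so this is only practical for small samples.
    """
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    total = sum(pooled)
    observed = abs(sum(x) / n - sum(y) / m)
    splits = 0
    as_extreme = 0
    for idx in combinations(range(n + m), n):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n - (total - sum_x) / m)
        splits += 1
        if diff >= observed - 1e-12:   # tolerance guards against rounding error
            as_extreme += 1
    return as_extreme / splits


# Hypothetical data: two completely separated groups of three values.
p_value = exact_permutation_test([1, 2, 3], [4, 5, 6])
```

    With three observations per group there are 20 possible splits, and only the observed split and its mirror image are as extreme, so the p-value is 2/20 = 0.1. No normality or equal-variance assumption enters the calculation.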

  13. Comparing non-parametric methods for ungrouping coarsely aggregated age-specific distributions

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Vaupel, James W.

    2016-01-01

    Demographers often have access to vital statistics that are less than ideal for the purpose of their research. In many instances demographic data are reported in coarse histograms, where the values given are only the summation of true latent values, thereby making detailed analysis troublesome. O...

  14. Estimating the Probability of Being the Best System: A Generalized Method and Nonparametric Hypothesis Test

    Science.gov (United States)

    2013-03-01

    Mendenhall , and Sheaffer [25]. For the remainder of this paper, however, we will make use of the Wilcoxon rank sum test for purposes of comparison with the...B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall/CRC, 1986, p. 48. [25] D. D. Wackerly, W. Mendenhall III and R

  15. Climate Prediction through Statistical Methods

    CERN Document Server

    Akgun, Bora; Tuter, Levent; Kurnaz, Mehmet Levent

    2008-01-01

    Climate change is a reality of today. Paleoclimatic proxies and climate predictions based on coupled atmosphere-ocean general circulation models provide us with temperature data. Using Detrended Fluctuation Analysis, we are investigating the statistical connection between the climate types of the present and these local temperatures. We are relating this issue to some well-known historic climate shifts. Our main result is that the temperature fluctuations with or without a temperature scale attached to them, can be used to classify climates in the absence of other indicators such as pan evaporation and precipitation.

  16. The research of railway freight statistics system and statistical methods

    Directory of Open Access Journals (Sweden)

    Wu Hua-Wen

    2013-01-01

    Full Text Available EXT is a JavaScript framework for developing Web interfaces. This paper describes the Ext framework and its application in a railway freight statistics and analysis system, together with the statistical methods used. The paper also analyzes the design, function and implementation of the system in detail. As information technology and the requirements of railway transportation organization and operation continue to improve, the railway freight statistics and analysis system has improved markedly in its index system, decision analysis and other aspects, better meeting work requirements. It will play a more important role in railway transport organization, management, and passenger and freight marketing.

  17. Application of non-parametric bootstrap methods to estimate confidence intervals for QTL location in a beef cattle QTL experimental population.

    Science.gov (United States)

    Jongjoo, Kim; Davis, Scott K; Taylor, Jeremy F

    2002-06-01

    Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus x Brahman reciprocal full-sib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one QTL per chromosome model. The CIs from the non-selective bootstrap method were largest (87.7 cM average or 79.2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM average). However, CI sizes from the Selective II procedure were more variable than those produced by the two LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.
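
    The non-selective bootstrap mentioned above resamples individuals with replacement, re-estimates the quantity of interest in each resample, and reads a confidence interval off the empirical distribution of the estimates. The sketch below shows that general idea for an arbitrary estimator; it does not reproduce the QTL mapping step or the selective variants, and the function name, the percentile rule, and the default settings are assumptions.

```python
import random
import statistics


def bootstrap_percentile_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from a non-parametric bootstrap.

    `estimator` is any function of a resampled data list; in the QTL
    setting it would return the estimated position, which is not modelled here.
    """
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    n = len(data)
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical data: the interval for the mean of the integers 1..100.
low, high = bootstrap_percentile_ci(list(range(1, 101)), statistics.fmean)
```

    Because the interval comes from the sorted bootstrap estimates rather than from a symmetric formula, it can be asymmetric around the point estimate, which is the kind of behaviour the abstract reports for the QTL positions.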

  18. Nonparametric Bayes analysis of social science data

    Science.gov (United States)

    Kunihama, Tsuyoshi

    Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys assign many questions to respondents, which often consist of mixed-scale variables. Each of the variables can follow a complex distribution outside parametric families and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model which can approximate structures well in the social science data. In addition, many social surveys have collected data over time and therefore we need to incorporate dynamic dependence into the models. Also, it is standard to observe massive number of missing values in the social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains backgrounds and motivations of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian modeling of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method in social surveys with complex designs where there is a gap between sample and population. We correct for the bias by adjusting mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables can be induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.

  19. Statistical Methods for Unusual Count Data

    DEFF Research Database (Denmark)

    Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads

    2016-01-01

    microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative...

  20. A contingency table approach to nonparametric testing

    CERN Document Server

    Rayner, JCW

    2000-01-01

    Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables. This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp

  1. Statistical methods in physical mapping

    Energy Technology Data Exchange (ETDEWEB)

    Nelson, David O. [Univ. of California, Berkeley, CA (United States)]

    1995-05-01

    One of the great success stories of modern molecular genetics has been the ability of biologists to isolate and characterize the genes responsible for serious inherited diseases like fragile X syndrome, cystic fibrosis and myotonic muscular dystrophy. This dissertation concentrates on constructing high-resolution physical maps. It demonstrates how probabilistic modeling and statistical analysis can aid molecular geneticists in the tasks of planning, execution, and evaluation of physical maps of chromosomes and large chromosomal regions. The dissertation is divided into six chapters. Chapter 1 provides an introduction to the field of physical mapping, describing the role of physical mapping in gene isolation and in past efforts at mapping chromosomal regions. The next two chapters review and extend known results on predicting progress in large mapping projects. Such predictions help project planners decide between various approaches and tactics for mapping large regions of the human genome. Chapter 2 shows how probability models have been used in the past to predict progress in mapping projects. Chapter 3 presents new results, based on stationary point process theory, for progress measures for mapping projects based on directed mapping strategies. Chapter 4 describes in detail the construction of an initial high-resolution physical map for human chromosome 19. This chapter introduces the probability and statistical models involved in map construction in the context of a large, ongoing physical mapping project. Chapter 5 concentrates on one such model, the trinomial model. This chapter contains new results on the large-sample behavior of this model, including distributional results, asymptotic moments, and detection error rates. In addition, it contains an optimality result concerning experimental procedures based on the trinomial model. The last chapter explores unsolved problems and describes future work.

  2. SOLVING PROBLEMS OF STATISTICS WITH THE METHODS OF INFORMATION THEORY

    Directory of Open Access Journals (Sweden)

    Lutsenko Y. V.

    2015-02-01

    Full Text Available The article presents a theoretical substantiation, methods of numerical calculation and a software implementation for solving problems of statistics, in particular the study of statistical distributions, with the methods of information theory. On the basis of empirical data we have determined by calculation the number of observations used for the analysis of statistical distributions. The proposed method of calculating the amount of information is not based on assumptions about the independence of observations or a normal distribution, i.e., it is non-parametric; it ensures the correct modeling of nonlinear systems and also allows comparable processing of heterogeneous data (measured in scales of different types) of numeric and non-numeric nature that are measured in different units. Thus, ASC-analysis and the "Eidos" system constitute a modern innovative (ready for implementation) technology for solving problems of statistics with the methods of information theory. This article can be used as a description of laboratory work in the disciplines of: intelligent systems; knowledge engineering and intelligent systems; intelligent technologies and knowledge representation; knowledge representation in intelligent systems; foundations of intelligent systems; introduction to neuromaturation and methods neural networks; fundamentals of artificial intelligence; intelligent technologies in science and education; knowledge management; automated system-cognitive analysis and the "Eidos" intelligent system, which the author is currently developing; and also in other disciplines associated with the transformation of data into information, its transformation into knowledge, and the application of this knowledge to solve problems of identification, forecasting, decision making and research of the simulated subject area (which is virtually all subjects in all fields of science)

  3. [Evaluation of using statistical methods in selected national medical journals].

    Science.gov (United States)

    Sych, Z

    1996-01-01

    most important methods of mathematical statistics such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, correlation and regression. The works in which use was made of either multiple correlation or multiple regression, or else more complex methods of studying the relationship between two or more variables, were counted among the works whose statistical methods comprised correlation and regression as well as other methods, e.g. statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis conducted by the Jacobi-Hotelling method, taxonomic methods and others. On the basis of the performed studies it has been established that the frequency of employing statistical methods in the six selected national medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), which was generally similar to the frequency reported for English-language medical journals. On the whole, no significant differences were found in the frequency of applied statistical methods (Tab. 4) or in the frequency of random tests (Tab. 3) in the analyzed works appearing in the medical journals in the respective years 1988-1992. The most frequently used statistical methods in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%) and parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). To increase the frequency and reliability of the statistical methods used, the teaching of biostatistics should be expanded in medical studies and in postgraduate training designed for physicians and scientific-didactic workers.

  4. Multivariate statistical methods a primer

    CERN Document Server

    Manly, Bryan FJ

    2004-01-01

    THE MATERIAL OF MULTIVARIATE ANALYSIS: Examples of Multivariate Data; Preview of Multivariate Methods; The Multivariate Normal Distribution; Computer Programs; Graphical Methods; Chapter Summary; References. MATRIX ALGEBRA: The Need for Matrix Algebra; Matrices and Vectors; Operations on Matrices; Matrix Inversion; Quadratic Forms; Eigenvalues and Eigenvectors; Vectors of Means and Covariance Matrices; Further Reading; Chapter Summary; References. DISPLAYING MULTIVARIATE DATA: The Problem of Displaying Many Variables in Two Dimensions; Plotting Index Variables; The Draftsman's Plot; The Representation of Individual Data Points; Profiles o

  5. ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH

    Directory of Open Access Journals (Sweden)

    I. C. A. OYEKA

    2012-02-01

    This paper presents a non-parametric statistical method of analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided, and the method is shown to compare favourably with its competitor, the Mann-Whitney test, and to be more powerful than the latter when there are ties.
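
    For orientation, the sketch below shows how the competitor named in this abstract, the Mann-Whitney test, is commonly adapted to ties: tied observations receive midranks, and the variance of the normal approximation is reduced by a tie-correction term. It is not the alternative statistic proposed in the paper, which the abstract does not specify. The function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) for sample x versus sample y.

    Uses midranks and the tie-corrected variance of the normal
    approximation; this is the textbook test, not the paper's method.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        # Every observation is tied: the data carry no evidence either way.
        return u, 0.0, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p
```

    The normal approximation is only a rough guide for very small samples; heavy ties shrink the variance, which is the loss of power the paper sets out to address.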

  6. Equilibrium Statistics: Monte Carlo Methods

    Science.gov (United States)

    Kröger, Martin

    Monte Carlo methods use random numbers, or ‘random’ sequences, to sample from a known shape of a distribution, or to extract a distribution by other means, and, in the context of this book, to (i) generate representative equilibrated samples prior to being subjected to external fields, or (ii) evaluate high-dimensional integrals. Recipes for both topics, and some more general methods, are summarized in this chapter. It is important to realize that Monte Carlo should be as artificial as possible to be efficient and elegant. Advanced Monte Carlo ‘moves’, required to optimize the speed of algorithms for a particular problem at hand, are outside the scope of this brief introduction. One particular modern example is the wavelet-accelerated MC sampling of polymer chains [406].
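
    Use (ii), evaluating a high-dimensional integral, can be illustrated with plain Monte Carlo integration over the unit hypercube. This is a minimal sketch and not a recipe taken from the chapter. The integrand, the dimension and the function names are assumptions chosen so that the exact answer (d/3 for the sum of squared coordinates) is known.

```python
import math
import random


def mc_integrate(f, dim, n_samples, seed=0):
    """Estimate the integral of f over the unit hypercube [0, 1]^dim.

    Returns (estimate, standard_error) from uniform random sampling.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        point = [rng.random() for _ in range(dim)]
        value = f(point)
        total += value
        total_sq += value * value
    mean = total / n_samples
    # Guard against tiny negative values caused by rounding.
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(variance / n_samples)


def sum_of_squares(point):
    """Illustrative integrand; its exact integral over [0, 1]^d is d / 3."""
    return sum(x * x for x in point)


estimate, std_err = mc_integrate(sum_of_squares, dim=10, n_samples=20000)
print(f"estimate = {estimate:.4f} +/- {std_err:.4f} (exact value 10/3)")
```

    The standard error shrinks as one over the square root of the sample count regardless of the dimension, which is why the approach remains usable where grid-based quadrature does not.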

  7. Statistical methods for nuclear material management

    Energy Technology Data Exchange (ETDEWEB)

    Bowen W.M.; Bennett, C.A. (eds.)

    1988-12-01

    This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material management problems.

  8. Statistical Methods for Material Characterization and Qualification

    Energy Technology Data Exchange (ETDEWEB)

    Kercher, A.K.

    2005-04-01

    This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.

  9. Statistical methods for material characterization and qualification

    Energy Technology Data Exchange (ETDEWEB)

    Hunn, John D [ORNL]; Kercher, Andrew K [ORNL]

    2005-01-01

    This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.

  10. Statistical methods for material characterization and qualification

    Energy Technology Data Exchange (ETDEWEB)

    Hunn, John D [ORNL]; Kercher, Andrew K [ORNL]

    2005-01-01

    This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.

  11. Statistical Methods for Material Characterization and Qualification

    Energy Technology Data Exchange (ETDEWEB)

    Kercher, A.K.

    2005-04-01

    This document describes a suite of statistical methods that can be used to infer lot parameters from the data obtained from inspection/testing of random samples taken from that lot. Some of these methods will be needed to perform the statistical acceptance tests required by the Advanced Gas Reactor Fuel Development and Qualification (AGR) Program. Special focus has been placed on proper interpretation of acceptance criteria and unambiguous methods of reporting the statistical results. In addition, modified statistical methods are described that can provide valuable measures of quality for different lots of material. This document has been written for use as a reference and a guide for performing these statistical calculations. Examples of each method are provided. Uncertainty analysis (e.g., measurement uncertainty due to instrumental bias) is not included in this document, but should be considered when reporting statistical results.

  12. NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS

    Directory of Open Access Journals (Sweden)

    Orlov A. I.

    2015-10-01

    The article is devoted to nonparametric point and interval estimation of the characteristics of a probability distribution (the expectation, median, variance, standard deviation and coefficient of variation) from sample results. Sample values are regarded as realizations of independent and identically distributed random variables with an arbitrary distribution function having the required number of moments. Nonparametric analysis procedures are compared with parametric procedures based on the assumption that the sample values have a normal distribution. Point estimators are constructed in the obvious way, using sample analogs of the theoretical characteristics. Interval estimators are based on the asymptotic normality of sample moments and of functions of them. Nonparametric asymptotic confidence intervals are obtained through a special technique for deriving asymptotic relations in applied statistics. In the first step this technique uses the multidimensional central limit theorem, applied to sums of vectors whose coordinates are powers of the initial random variables. The second step transforms the limiting multivariate normal vector into the vector of interest to the researcher; here linearization is used and infinitesimal quantities are discarded. The third step is a rigorous justification of the results at the level of rigor standard for asymptotic mathematical-statistical reasoning. This usually requires the necessary and sufficient conditions for the inheritance of convergence. The article contains 10 numerical examples. The initial data are the operating times to the limit state of 50 cutting tools. Methods developed under the assumption of a normal distribution can lead to noticeably distorted conclusions when the normality hypothesis fails. The practical recommendation is: for the analysis of real data we should use nonparametric confidence limits
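
    As a hedged illustration of this kind of interval, the sketch below computes asymptotic confidence intervals for the expectation and the variance from sample moments alone. The variance interval uses the standard large-sample result that the sample variance has asymptotic variance (m4 - m2^2)/n, where m2 and m4 are the second and fourth central sample moments. These are textbook formulas assumed for illustration, not expressions quoted from the article, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def nonparametric_intervals(sample, confidence=0.95):
    """Asymptotic confidence intervals that do not assume normality.

    Returns a dict with (lower, upper) tuples for the mean and variance.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = fmean(sample)
    m2 = fmean((x - mean) ** 2 for x in sample)  # second central moment
    m4 = fmean((x - mean) ** 4 for x in sample)  # fourth central moment
    half_mean = z * math.sqrt(m2 / n)
    # A normal-theory interval would use 2 * m2^2 in place of (m4 - m2^2).
    half_var = z * math.sqrt(max(m4 - m2 ** 2, 0.0) / n)
    return {
        "mean": (mean - half_mean, mean + half_mean),
        "variance": (m2 - half_var, m2 + half_var),
    }
```

    For heavy-tailed data m4 - m2^2 exceeds the normal-theory value 2*m2^2, so the nonparametric variance interval is wider, which is consistent with the article's warning about distorted conclusions under a failed normality assumption.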

  13. Inferring the three-dimensional distribution of dust in the Galaxy with a non-parametric method: Preparing for Gaia

    CERN Document Server

    Kh., S Rezaei; Hanson, R J; Fouesneau, M

    2016-01-01

    We present a non-parametric model for inferring the three-dimensional (3D) distribution of dust density in the Milky Way. Our approach uses the extinction measured towards stars at different locations in the Galaxy at approximately known distances. Each extinction measurement is proportional to the integrated dust density along its line-of-sight. Making simple assumptions about the spatial correlation of the dust density, we can infer the most probable 3D distribution of dust across the entire observed region, including along sight lines which were not observed. This is possible because our model employs a Gaussian Process to connect all lines-of-sight. We demonstrate the capability of our model to capture detailed dust density variations using mock data as well as simulated data from the Gaia Universe Model Snapshot. We then apply our method to a sample of giant stars observed by APOGEE and Kepler to construct a 3D dust map over a small region of the Galaxy. Due to our smoothness constraint and its isotropy,...

  14. Comparison of non-parametric methods for ungrouping coarsely aggregated data

    DEFF Research Database (Denmark)

    Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda;

    2016-01-01

    group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods From an extensive literature search we identify five...

  15. Application of nonparametric regression and statistical testing to identify the impact of oil and natural gas development on local air quality

    Energy Technology Data Exchange (ETDEWEB)

    Pekney, Natalie J.; Cheng, Hanqi; Small, Mitchell J.

    2015-11-05

    Abstract: The objective of the current work was to develop a statistical method and associated tool to evaluate the impact of oil and natural gas exploration and production activities on local air quality.

  16. Statistical Models and Methods for Lifetime Data

    CERN Document Server

    Lawless, Jerald F

    2011-01-01

    Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." -Choice. "This is an important book, which will appeal to statisticians working on survival analysis problems." -Biometrics. "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." -Statistics in Medicine. The statistical analysis of lifetime or response time data is a key tool in engineering,

  17. Multivariate statistical methods a first course

    CERN Document Server

    Marcoulides, George A

    2014-01-01

    Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin

  18. SOME STATISTICAL SOFTWARE APPLICATIONS FOR TAGUCHI METHODS

    Directory of Open Access Journals (Sweden)

    Adrian Stere PARIS

    2016-05-01

    The paper details the variety of Taguchi methods as an important contribution to quality improvement. The extended use of these methods requires increasingly complex calculations for practical application and optimization. It is therefore necessary to take advantage of new software developments, supported by advanced statistical methods. The paper presents a few particular applications of statistical software to Taguchi methods for quality enhancement, focusing on quality loss functions, the design of experiments and new developments in statistical process control.
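
    The quality loss function mentioned here is usually written L(y) = k (y - m)^2, where m is the target value and k is set from the cost incurred at the tolerance limit. The sketch below shows that standard form together with the smaller-the-better signal-to-noise ratio. It is a generic illustration, not code from the software the paper reviews, and all function names and numbers are assumptions.

```python
import math
from statistics import fmean


def loss_coefficient(cost_at_tolerance, tolerance):
    """k such that a deviation equal to the tolerance costs cost_at_tolerance."""
    return cost_at_tolerance / tolerance ** 2


def quality_loss(y, target, k):
    """Taguchi quadratic loss for a single measured value y."""
    return k * (y - target) ** 2


def average_loss(values, target, k):
    """Mean quadratic loss over a batch of measurements."""
    return fmean(quality_loss(y, target, k) for y in values)


def sn_smaller_the_better(values):
    """Signal-to-noise ratio in dB for a smaller-the-better response."""
    return -10 * math.log10(fmean(y * y for y in values))


# Hypothetical part: target 10.0 mm, and a 0.5 mm deviation costs 50 units.
k = loss_coefficient(cost_at_tolerance=50.0, tolerance=0.5)
print(quality_loss(10.25, target=10.0, k=k))
```

    Because the loss is quadratic, the average loss depends on both the spread of the measurements and the offset of their mean from the target, so reducing either one lowers the expected cost.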

  19. Advanced statistical methods in data science

    CERN Document Server

    Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao

    2016-01-01

    This book gathers invited presentations from the 2nd Symposium of the ICSA-CANADA Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences and to allow researchers to exchange ideas on statistics and data science and to embrace the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...

  20. Semi- and Nonparametric ARCH Processes

    Directory of Open Access Journals (Sweden)

    Oliver B. Linton

    2011-01-01

    ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametric estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationary conditions, ergodic conditions and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
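
    As background for the processes surveyed, the sketch below simulates the fully parametric Gaussian GARCH(1,1) baseline, with conditional variance sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}, and enforces the usual covariance-stationarity condition alpha + beta < 1. The semiparametric and nonparametric variants discussed in the paper relax the Gaussian error density or the linear form of this recursion. The function name and parameter values are illustrative assumptions.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n steps of a Gaussian GARCH(1,1) process.

    Returns (returns, variances), where variances[t] is the conditional
    variance used to draw returns[t].
    """
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("need omega > 0 and alpha, beta >= 0")
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for covariance stationarity")
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        eps = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(eps)
        variances.append(sigma2)
        # Variance recursion: yesterday's shock and variance drive today's.
        sigma2 = omega + alpha * eps * eps + beta * sigma2
    return returns, variances
```

    A call such as `simulate_garch11(1000, omega=0.1, alpha=0.1, beta=0.8)` produces the volatility clustering typical of financial returns; replacing `rng.gauss` with draws from an estimated error density would be one step towards the semiparametric models the survey covers.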

  1. ABOUT THE METHODOLOGY OF STATISTICAL METHODS

    OpenAIRE

    Orlov A. I.

    2014-01-01

    The purpose of the article is to justify the need to develop the methodology of statistical methods as an independent scientific direction. Models of the mathematician and of the applied specialist are presented. We draw conclusions on teaching and research and discuss five major unsolved problems of statistical methods: the effect of deviations from the traditional prerequisites; the use of asymptotic results for finite sample sizes; selecting one of the many specific tests for the hypothesi...

  2. Modern statistical methods in respiratory medicine.

    Science.gov (United States)

    Wolfe, Rory; Abramson, Michael J

    2014-01-01

    Statistics sits right at the heart of scientific endeavour in respiratory medicine and many other disciplines. In this introductory article, some key epidemiological concepts such as representativeness, random sampling, association and causation, and confounding are reviewed. A brief introduction to basic statistics covering topics such as frequentist methods, confidence intervals, hypothesis testing, P values and Type II error is provided. Subsequent articles in this series will cover some modern statistical methods including regression models, analysis of repeated measures, causal diagrams, propensity scores, multiple imputation, accounting for measurement error, survival analysis, risk prediction, latent class analysis and meta-analysis.

  3. Spatial analysis statistics, visualization, and computational methods

    CERN Document Server

    Oyana, Tonny J

    2015-01-01

    An introductory text for the next generation of geospatial analysts and data scientists, Spatial Analysis: Statistics, Visualization, and Computational Methods focuses on the fundamentals of spatial analysis using traditional, contemporary, and computational methods. Outlining both non-spatial and spatial statistical concepts, the authors present practical applications of geospatial data tools, techniques, and strategies in geographic studies. They offer a problem-based learning (PBL) approach to spatial analysis, containing hands-on problem sets that can be worked out in MS Excel or ArcGIS, as well as detailed illustrations and numerous case studies. The book enables readers to: identify types and characterize non-spatial and spatial data; demonstrate their competence to explore, visualize, summarize, analyze, optimize, and clearly present statistical data and results; construct testable hypotheses that require inferential statistical analysis; process spatial data, extract explanatory variables, conduct statisti...

  4. Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method

    Energy Technology Data Exchange (ETDEWEB)

    Constantinescu, C C; Yoder, K K; Normandin, M D; Morris, E D [Department of Radiology, Indiana University School of Medicine, Indianapolis, IN (United States)]; Kareken, D A [Department of Neurology, Indiana University School of Medicine, Indianapolis, IN (United States)]; Bouman, C A [Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN (United States)]; O'Connor, S J [Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN (United States)], E-mail: emorris@iupui.edu

    2008-03-07

    We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest and activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves ($F^{DA}(t)$) and the change in binding potential ($\Delta BP$). The veracity of the $F^{DA}(t)$ curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) $\Delta BP$ should decline with increasing DA peak time, (2) $\Delta BP$ should increase as the strength of the temporal correlation between $F^{DA}(t)$ and the free raclopride ($F^{RAC}(t)$) curve increases, (3) $\Delta BP$ should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [$^{11}$C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover $F^{DA}(t)$, and binding potential values were determined. Kendall rank-correlation analysis confirmed that the $F^{DA}(t)$ data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of $F^{DA}(t)$. Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.
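
    The Kendall rank correlation used to check these monotone trends can be sketched in a few lines. The version below is the tau-b variant, which adjusts for ties; the abstract does not say which variant the authors used, so that choice and the function name are assumptions.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between paired sequences x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: contributes to neither count
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom
```

    A value near -1 for peak time against the change in binding potential would match hypothesis (1), and a value near +1 for correlation strength against the change in binding potential would match hypothesis (2). Because the statistic depends only on ranks, it tests monotonicity and does not by itself test the linearity stated in hypothesis (3).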

  5. Workshop on Analytical Methods in Statistics

    CERN Document Server

    Jurečková, Jana; Maciak, Matúš; Pešta, Michal

    2017-01-01

    This volume collects authoritative contributions on analytical methods and mathematical statistics. The methods presented include resampling techniques; the minimization of divergence; estimation theory and regression, possibly under shape or other constraints or long memory; and iterative approximations when the optimal solution is difficult to achieve. It also investigates probability distributions with respect to their stability, heavy-tailedness, Fisher information and other aspects, both asymptotically and non-asymptotically. The book not only presents the latest mathematical and statistical methods and their extensions, but also offers solutions to real-world problems including option pricing. The selected, peer-reviewed contributions were originally presented at the workshop on Analytical Methods in Statistics, AMISTAT 2015, held in Prague, Czech Republic, November 10-13, 2015.

  6. Nonparametric estimation of ultrasound pulses

    DEFF Research Database (Denmark)

    Jensen, Jørgen Arendt; Leeman, Sidney

    1994-01-01

    An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...

  7. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin

    2017-01-19

    In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)

  8. [Pathogenesis of temporomandibular dysfunction. II. Statistical method].

    Science.gov (United States)

    Vágó, P

    1989-08-01

    In statistical analyses of epidemiologic assessments of the aetiology of temporomandibular joint dysfunction, the variables have generally been examined in their pairwise connections, and occasionally a multivariable linear regression was employed. In this study, a new type of statistical method, the LISREL (Linear Structural Relationship) method, was used to establish a linear, empirically tested model of the aetiology of temporomandibular joint dysfunction. An advantage of this method is that the variables of the structural equation may include not only observed variables but also latent variables, which cannot be observed but can be assumed to be factors underlying the observed variables. The method is described in closer detail in the article in connection with the formation of the aetiological model.

  9. Statistical methods for spatio-temporal systems

    CERN Document Server

    Finkenstadt, Barbel

    2006-01-01

    Statistical Methods for Spatio-Temporal Systems presents current statistical research issues on spatio-temporal data modeling and will promote advances in research and a greater understanding between the mechanistic and the statistical modeling communities. Contributed by leading researchers in the field, each self-contained chapter starts with an introduction of the topic and progresses to recent research results. Presenting specific examples of epidemic data of bovine tuberculosis, gastroenteric disease, and the U.K. foot-and-mouth outbreak, the first chapter uses stochastic models, such as point process models, to provide the probabilistic backbone that facilitates statistical inference from data. The next chapter discusses the critical issue of modeling random growth objects in diverse biological systems, such as bacteria colonies, tumors, and plant populations. The subsequent chapter examines data transformation tools using examples from ecology and air quality data, followed by a chapter on space-time co...

  10. Statistical Methods for Stochastic Differential Equations

    CERN Document Server

    Kessler, Mathieu; Sorensen, Michael

    2012-01-01

    The seventh volume in the SemStat series, Statistical Methods for Stochastic Differential Equations presents current research trends and recent developments in statistical methods for stochastic differential equations. Written to be accessible to both new students and seasoned researchers, each self-contained chapter starts with introductions to the topic at hand and builds gradually towards discussing recent research. The book covers Wiener-driven equations as well as stochastic differential equations with jumps, including continuous-time ARMA processes and COGARCH processes. It presents a sp

  11. Applying statistical methods to text steganography

    CERN Document Server

    Nechta, Ivan

    2011-01-01

    This paper presents a survey of text steganography methods used for hiding secret information inside some covertext. Widely known hiding techniques (such as translation based steganography, text generating and syntactic embedding) and detection are considered. It is shown that statistical analysis has an important role in text steganalysis.

  12. Statistical search methods for lotsizing problems

    NARCIS (Netherlands)

    M. Salomon (Marc); R. Kuik (Roelof); L.N. van Wassenhove (Luk)

    1993-01-01

    This paper reports on our experiments with statistical search methods for solving lotsizing problems in production planning. In lotsizing problems the main objective is to generate a minimum cost production and inventory schedule, such that (i) customer demand is satisfied, and (ii) capa

  13. Multivariate nonparametric regression and visualization with R and applications to finance

    CERN Document Server

    Klemelä, Jussi

    2014-01-01

    A modern approach to statistical learning and its applications through visualization methods. With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio

  14. A novel nonparametric confidence interval for differences of proportions for correlated binary data.

    Science.gov (United States)

    Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan

    2016-11-16

    Various confidence interval estimators have been developed for differences in proportions resulting from correlated binary data. However, the widely recommended Tango score confidence interval tends to be wide, and the computing burden of the exact methods recommended for small-sample data is intensive. The recently proposed rank-based nonparametric method, which treats proportions as special areas under receiver operating characteristic curves, provided a new way to construct the confidence interval for the proportion difference on paired data, but its complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristic curves. The new confidence interval has a simple analytic form with a new estimate of the degrees of freedom of n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than Tango's. This new confidence interval with the new estimate of degrees of freedom also leads to coverage probabilities that are an improvement on the rank-based nonparametric confidence interval. Compared with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and yet it is very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance. © The Author(s) 2016.

  15. Testing Equality of Nonparametric Functions in Two Partially Linear Models

    Institute of Scientific and Technical Information of China (English)

    施三支; 宋立新; 杨华

    2008-01-01

    In this paper we propose a test statistic to check whether the nonparametric functions in two partially linear models are equal. We estimate the nonparametric function under both the null hypothesis and the alternative by the local linear method, ignoring the parametric components, and then estimate the parameters by the two-stage method. The test statistic is derived, and it is shown to be asymptotically normal under the null hypothesis.

  16. Statistical methods to estimate treatment effects from multichannel electroencephalography (EEG) data in clinical trials.

    Science.gov (United States)

    Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir

    2010-07-15

    With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions.

  17. Nonparametric Econometrics: The np Package

    Directory of Open Access Journals (Sweden)

    Tristen Hayfield

    2008-07-01

    We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
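
    To make the idea of a kernel-based estimator with data-driven bandwidth selection concrete, the sketch below implements a one-dimensional Nadaraya-Watson regression with a Gaussian kernel and picks the bandwidth by leave-one-out cross-validation over a candidate grid. It is written in Python for illustration only and does not reproduce the np package's R interface; every function name here is an assumption.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def nw_predict(x_train, y_train, x0, bandwidth, skip=None):
    """Nadaraya-Watson estimate at x0, optionally leaving out index `skip`."""
    num = 0.0
    den = 0.0
    for i, (xi, yi) in enumerate(zip(x_train, y_train)):
        if i == skip:
            continue
        w = gaussian_kernel((x0 - xi) / bandwidth)
        num += w * yi
        den += w
    if den == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflowed")
    return num / den


def loo_cv_score(x, y, bandwidth):
    """Mean squared leave-one-out prediction error for a given bandwidth."""
    errors = [
        (y[i] - nw_predict(x, y, x[i], bandwidth, skip=i)) ** 2
        for i in range(len(x))
    ]
    return sum(errors) / len(errors)


def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_cv_score(x, y, h))
```

    Each cross-validation score costs on the order of n^2 kernel evaluations, and the search repeats that for every candidate, which illustrates why the abstract cautions that data-driven bandwidth selection can be computationally demanding.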

  18. Quantal Response: Nonparametric Modeling

    Science.gov (United States)

    2017-01-01

    ... Nonparametric QR Models: Nonparametric linear ... stimulus and probability of response. The Generalized Linear Model approach does not make use of the limit distribution but allows arbitrary functional ... Approved for public release; distribution is unlimited.

  19. Parametric and Non-Parametric System Modelling

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg

    1999-01-01

    considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how....... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural...... the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...

  20. The statistical process control methods - SPC

    Directory of Open Access Journals (Sweden)

    Floreková Ľubica

    1998-03-01

    Methods for the statistical evaluation of quality, SPC (item 20 of the quality-control documentation system of the ISO 9000 series of standards), applied to various processes, products and services are among the basic quality methods that enable us to analyse and compare data on various quantitative parameters. Based on these data, they also make it possible to propose suitable interventions aimed at improving these processes, products and services. The contribution presents the theoretical basis and applicability of the following principles: cause-and-effect diagnostics; Pareto analysis and the Lorenz curve; number distributions and frequency curves of random-variable distributions; and Shewhart control charts.
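
    Illustrative sketch (not from the cited article): one common form of Shewhart control chart is the individuals chart, whose 3-sigma limits are estimated from the average moving range divided by the constant 1.128 (the d2 value for subgroups of size two). The function names and the measurements below are hypothetical.

```python
def individuals_chart_limits(values):
    """Return (lower, center, upper) 3-sigma limits for an individuals chart.

    Sigma is estimated from the average moving range of consecutive points.
    """
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma_hat = mr_bar / 1.128  # d2 constant for subgroups of size 2
    return center - 3.0 * sigma_hat, center, center + 3.0 * sigma_hat


def out_of_control(values, limits):
    """Indices of points that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical in-control baseline measurements used to set the limits.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
limits = individuals_chart_limits(baseline)

# Hypothetical new measurements checked against those limits.
new_points = [10.1, 9.9, 11.5, 10.0]
flagged = out_of_control(new_points, limits)
print(limits, flagged)
```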

  1. Statistics Anxiety and Business Statistics: The International Student

    Science.gov (United States)

    Bell, James A.

    2008-01-01

    Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…

  2. Nonparametric dark energy reconstruction from supernova data.

    Science.gov (United States)

    Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David

    2010-12-10

    Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.

  3. Nonparametric estimation of employee stock options

    Institute of Scientific and Technical Information of China (English)

    FU Qiang; LIU Li-an; LIU Qian

    2006-01-01

    We propose a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods applied to market data. It incorporates a kernel estimator and employs a three-step method to modify the Black-Scholes formula. The model overcomes the limits of the Black-Scholes formula in handling option prices with varying volatility. It accounts for the effects of ESO-specific characteristics, such as non-tradability, a longer term to expiration, the early-exercise feature, the restriction on short selling and the employee's risk aversion, on the risk-neutral pricing condition, and it can be applied to ESO valuation whether the explanatory variable is deterministic or random.

  4. Non-parametric estimation of Fisher information from real data

    CERN Document Server

    Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A

    2015-01-01

    The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...

  5. Statistical methods for assessment of blend homogeneity

    DEFF Research Database (Denmark)

    Madsen, Camilla

    2002-01-01

    In this thesis the use of various statistical methods to address some of the problems related to assessment of the homogeneity of powder blends in tablet production is discussed. It is not straightforward to assess the homogeneity of a powder blend. The reason is partly that in bulk materials......, it is shown how to set up parametric acceptance criteria for the batch that give a high confidence that future samples with a probability larger than a specified value will pass the USP three-class criteria. Properties and robustness of proposed changes to the USP test for content uniformity are investigated...

  6. A non-parametric CUSUM intrusion detection method based on industrial control model

    Institute of Scientific and Technical Information of China (English)

    张云贵; 赵华; 王丽娜

    2012-01-01

    To address the increasingly serious information security problems of industrial control systems (ICS), this paper presents a non-parametric cumulative sum (CUSUM) intrusion detection method for industrial control networks. Using the dependence of the ICS output on its input, a mathematical model of the ICS is established to predict the output of the system. Once the sensors of the control system are under attack, the actual output changes. At every time step, the difference between the output predicted by the industrial control model and the signal measured by the sensors is calculated, forming a time-based statistical sequence. The non-parametric CUSUM algorithm then detects intrusion attacks online and raises an alarm. Simulated detection experiments show that the proposed method has good real-time performance and a low false alarm rate. By choosing appropriate parameters r and β of the non-parametric CUSUM algorithm, the intrusion detection method can accurately detect attacks before they cause substantial damage to the control system, and it also helps to monitor misoperation.
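
    Illustrative sketch (not the authors' implementation): a non-parametric CUSUM detector of the kind described above can be written as S_k = max(0, S_{k-1} + |y_k - ŷ_k| - β), with an alarm raised when S_k exceeds a threshold. Treating β as the drift term and the abstract's other parameter as the alarm threshold is an assumption, as are the function name, the reset after each alarm and the toy signals.

```python
def cusum_alarms(predicted, measured, beta, threshold):
    """Return the time indices at which the CUSUM statistic exceeds the threshold.

    The statistic accumulates |measured - predicted| - beta and is clipped at zero,
    so small residuals (below beta) never build up.
    """
    s = 0.0
    alarms = []
    for k, (y_hat, y) in enumerate(zip(predicted, measured)):
        residual = abs(y - y_hat)
        s = max(0.0, s + residual - beta)
        if s > threshold:
            alarms.append(k)
            s = 0.0  # assumption: restart the statistic after each alarm
    return alarms


# Hypothetical signals: the model predicts a constant output of 1.0.
predicted = [1.0] * 20
# First 10 samples: small sensor noise. Last 10 samples: a spoofed sensor reading of 1.5.
measured = [1.0 + (0.05 if k % 2 else -0.05) for k in range(10)] + [1.5] * 10

alarms = cusum_alarms(predicted, measured, beta=0.1, threshold=1.0)
print(alarms)
```

    In this toy run the noise residuals stay below β, so the statistic only starts to grow once the attack begins at index 10.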

  7. Toward improved statistical methods for analyzing Cotinine-Biomarker health association data

    Directory of Open Access Journals (Sweden)

    Clark John D

    2011-10-01

    Background: Serum cotinine, a metabolite of nicotine, is frequently used in research as a biomarker of recent tobacco smoke exposure. Historically, secondhand smoke (SHS) research has used suboptimal statistical methods because of censored serum cotinine values, that is, measurements below the limit of detection (LOD). Methods: We compared commonly used methods for analyzing censored serum cotinine data using parametric and non-parametric techniques, employing data from the 1999-2004 National Health and Nutrition Examination Surveys (NHANES). To illustrate the differences in associations obtained by various analytic methods, we compared parameter estimates for the association between cotinine and the inflammatory marker homocysteine using complete case analysis, single and multiple imputation, "reverse" Kaplan-Meier, and logistic regression models. Results: Parameter estimates and statistical significance varied according to the statistical method used with censored serum cotinine values. Single imputation of censored values with either 0, LOD or LOD/√2 yielded similar estimates and significance; the multiple imputation method yielded smaller estimates than the other methods, without statistical significance. Multiple regression modelling using the "reverse" Kaplan-Meier method yielded statistically significant estimates that were larger than those from parametric methods. Conclusions: Analyses of serum cotinine data with values below the LOD require special attention. "Reverse" Kaplan-Meier was the only method inherently able to deal with censored data with multiple LODs, and may be the most accurate since it avoids the data manipulation needed for use with other commonly used statistical methods. Additional research is needed into the identification of optimal statistical methods for analysis of SHS biomarkers subject to a LOD.
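
    Illustrative sketch (not the study's code or data): the three single-imputation rules mentioned above replace every censored value with 0, the LOD, or LOD/√2. The cotinine values, the LOD of 0.05 and the function name are hypothetical, and `None` is used here as an assumed marker for a measurement below the LOD.

```python
import math


def impute_censored(values, lod, rule):
    """Replace censored entries (None) with a single substitute value.

    rule is one of "zero", "lod" or "lod_sqrt2".
    """
    fill = {
        "zero": 0.0,
        "lod": lod,
        "lod_sqrt2": lod / math.sqrt(2.0),
    }[rule]
    return [fill if v is None else v for v in values]


# Hypothetical serum cotinine measurements; None marks a value below the LOD.
cotinine = [None, 0.08, None, 0.12, 0.50, None, 1.30, 0.07]
LOD = 0.05

# Mean concentration under each substitution rule.
means = {
    rule: sum(impute_censored(cotinine, LOD, rule)) / len(cotinine)
    for rule in ("zero", "lod", "lod_sqrt2")
}
print(means)
```

    With a low LOD relative to the detected values, the three rules give similar summary statistics, which is in line with the similar estimates reported for single imputation in the abstract.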

  8. Statistical methods in credit risk management

    Directory of Open Access Journals (Sweden)

    Ljiljanka Kvesić

    2012-12-01

    Successful banks base their operations on the principles of liquidity, profitability and safety. Therefore, the correct assessment of the ability of a loan applicant to carry out certain obligations is of crucial importance for the functioning of a bank. In the past few decades several credit scoring models have been developed to provide support to credit analysts in the assessment of a loan applicant. This paper presents three statistical methods that are used for this purpose in the area of credit risk management: logistic regression, discriminant analysis and survival analysis. Their implementation in the banking sector was motivated to a great extent by the development and application of information and communication technologies. This paper aims to point out the most important theoretical aspects of these methods, and also to underline the need for the development and application of credit scoring models in Croatian banking practice.

  9. Statistical Methods in Phylogenetic and Evolutionary Inferences

    Directory of Open Access Journals (Sweden)

    Luigi Bertolotti

    2013-05-01

    Molecular instruments are the most accurate methods for the identification and characterization of organisms. Biologists are often involved in studies where the main goal is to identify relationships among individuals. In this framework, it is very important to know and apply the most robust approaches to infer these relationships correctly, allowing the right conclusions about phylogeny. In this review, we introduce the reader to the most widely used statistical methods in phylogenetic analyses, the Maximum Likelihood and the Bayesian approaches, considering for simplicity only analyses of DNA sequences. Several studies are shown as examples in order to demonstrate how correct phylogenetic inference can lead scientists to highlight very peculiar features in pathogen biology and evolution.

  10. Nonparametric inference of network structure and dynamics

    Science.gov (United States)

    Peixoto, Tiago P.

    The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, networks that are embedded in latent spaces, and networks that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among

  11. Statistical Methods and Software for the Analysis of Occupational Exposure Data with Non-detectable Values

    Energy Technology Data Exchange (ETDEWEB)

    Frome, EL

    2005-09-20

    Environmental exposure measurements are, in general, positive and may be subject to left censoring; i.e., the measured value is less than a "detection limit". In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. Parametric methods used to determine acceptable levels of exposure are often based on a two-parameter lognormal distribution. The mean exposure level, an upper percentile, and the exceedance fraction are used to characterize exposure levels, and confidence limits are used to describe the uncertainty in these estimates. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on an upper percentile (i.e., the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly enhanced the availability of high-quality nonproprietary (open source) software that serves as the basis for implementing the methods in this paper.

  12. Application of pedagogy reflective in statistical methods course and practicum statistical methods

    Science.gov (United States)

    Julie, Hongki

    2017-08-01

    The courses Elementary Statistics, Statistical Methods and Statistical Methods Practicum aim to equip Mathematics Education students with descriptive and inferential statistics. Understanding descriptive and inferential statistics is important for students of the Mathematics Education Department, especially for those whose final project involves quantitative research. In quantitative research, students are required to present and describe quantitative data appropriately, to draw conclusions from their data, and to establish relationships between the independent and dependent variables defined in their research. In practice, when students carried out final projects involving quantitative research, it was not rare to find them making mistakes when drawing conclusions and when choosing the hypothesis-testing procedure. As a result, they reached incorrect conclusions, which is a very serious mistake in quantitative research. The implementation of reflective pedagogy in the teaching and learning process of the Statistical Methods and Statistical Methods Practicum courses produced the following outcomes: 1. Twenty-two students passed the course and one student did not. 2. The highest grade, A, was achieved by 18 students. 3. According to all students, the course allowed them to develop their critical stance and to build care for one another through the learning process. 4. All students agreed that, through the learning process they underwent in the course, they could build care for one another.

  13. Nonparametric regression with filtered data

    CERN Document Server

    Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid; 10.3150/10-BEJ260

    2011-01-01

    We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.

  14. On two methods of statistical image analysis

    NARCIS (Netherlands)

    Missimer, J; Knorr, U; Maguire, RP; Herzog, H; Seitz, RJ; Tellman, L; Leenders, KL

    1999-01-01

    The computerized brain atlas (CBA) and statistical parametric mapping (SPM) are two procedures for voxel-based statistical evaluation of PET activation studies. Each includes spatial standardization of image volumes, computation of a statistic, and evaluation of its significance. In addition, smooth

  15. The Monte Carlo method the method of statistical trials

    CERN Document Server

    Shreider, YuA

    1966-01-01

    The Monte Carlo Method: The Method of Statistical Trials is a systematic account of the fundamental concepts and techniques of the Monte Carlo method, together with its range of applications. These applications include the computation of definite integrals, neutron physics, and the investigation of servicing processes. This volume is comprised of seven chapters and begins with an overview of the basic features of the Monte Carlo method and typical examples of its application to simple problems in computational mathematics. The next chapter examines the computation of multi-dimensio
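
    Illustrative sketch (not taken from the book): the computation of a definite integral by statistical trials can be shown by averaging the integrand at uniformly drawn points and scaling by the interval length. The function name, the seed and the choice of integrand (sin on [0, π], whose exact integral is 2) are assumptions for illustration.

```python
import math
import random


def mc_integrate(f, a, b, n, seed=0):
    """Estimate the integral of f over [a, b] from n uniform random trials.

    Returns the estimate and its estimated standard error.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    estimate = (b - a) * mean
    std_error = (b - a) * math.sqrt(variance / n)
    return estimate, std_error


# The exact value of the integral of sin(x) over [0, pi] is 2.
estimate, std_error = mc_integrate(math.sin, 0.0, math.pi, 100_000)
print(estimate, std_error)
```

    The standard error shrinks in proportion to 1/√n, so each additional decimal digit of accuracy costs roughly a hundred times more trials.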

  16. Statistical methods for astronomical data analysis

    CERN Document Server

    Chattopadhyay, Asis Kumar

    2014-01-01

    This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomena will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...

  17. Nonparametric estimation of Fisher information from real data

    Science.gov (United States)

    Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.

    2016-02-01

    The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δ θ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δ θ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
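
    Illustrative sketch (not the authors' method): the finite-difference step can be shown in isolation for a single parameter θ, using I(θ) ≈ ∫ [(p(x; θ + Δθ/2) − p(x; θ − Δθ/2)) / Δθ]² / p(x; θ) dx. Here the exact normal density stands in for the nonparametric density estimate that the abstract describes, and the central-difference form, the integration grid and the function names are assumptions. For a normal distribution with unit variance, the Fisher information of the mean is exactly 1.

```python
import math


def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2); a stand-in for an estimated density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_information_fd(density, theta, delta, grid):
    """Finite-difference approximation of the Fisher information at theta.

    The derivative of the density with respect to theta is replaced by a
    central difference with step delta; the integral over x is a rectangle
    rule on a uniform grid.
    """
    dx = grid[1] - grid[0]
    total = 0.0
    for x in grid:
        p0 = density(x, theta)
        if p0 <= 0.0:
            continue  # skip points where the density vanishes numerically
        dp = (density(x, theta + 0.5 * delta) - density(x, theta - 0.5 * delta)) / delta
        total += dp * dp / p0 * dx
    return total


grid = [-8.0 + 0.01 * i for i in range(1601)]
fim = fisher_information_fd(normal_pdf, theta=0.0, delta=0.1, grid=grid)
print(fim)
```

    With an estimated density in place of `normal_pdf`, the choice of Δθ trades finite-difference bias against estimation noise, which is the trade-off the abstract addresses.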

  18. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    Science.gov (United States)

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data, allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (randomized clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
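
    Illustrative sketch (not from the chapter): the difference between a relative risk and an odds ratio computed from the same 2 × 2 table can be made concrete as follows. The cell labels a, b, c, d and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    a, b: exposed subjects with / without the outcome
    c, d: unexposed subjects with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed subjects have the outcome.
table = dict(a=20, b=80, c=10, d=90)

rr = relative_risk(**table)       # (20/100) / (10/100)
or_value = odds_ratio(**table)    # (20*90) / (80*10)
print(rr, or_value)
```

    The two measures are close only when the outcome is rare; here the odds ratio (2.25) already overstates the relative risk (2.0).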

  19. Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling

    Science.gov (United States)

    2016-05-31

    Final Report (31-05-2016; 15-Apr-2014 to 14-Jan-2015; distribution unlimited): Technical Topic 3.2.2.d Bayesian and Non-parametric Statistics: Integration of Neural...

  20. Comparison of two atmospheric sampling methodologies with non-parametric statistical tools

    Directory of Open Access Journals (Sweden)

    Maria João Nunes

    2005-03-01

    In atmospheric aerosol sampling, it is inevitable that the air that carries particles is in motion, as a result of both externally driven wind and the sucking action of the sampler itself. High or low air flow sampling speeds may lead to significant particle size bias. The objective of this work is the validation of measurements enabling the comparison of species concentrations from both air flow sampling techniques. The presence of several outliers and an increase of residuals with concentration become evident, requiring non-parametric methods, which are recommended for handling data that may not be normally distributed. In this way, conversion factors are obtained for each of the various species under study using Kendall regression.
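
    Illustrative sketch (not the authors' code or data): assuming that "Kendall regression" refers to the Theil-Sen estimator (the median of all pairwise slopes, sometimes called the Kendall robust line fit), a conversion factor between two samplers can be estimated so that a single outlier has little influence. The function name and the concentration values are hypothetical.

```python
from statistics import median


def theil_sen(xs, ys):
    """Theil-Sen line fit: median of pairwise slopes, then median residual intercept."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Hypothetical paired concentrations from two samplers; the fifth pair is an outlier.
low_flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
high_flow = [2.1, 3.9, 6.0, 8.1, 30.0, 12.0]

slope, intercept = theil_sen(low_flow, high_flow)
print(slope, intercept)
```

    An ordinary least-squares fit to the same data would be pulled strongly toward the outlying pair, whereas the median of slopes stays near the factor of about 2 followed by the other points.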

  1. An Investigation of the Variety and Complexity of Statistical Methods Used in Current Internal Medicine Literature.

    Science.gov (United States)

    Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth

    2015-10-01

    Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations

  2. Benchmark of the non-parametric Bayesian deconvolution method implemented in the SINBAD code for X/γ rays spectra processing

    Science.gov (United States)

    Rohée, E.; Coulon, R.; Carrel, F.; Dautremer, T.; Barat, E.; Montagu, T.; Normand, S.; Jammes, C.

    2016-11-01

    Radionuclide identification and quantification are a serious concern for many applications, such as in situ monitoring at nuclear facilities, laboratory analysis, special nuclear materials detection, environmental monitoring, and waste measurements. High-resolution gamma-ray spectrometry based on high-purity germanium (HPGe) diode detectors is the best solution available for isotopic identification. Over the last decades, methods have been developed to improve gamma spectra analysis. However, some difficulties remain in the analysis when full energy peaks are folded together with a high ratio between their amplitudes, and when the Compton background is much larger than the signal of a single peak. In this context, this study compares a conventional analysis based on the "iterative peak fitting deconvolution" method with a "nonparametric Bayesian deconvolution" approach developed by the CEA LIST and implemented in the SINBAD code. The iterative peak fit deconvolution is used in this study as a reference method, largely validated by industrial standards, to unfold complex spectra from HPGe detectors. Complex cases of spectra are studied from IAEA benchmark protocol tests and with measured spectra. The SINBAD code shows promising deconvolution capabilities compared to the conventional method without any expert parameter fine tuning.

  3. Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures

    Science.gov (United States)

    Li, Quanbao; Wei, Fajie; Zhou, Shenghan

    2017-05-01

    Linear discriminant analysis (LDA) is one of the popular means of linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. The quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.

  4. Development of a Research Methods and Statistics Concept Inventory

    Science.gov (United States)

    Veilleux, Jennifer C.; Chapman, Kate M.

    2017-01-01

    Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…

  5. A nonparametric dynamic additive regression model for longitudinal data

    DEFF Research Database (Denmark)

    Martinussen, Torben; Scheike, Thomas H.

    2000-01-01

    dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models

  6. METHODS TO RESTRUCTURE THE STATISTICAL COMMUNITIES

    Directory of Open Access Journals (Sweden)

    Emilia TITAN

    2005-12-01

    In order to understand the essence of phenomena, it is necessary to perform statistical data processing operations. These allow a shift from individual data to derived, synthetic indicators that highlight the essence of various phenomena. The high volume and diversity of processing operations call for plans of computerised data processing. To identify distinct and homogeneous groups and classes, it is necessary to carry out well-considered groupings and classifications that comply with the requirements presented in the article.

  7. Statistical models and methods for reliability and survival analysis

    CERN Document Server

    Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Limnios, Nikolaos; Gerville-Reache, Leo

    2013-01-01

    Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical

  8. Statistical time series methods for damage diagnosis in a scale aircraft skeleton structure: loosened bolts damage scenarios

    Energy Technology Data Exchange (ETDEWEB)

    Kopsaftopoulos, Fotis P; Fassois, Spilios D, E-mail: fkopsaf@mech.upatras.gr, E-mail: fassois@mech.upatras.gr [Stochastic Mechanical Systems and Automation (SMSA) Laboratory Department of Mechanical and Aeronautical Engineering University of Patras, GR 265 00 Patras (Greece)

    2011-07-19

    A comparative assessment of several vibration based statistical time series methods for Structural Health Monitoring (SHM) is presented via their application to a scale aircraft skeleton laboratory structure. A brief overview of the methods, which are either scalar or vector type, non-parametric or parametric, and pertain to either the response-only or excitation-response cases, is provided. Damage diagnosis, including both the detection and identification subproblems, is tackled via scalar or vector vibration signals. The methods' effectiveness is assessed via repeated experiments under various damage scenarios, with each scenario corresponding to the loosening of one or more selected bolts. The results of the study confirm the 'global' damage detection capability and effectiveness of statistical time series methods for SHM.

  9. Statistical methods of estimating mining costs

    Science.gov (United States)

    Long, K.R.

    2011-01-01

    Until it was defunded in 1995, the U.S. Bureau of Mines maintained a Cost Estimating System (CES) for prefeasibility-type economic evaluations of mineral deposits and estimating costs at producing and non-producing mines. This system had a significant role in mineral resource assessments to estimate costs of developing and operating known mineral deposits and predicted undiscovered deposits. For legal reasons, the U.S. Geological Survey cannot update and maintain CES. Instead, statistical tools are under development to estimate mining costs from basic properties of mineral deposits such as tonnage, grade, mineralogy, depth, strip ratio, distance from infrastructure, rock strength, and work index. The first step was to reestimate "Taylor's Rule" which relates operating rate to available ore tonnage. The second step was to estimate statistical models of capital and operating costs for open pit porphyry copper mines with flotation concentrators. For a sample of 27 proposed porphyry copper projects, capital costs can be estimated from three variables: mineral processing rate, strip ratio, and distance from nearest railroad before mine construction began. Of all the variables tested, operating costs were found to be significantly correlated only with strip ratio.

  10. An alternative approach to the ground motion prediction problem by a non-parametric adaptive regression method

    Science.gov (United States)

    Yerlikaya-Özkurt, Fatma; Askan, Aysegul; Weber, Gerhard-Wilhelm

    2014-12-01

    Ground Motion Prediction Equations (GMPEs) are empirical relationships which are used for determining the peak ground response at a particular distance from an earthquake source. They relate the peak ground responses as a function of earthquake source type, distance from the source, local site conditions where the data are recorded and finally the depth and magnitude of the earthquake. In this article, a new prediction algorithm, called Conic Multivariate Adaptive Regression Splines (CMARS), is employed on an available dataset for deriving a new GMPE. CMARS is based on a special continuous optimization technique, conic quadratic programming. These convex optimization problems are very well-structured, resembling linear programs and, hence, permitting the use of interior point methods. The CMARS method is performed on the strong ground motion database of Turkey. Results are compared with three other GMPEs. CMARS is found to be effective for ground motion prediction purposes.

  11. Testing for constant nonparametric effects in general semiparametric regression models with interactions

    KAUST Repository

    Wei, Jiawei

    2011-07-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.

  12. Testing for Constant Nonparametric Effects in General Semiparametric Regression Models with Interactions.

    Science.gov (United States)

    Wei, Jiawei; Carroll, Raymond J; Maity, Arnab

    2011-07-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.

  13. Innovative statistical methods for public health data

    CERN Document Server

    Wilson, Jeffrey

    2015-01-01

    The book brings together experts working in public health and multi-disciplinary areas to present recent issues in statistical methodological development and their applications. This timely book will impact model development and data analyses of public health research across a wide spectrum of analysis. Data and software used in the studies are available for the reader to replicate the models and outcomes. The fifteen chapters range in focus from techniques for dealing with missing data with Bayesian estimation, health surveillance and population definition and implications in applied latent class analysis, to multiple comparison and meta-analysis in public health data. Researchers in biomedical and public health research will find this book to be a useful reference, and it can be used in graduate level classes.

  14. Methods of contemporary mathematical statistical physics

    CERN Document Server

    2009-01-01

    This volume presents a collection of courses introducing the reader to the recent progress with attention being paid to laying solid grounds and developing various basic tools. An introductory chapter on lattice spin models is useful as a background for other lectures of the collection. The topics include new results on phase transitions for gradient lattice models (with introduction to the techniques of the reflection positivity), stochastic geometry reformulation of classical and quantum Ising models, the localization/delocalization transition for directed polymers. A general rigorous framework for theory of metastability is presented and particular applications in the context of Glauber and Kawasaki dynamics of lattice models are discussed. A pedagogical account of several recently discussed topics in nonequilibrium statistical mechanics with an emphasis on general principles is followed by a discussion of kinetically constrained spin models that are reflecting important peculiar features of glassy dynamic...

  15. Mathematical and statistical methods for multistatic imaging

    CERN Document Server

    Ammari, Habib; Jing, Wenjia; Kang, Hyeonbae; Lim, Mikyoung; Sølna, Knut; Wang, Han

    2013-01-01

    This book covers recent mathematical, numerical, and statistical approaches for multistatic imaging of targets with waves at single or multiple frequencies. The waves can be acoustic, elastic or electromagnetic. They are generated by point sources on a transmitter array and measured on a receiver array. An important problem in multistatic imaging is to quantify and understand the trade-offs between data size, computational complexity, signal-to-noise ratio, and resolution. Another fundamental problem is to have a shape representation well suited to solving target imaging problems from multistatic data. In this book the trade-off between resolution and stability when the data are noisy is addressed. Efficient imaging algorithms are provided and their resolution and stability with respect to noise in the measurements analyzed. It also shows that high-order polarization tensors provide an accurate representation of the target. Moreover, a dictionary-matching technique based on new invariants for the generalized ...

  16. Statistical methods for categorical data analysis

    CERN Document Server

    Powers, Daniel

    2008-01-01

    This book provides a comprehensive introduction to methods and models for categorical data analysis and their applications in social science research. Companion website also available, at https://webspace.utexas.edu/dpowers/www/

  17. Simple statistical methods for software engineering data and patterns

    CERN Document Server

    Pandian, C Ravindranath

    2015-01-01

    Although there are countless books on statistics, few are dedicated to the application of statistical methods to software engineering. Simple Statistical Methods for Software Engineering: Data and Patterns fills that void. Instead of delving into overly complex statistics, the book details simpler solutions that are just as effective and connect with the intuition of problem solvers. Sharing valuable insights into software engineering problems and solutions, the book not only explains the required statistical methods, but also provides many examples, review questions, and case studies that prov

  18. Nonparametric statistical testing of coherence differences

    NARCIS (Netherlands)

    Maris, E.; Schoffelen, J.M.; Fries, P.

    2007-01-01

    Many important questions in neuroscience are about interactions between neurons or neuronal groups. These interactions are often quantified by coherence, which is a frequency-indexed measure that quantifies the extent to which two signals exhibit a consistent phase relation. In this paper, we consid

  19. Non-parametric linear regression of discrete Fourier transform convoluted chromatographic peak responses under non-ideal conditions of internal standard method.

    Science.gov (United States)

    Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A

    2010-11-15

    This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-points sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely: overlapping chromatographic peaks and very low analyte concentrations. For example, a significant change in the correlation coefficient of sodium benzoate, in case of overlapping peaks, went from 0.9975 to 0.9998 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. Also a significant improvement in the precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was achieved. For example, in the case of overlapping peaks guaiphenesin mean recovery% and RSD% went from 91.57, 9.83 to 100.04, 0.78 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. 
    Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
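
    The contrast between Theil's method and least squares mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common textbook form of Theil's estimator (the median of all pairwise slopes, with the intercept taken as the median residual); it is not the authors' code, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) using the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with equal x, whose slope is undefined
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    # Intercept: median of the residuals y - slope * x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy calibration data on the line y = 2x + 1, with one gross outlier at x = 5.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 50]
robust_fit = theil_sen(x, y)
ols_fit = least_squares(x, y)
```

    On this toy data the median-based fit stays on the underlying line while the least squares slope is pulled towards the outlier, which is the kind of robustness the abstract refers to.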

  20. Statistical methods and computing for big data

    Science.gov (United States)

    Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing

    2016-01-01

    Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
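
    The idea of online updating for stream data can be sketched in its simplest form: a straight-line least squares fit whose running sums are updated chunk by chunk, so that no earlier chunk has to be kept in memory. This is only an illustration of the general principle under that simplifying assumption, not the estimator developed in the article; the class name and data are hypothetical.

```python
class OnlineSimpleRegression:
    """Least squares fit of y = a + b*x from running sums over data chunks."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def update(self, xs, ys):
        """Fold one chunk into the running sums; the chunk can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Return (slope, intercept) for all data seen so far."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


# Two chunks of a stream generated from y = 3x - 2.
model = OnlineSimpleRegression()
model.update([0, 1, 2, 3, 4], [-2, 1, 4, 7, 10])
model.update([5, 6, 7, 8, 9], [13, 16, 19, 22, 25])
slope, intercept = model.coefficients()
```

    The fit after the second chunk is identical to the fit on the pooled data, because the running sums are sufficient statistics for the least squares line.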

  1. Thirty years of nonparametric item response theory

    NARCIS (Netherlands)

    Molenaar, W.

    2001-01-01

    Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo

  2. On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

    Directory of Open Access Journals (Sweden)

    Aaditya Ramdas

    2017-01-01

    Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
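
    For the univariate case, a Wasserstein-based two-sample test can be sketched as follows. The sketch assumes the 1-Wasserstein distance between empirical distributions, computed as the integral of the absolute difference between the two empirical CDFs, and calibrates it with a permutation test. The function names, the permutation calibration and the toy data are illustrative assumptions, not taken from the survey.

```python
import random
from bisect import bisect_right


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two empirical distributions on the line."""
    sa, sb = sorted(a), sorted(b)
    grid = sorted(sa + sb)
    total = 0.0
    # Integrate |F_a(t) - F_b(t)| over each interval between pooled points.
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(sa, left) / len(sa)
        cdf_b = bisect_right(sb, left) / len(sb)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def permutation_pvalue(a, b, n_perm=500, seed=0):
    """Permutation p-value for the null hypothesis that a and b share a distribution."""
    rng = random.Random(seed)
    observed = wasserstein_1d(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if wasserstein_1d(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


sample_a = list(range(10))
sample_b = [v + 100 for v in range(10)]
distance = wasserstein_1d(sample_a, sample_b)
p_value = permutation_pvalue(sample_a, sample_b)
```

    Because the second sample is the first one shifted by 100, the distance equals the shift, and the permutation p-value is small.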

  3. Statistical methods for analysing complex genetic traits

    NARCIS (Netherlands)

    El Galta, Rachid

    2006-01-01

    Complex traits are caused by multiple genetic and environmental factors, and are therefore difficult to study compared with simple Mendelian diseases. The modes of inheritance of Mendelian diseases are often known. Methods to dissect such diseases are well described in literature. For complex geneti

  4. Analysis of Statistical Methods Currently used in Toxicology Journals

    OpenAIRE

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-01-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and in...

  5. Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models.

    Science.gov (United States)

    Fan, Jianqing; Ma, Yunbei; Dai, Wei

    2014-01-01

    The varying-coefficient model is an important class of nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in sparse ultra-high dimensional varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance the practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.

  6. Problems and Recommendations for Rural Statistics and Survey Methods

    Institute of Scientific and Technical Information of China (English)

    ZHANG, Chengjun

    2014-01-01

    With the continued deepening of reform and opening-up, the national economic system has changed from a planned economy to a market economy, and rural survey and statistics work remains in a difficult transition period. In this period, China needs to transform its original statistical mode in line with the market economic system. All levels of government are required to report and submit a large and increasing amount of statistical information. In addition, townships, villages and counties face both old and new conflicts. These conflicts complicate the implementation of rural statistics and surveys and the development of rural statistical work, and they also prompt research and reflection on the reform of rural statistical and survey methods.

  7. Statistical Methods Used in Gifted Education Journals, 2006-2010

    Science.gov (United States)

    Warne, Russell T.; Lazo, Maria; Ramos, Tammy; Ritter, Nicola

    2012-01-01

    This article describes the statistical methods used in quantitative and mixed methods articles between 2006 and 2010 in five gifted education research journals. Results indicate that the most commonly used statistical methods are means (85.9% of articles), standard deviations (77.8%), Pearson's "r" (47.8%), χ² (32.2%), ANOVA (30.7%),…

  8. Statistical methods for assessment of blend homogeneity

    DEFF Research Database (Denmark)

    Madsen, Camilla

    2002-01-01

    as powder blends there is no natural unit or amount to define a sample from the blend, and partly that current technology does not provide a method of universally collecting small representative samples from large static powder beds. In the thesis a number of methods to assess (in)homogeneity are presented...... of internal factors to the blend e.g. the particle size distribution. The relation between particle size distribution and the variation in drug content in blend and tablet samples is discussed. A central problem is to develop acceptance criteria for blends and tablet batches to decide whether the blend...... blend or batch. In the thesis it is shown how to link sampling result and acceptance criteria to the actual quality (homogeneity) of the blend or tablet batch. Also it is discussed how the assurance related to a specific acceptance criteria can be obtained from the corresponding OC-curve. Further...

  9. Statistical methods for handling incomplete data

    CERN Document Server

    Kim, Jae Kwang

    2013-01-01

    "… this book nicely blends the theoretical material and its application through examples, and will be of interest to students and researchers as a textbook or a reference book. Extensive coverage of recent advances in handling missing data provides resources and guidelines for researchers and practitioners in implementing the methods in new settings. … I plan to use this as a textbook for my teaching and highly recommend it." -Biometrics, September 2014

  10. Biometric Authentication using Nonparametric Methods

    CERN Document Server

    Sheela, S V; 10.5121/ijcsit.2010.2309

    2010-01-01

    Physiological and behavioral traits are employed to develop biometric authentication systems. The proposed work deals with the authentication of iris and signature based on minimum variance criteria. The iris patterns are preprocessed based on the area of the connected components. The segmented image used for authentication consists of the region with large variations in the gray level values. The image region is split into quadtree components. The components with minimum variance are determined from the training samples. Hu moments are applied to the components. The sums of moment values corresponding to minimum variance components are provided as the input vector to k-means and fuzzy k-means classifiers. The best performance was obtained for the MMU database consisting of 45 subjects. The number of subjects with zero False Rejection Rate [FRR] was 44 and the number of subjects with zero False Acceptance Rate [FAR] was 45. This paper addresses the computational load reduction in off-line signature verification based o...

  11. Biometric Authentication using Nonparametric Methods

    CERN Document Server

    Sheela, S V; 10.5121/ijcsit.2010.2309

    2010-01-01

    Physiological and behavioral traits are employed to develop biometric authentication systems. The proposed work deals with the authentication of iris and signature based on minimum variance criteria. The iris patterns are preprocessed based on the area of the connected components. The segmented image used for authentication consists of the region with large variations in the gray level values. The image region is split into quadtree components. The components with minimum variance are determined from the training samples. Hu moments are applied to the components. The sums of moment values corresponding to minimum variance components are provided as the input vector to k-means and fuzzy k-means classifiers. The best performance was obtained for the MMU database consisting of 45 subjects. The number of subjects with zero False Rejection Rate [FRR] was 44 and the number of subjects with zero False Acceptance Rate [FAR] was 45. This paper addresses the computational load reduction in off-line signature verification based ...

  12. A note on the use of the non-parametric Wilcoxon-Mann-Whitney test in the analysis of medical studies

    Directory of Open Access Journals (Sweden)

    Kühnast, Corinna

    2008-04-01

    Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and, for all studies which contain at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test, investigated the relationship between the choice of a statistical test and other variables such as type of journal, sample size, randomization, sponsoring etc. Results: The non-parametric Wilcoxon-Mann-Whitney test was used in 30% of the studies. In a multivariable logistic regression the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in the case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software was specified, in particular when SPSS was used.
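
    For readers unfamiliar with the test surveyed in this note, the following is a minimal sketch of the Wilcoxon-Mann-Whitney U statistic with a two-sided p-value from the normal approximation. It assumes midranks for ties, a tie-corrected variance and no continuity correction; the function names and data are hypothetical, and for small samples an exact p-value from statistical software is preferable.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for the first sample, two-sided normal-approximation p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # The variance of U under the null shrinks when there are ties.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all observations identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

    Here every value in the first sample lies below every value in the second, so U for the first sample takes its minimum value of zero.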

  13. Estimation of the limit of detection with a bootstrap-derived standard error by a partly non-parametric approach. Application to HPLC drug assays

    DEFF Research Database (Denmark)

    Linnet, Kristian

    2005-01-01

    Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors......Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors...

  14. Statistical Method of Estimating Nigerian Hydrocarbon Reserves

    Directory of Open Access Journals (Sweden)

    Jeffrey O. Oseh

    2015-01-01

    Hydrocarbon reserves are basic to planning and investment decisions in the petroleum industry. Their proper estimation is therefore of considerable importance in oil and gas production. The estimation of hydrocarbon reserves in the Niger Delta Region of Nigeria has been very popular, and very successful, in the Nigerian oil and gas industry for the past 50 years. In order to fully estimate the hydrocarbon potential of the Niger Delta Region, a clear understanding of the reserve geology and production history is needed. Reserves estimation for most fields is often performed through Material Balance and Volumetric methods. Alternatively, a simple Estimation Model and Least Squares Regression may be useful or appropriate. This model is based on extrapolation of additional reserves due to the exploratory drilling trend and the additional reserve factor due to revision of the existing fields. The Estimation Model, used alongside Linear Regression Analysis in this study, gives improved estimates for the fields considered and hence can be used in other Nigerian fields with recent production history.

  15. PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks.

    Directory of Open Access Journals (Sweden)

    Thong Pham

    Preferential attachment is a stochastic process that has been proposed to explain certain topological features characteristic of complex networks from diverse domains. The systematic investigation of preferential attachment is an important area of research in network science, not only for the theoretical matter of verifying whether this hypothesized process is operative in real-world networks, but also for the practical insights that follow from knowledge of its functional form. Here we describe a maximum likelihood based estimation method for the measurement of preferential attachment in temporal complex networks. We call the method PAFit, and implement it in an R package of the same name. PAFit constitutes an advance over previous methods primarily because we based it on a nonparametric statistical framework that enables attachment kernel estimation free of any assumptions about its functional form. We show this results in PAFit outperforming the popular methods of Jeong and Newman in Monte Carlo simulations. What is more, we found that the application of PAFit to a publicly available Flickr social network dataset yielded clear evidence for a deviation of the attachment kernel from the popularly assumed log-linear form. Independent of our main work, we provide a correction to a consequential error in Newman's original method which had evidently gone unnoticed since its publication over a decade ago.

  16. Review of robust multivariate statistical methods in high dimension.

    Science.gov (United States)

    Filzmoser, Peter; Todorov, Valentin

    2011-10-31

    General ideas of robust statistics, and specifically robust statistical methods for calibration and dimension reduction are discussed. The emphasis is on analyzing high-dimensional data. The discussed methods are applied using the packages chemometrics and rrcov of the statistical software environment R. It is demonstrated how the functions can be applied to real high-dimensional data from chemometrics, and how the results can be interpreted.

  17. Scientific Method, Statistical Method and the Speed of Light

    OpenAIRE

    MacKay, R. J.; Oldford, R.W.

    2000-01-01

    What is “statistical method”? Is it the same as “scientific method”? This paper answers the first question by specifying the elements and procedures common to all statistical investigations and organizing these into a single structure. This structure is illustrated by careful examination of the first scientific study on the speed of light carried out by A. A. Michelson in 1879. Our answer to the second question is negative. To understand this a history on the speed of light ...

  18. An Overview of Short-term Statistical Forecasting Methods

    DEFF Research Database (Denmark)

    Elias, Russell J.; Montgomery, Douglas C.; Kulahci, Murat

    2006-01-01

    An overview of statistical forecasting methodology is given, focusing on techniques appropriate to short- and medium-term forecasts. Topics include basic definitions and terminology, smoothing methods, ARIMA models, regression methods, dynamic regression models, and transfer functions. Techniques...

  20. Online Statistics Labs in MSW Research Methods Courses: Reducing Reluctance toward Statistics

    Science.gov (United States)

    Elliott, William; Choi, Eunhee; Friedline, Terri

    2013-01-01

    This article presents results from an evaluation of an online statistics lab as part of a foundations research methods course for master's-level social work students. The article discusses factors that contribute to an environment in social work that fosters attitudes of reluctance toward learning and teaching statistics in research methods…

  2. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers

    Directory of Open Access Journals (Sweden)

    Stochl Jan

    2012-06-01

    Background: Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods: Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions: After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12), when binary scored, were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis

  3. The use of Statistical Methods in Mechanical Engineering

    Directory of Open Access Journals (Sweden)

    Iram Saleem

    2013-03-01

    Statistics is an important tool for handling the vast data of the present era, as it can present information in a form from which many conclusions can be drawn. The aim of this study is to examine the use of statistical methods in Mechanical Engineering (ME); therefore, we selected research papers published in 2010 in well-reputed ME journals under Taylor and Francis Company LTD. More than 350 research papers were downloaded from well-reputed ME journals such as Inverse Problem in Science and Engineering (IPSE), Machining Science and Technology (MST), Materials and Manufacturing Processes (MMP), Particulate Science and Technology (PST) and Research in Nondestructive Evaluation (RNE). We recorded the statistical techniques and methods used in each research paper. In this study, we present the frequency distribution of the descriptive statistics and advanced statistical methods used in these five ME journals in 2010.

  4. Non-parametric approach to the study of phenotypic stability.

    Science.gov (United States)

    Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P

    2016-02-19

    The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods were extensions of different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.

  5. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers.

    Science.gov (United States)

    Stochl, Jan; Jones, Peter B; Croudace, Tim J

    2012-06-11

    Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental

  6. Analyzing single-molecule time series via nonparametric Bayesian inference.

    Science.gov (United States)

    Hines, Keegan E; Bankston, John R; Aldrich, Richard W

    2015-02-03

    The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  7. Recent Developments in Applied Probability and Statistics

    CERN Document Server

    Devroye, Luc; Kohler, Michael; Korn, Ralf

    2010-01-01

    This book presents surveys on recent developments in applied probability and statistics. The contributions include topics such as nonparametric regression and density estimation, option pricing, probabilistic methods for multivariate interpolation, robust graphical modelling and stochastic differential equations. Due to its broad coverage of different topics the book offers an excellent overview of recent developments in applied probability and statistics.

  8. Statistical methods in longitudinal research principles and structuring change

    CERN Document Server

    von Eye, Alexander

    1991-01-01

    These edited volumes present new statistical methods in a way that bridges the gap between theoretical and applied statistics. The volumes cover general problems and issues and more specific topics concerning the structuring of change, the analysis of time series, and the analysis of categorical longitudinal data. The book targets students of development and change in a variety of fields - psychology, sociology, anthropology, education, medicine, psychiatry, economics, behavioural sciences, developmental psychology, ecology, plant physiology, and biometry - with basic training in statistics an

  9. A non-parametric method for automatic determination of P-wave and S-wave arrival times: application to local micro earthquakes

    Science.gov (United States)

    Rawles, Christopher; Thurber, Clifford

    2015-08-01

    We present a simple, fast, and robust method for automatic detection of P- and S-wave arrivals using a nearest neighbours-based approach. The nearest neighbour algorithm is one of the most popular time-series classification methods in the data mining community and has been applied to time-series problems in many different domains. Specifically, our method is based on the non-parametric time-series classification method developed by Nikolov. Instead of building a model by estimating parameters from the data, the method uses the data itself to define the model. Potential phase arrivals are identified based on their similarity to a set of reference data consisting of positive and negative sets, where the positive set contains examples of analyst identified P- or S-wave onsets and the negative set contains examples that do not contain P waves or S waves. Similarity is defined as the square of the Euclidean distance between vectors representing the scaled absolute values of the amplitudes of the observed signal and a given reference example in time windows of the same length. For both P waves and S waves, a single pass is done through the bandpassed data, producing a score function defined as the ratio of the sum of similarity to positive examples over the sum of similarity to negative examples for each window. A phase arrival is chosen as the centre position of the window that maximizes the score function. The method is tested on two local earthquake data sets, consisting of 98 known events from the Parkfield region in central California and 32 known events from the Alpine Fault region on the South Island of New Zealand. For P-wave picks, using a reference set containing two picks from the Parkfield data set, 98 per cent of Parkfield and 94 per cent of Alpine Fault picks are determined within 0.1 s of the analyst pick. For S-wave picks, 94 per cent and 91 per cent of picks are determined within 0.2 s of the analyst picks for the Parkfield and Alpine Fault data set
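
    The scoring step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the peak-normalisation in `scaled_abs`, the exponential weighting `exp(-gamma * d)` that turns a squared Euclidean distance into a similarity, the parameter `gamma`, and all function names are assumptions introduced here.

```python
import math

def scaled_abs(window):
    """Absolute amplitudes scaled to a peak of 1 (assumed scaling)."""
    peak = max(abs(x) for x in window) or 1.0  # avoid dividing by zero
    return [abs(x) / peak for x in window]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pick_arrival(signal, positives, negatives, gamma=1.0):
    """Return (centre index, score) of the best-scoring window.

    The score is the summed similarity to the positive reference
    examples divided by the summed similarity to the negative ones.
    Similarity = exp(-gamma * squared distance) is an assumption.
    """
    n = len(positives[0])
    pos = [scaled_abs(p) for p in positives]
    neg = [scaled_abs(q) for q in negatives]
    best_score, best_centre = -1.0, None
    for start in range(len(signal) - n + 1):  # single pass over the trace
        w = scaled_abs(signal[start:start + n])
        s_pos = sum(math.exp(-gamma * sq_dist(w, p)) for p in pos)
        s_neg = sum(math.exp(-gamma * sq_dist(w, q)) for q in neg)
        score = s_pos / s_neg
        if score > best_score:
            best_score, best_centre = score, start + n // 2
    return best_centre, best_score
```

    In practice the input would be a bandpassed seismogram and the reference sets would hold analyst-identified onsets and non-onset windows of the same length.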

  10. The estimation of the measurement results with using statistical methods

    Science.gov (United States)

    Velychko, O.; Gordiyenko, T.

    2015-02-01

    A number of international standards and guides describe various statistical methods that are applied to the management, control and improvement of processes for the purpose of analysing technical measurement results. An analysis of international standards and guides on statistical methods for estimating measurement results, with recommendations for their application in laboratories, is described. To support this analysis, cause-and-effect (Ishikawa) diagrams concerning the application of statistical methods for estimating measurement results are constructed.

  11. Evaluation of Nonparametric Probabilistic Forecasts of Wind Power

    DEFF Research Database (Denmark)

    Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg

    Predictions of wind power production for horizons up to 48-72 hour ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind...
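
    As a rough illustration of how a quantile forecast might be checked, the sketch below computes two commonly used diagnostics: the empirical coverage (a reliability check against the nominal level) and the pinball (quantile) loss. The abstract does not say which measures the proposed framework uses, so both choices, and the function names, are assumptions made here.

```python
def empirical_coverage(quantile_forecasts, observations):
    """Fraction of observations at or below the forecast quantile.

    For a reliable forecast of nominal level tau, this should be
    close to tau over a long evaluation period.
    """
    hits = sum(1 for q, y in zip(quantile_forecasts, observations) if y <= q)
    return hits / len(observations)

def pinball_loss(quantile_forecasts, observations, tau):
    """Average pinball (quantile) loss at nominal level tau."""
    total = 0.0
    for q, y in zip(quantile_forecasts, observations):
        diff = y - q
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observations)
```

    Coverage alone cannot rank forecasts, since a constant climatological quantile can be perfectly reliable; a loss such as the pinball loss also rewards forecasts that track the observations closely.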

  12. Grade-Average Method: A Statistical Approach for Estimating ...

    African Journals Online (AJOL)

    Grade-Average Method: A Statistical Approach for Estimating Missing Value for Continuous Assessment Marks. ... Journal of the Nigerian Association of Mathematical Physics.

  13. Methods of quantum field theory in statistical physics

    CERN Document Server

    Abrikosov, A A; Gorkov, L P; Silverman, Richard A

    1975-01-01

    This comprehensive introduction to the many-body theory was written by three renowned physicists and acclaimed by American Scientist as "a classic text on field theoretic methods in statistical physics."

  14. Steganalytic method based on short and repeated sequence distance statistics

    Institute of Scientific and Technical Information of China (English)

    WANG GuoXin; PING XiJian; XU ManKun; ZHANG Tao; BAO XiRui

    2008-01-01

    According to the distribution characteristics of short and repeated sequence (SRS), a steganalytic method based on the correlation of image bit planes is proposed. Firstly, we provide the conception of SRS distance statistics and deduce its statistical distribution. Because the SRS distance statistics can effectively reflect the correlation of the sequence, SRS has statistical features when the image bit plane sequence equals the image width. Using this characteristic, the steganalytic method is fulfilled by the distinct test of Poisson distribution. Experimental results show a good performance for detecting the LSB matching steganographic method in still images. Moreover, the proposed method is not designed for specific steganographic algorithms and has good generality.

  15. Complex Data Modeling and Computationally Intensive Statistical Methods

    CERN Document Server

    Mantovan, Pietro

    2010-01-01

    The last years have seen the advent and development of many devices able to record and store an always increasing amount of complex and high dimensional data; 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real time financial data, system control datasets. The analysis of this data poses new challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici

  16. Method for statistical data analysis of multivariate observations

    CERN Document Server

    Gnanadesikan, R

    1997-01-01

    A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte

  17. A Circular Statistical Method for Extracting Rotation Measures

    Indian Academy of Sciences (India)

    S. Sarala; Pankaj Jain

    2002-03-01

    We propose a new method for the extraction of Rotation Measures from spectral polarization data. The method is based on maximum likelihood analysis and takes into account the circular nature of the polarization data. The method is unbiased and statistically more efficient than the standard χ² procedure.
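
    To illustrate why the circular nature of polarization angles matters, the sketch below averages angles that are defined only modulo π by doubling them, averaging on the unit circle, and halving the result. This is a standard device from circular statistics, shown here as an assumption-laden illustration; it is not the maximum likelihood method of the paper, and the function name is hypothetical.

```python
import math

def axial_mean(angles_rad):
    """Mean direction of angles defined modulo pi (e.g. polarization angles).

    Doubling maps the axial data onto the full circle, where the
    usual vector average applies; halving maps the result back.
    """
    c = sum(math.cos(2.0 * a) for a in angles_rad)
    s = sum(math.sin(2.0 * a) for a in angles_rad)
    return (0.5 * math.atan2(s, c)) % math.pi
```

    For two angles of 170° and 10°, a plain arithmetic mean gives 90°, which is perpendicular to both, whereas the axial mean lies at the 0°/180° direction they cluster around.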

  18. Statistical Methods for Single-Particle Electron Cryomicroscopy

    DEFF Research Database (Denmark)

    Jensen, Katrine Hommelhoff

    ... from the noisy, randomly oriented projection images. Many statistical approaches to SPR have been proposed in the past. Typically, due to the computation time complexity, they rely on approximated maximum likelihood (ML) or maximum a posteriori (MAP) estimate of the structure. All methods presented in this thesis attempt to solve a specific part of the reconstruction problem in a statistically sound manner. Firstly, we propose two methods for solving the problems (1) and (2). They can ultimately be extended and combined into a statistically sound solution to the full SPR problem. We use Bayesian... between a MAP approach for estimating the protein structure. The resulting method is statistically optimal under the assumption of the uniform prior in the space of rotations. The marginal posterior is constructed by integrating over the view orientations and maximised by the expectation-maximisation (EM...

  19. Analysis of Statistical Methods Currently used in Toxicology Journals.

    Science.gov (United States)

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-09-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was predominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justification for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
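
    The two dispersion measures contrasted in this abstract are related by SEM = SD / √n, so for the small samples reported the choice changes the reported spread considerably. The sketch below is a minimal illustration; the function name is hypothetical and any data passed to it are made up, not taken from the surveyed papers.

```python
import math
import statistics

def describe(sample):
    """Return n, mean, sample SD and standard error of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)         # SEM shrinks with n; SD does not
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}
```

    At n = 6, the median sample size reported above, the SEM is about 2.4 times smaller than the SD, which is one reason a report should state which of the two it quotes.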

  20. Statistical Methods for Detecting and Modeling General Patterns and Relationships in Lifetime Data

    Energy Technology Data Exchange (ETDEWEB)

    Kvaloey, Jan Terje

    1999-04-01

    In this thesis, the author tries to develop methods of detecting and modeling general patterns and relationships in lifetime data. Tests with power against nonmonotonic trends and nonmonotonic covariate effects are considered, and nonparametric regression methods which allow estimation of fairly general nonlinear relationships are studied. Practical uses of some of the methods are illustrated, although in a medical rather than an engineering or technological context.

  1. Oxygen Abundance Methods in SDSS: View from Modern Statistics

    Indian Academy of Sciences (India)

    Fei Shi; Gang Zhao; James Wicker

    2010-09-01

    Our purpose is to find which is the most reliable one among various oxygen abundance determination methods. We will test the validity of several different oxygen abundance determination methods using methods of modern statistics. These methods include Bayesian analysis and information scoring. We will analyze a sample of ∼ 6000 HII galaxies from the Sloan Digital Sky Survey (SDSS) spectroscopic observations data release four. All methods that we used drew the same conclusion that the Te method is a more reliable oxygen abundance determination method than the Bayesian metallicity method under the existing telescope ability. The ratios of the likelihoods between the different kinds of methods tell us that the Te, P, and O3N2 methods are consistent with each other because the P and O3N2 methods are calibrated by the Te method. The Bayesian and R23 methods are consistent with each other because both are calibrated by a galaxy model. In either case, the N2 method is an unreliable method.

  2. Brief guidelines for methods and statistics in medical research

    CERN Document Server

    Ab Rahman, Jamalludin

    2015-01-01

    This book serves as a practical guide to methods and statistics in medical research. It includes step-by-step instructions on using SPSS software for statistical analysis, as well as relevant examples to help those readers who are new to research in health and medical fields. Simple texts and diagrams are provided to help explain the concepts covered, and print screens for the statistical steps and the SPSS outputs are provided, together with interpretations and examples of how to report on findings. Brief Guidelines for Methods and Statistics in Medical Research offers a valuable quick reference guide for healthcare students and practitioners conducting research in health related fields, written in an accessible style.

  3. Statistical Methods for Characterizing Variability in Stellar Spectra

    Science.gov (United States)

    Cisewski, Jessi; Yale Astrostatistics

    2017-01-01

    Recent years have seen a proliferation in the number of exoplanets discovered. One technique for uncovering exoplanets relies on the detection of subtle shifts in the stellar spectra due to the Doppler effect caused by an orbiting object. However, stellar activity can cause distortions in the spectra that mimic the imprint of an orbiting exoplanet. The collection of stellar spectra potentially contains more information than is traditionally used for estimating its radial velocity curve. I will discuss some statistical methods that can be used for characterizing the sources of variability in the spectra. Statistical assessment of stellar spectra is a focus of the Statistical and Applied Mathematical Sciences Institute (SAMSI)'s yearlong program on Statistical, Mathematical and Computational Methods for Astronomy's Working Group IV (Astrophysical Populations).

  4. Fundamentals of modern statistical methods substantially improving power and accuracy

    CERN Document Server

    Wilcox, Rand R

    2001-01-01

    Conventional statistical methods have a very serious flaw. They routinely miss differences among groups or associations among variables that are detected by more modern techniques, even under very small departures from normality. Hundreds of journal articles have described the reasons standard techniques can be unsatisfactory, but simple, intuitive explanations are generally unavailable. Improved methods have been derived, but they are far from obvious or intuitive based on the training most researchers receive. Situations arise where even highly nonsignificant results become significant when analyzed with more modern methods. Without assuming any prior training in statistics, Part I of this book describes basic statistical principles from a point of view that makes their shortcomings intuitive and easy to understand. The emphasis is on verbal and graphical descriptions of concepts. Part II describes modern methods that address the problems covered in Part I. Using data from actual studies, many examples are include...

  5. Complexity of software trustworthiness and its dynamical statistical analysis methods

    Institute of Scientific and Technical Information of China (English)

    ZHENG ZhiMing; MA ShiLong; LI Wei; JIANG Xin; WEI Wei; MA LiLi; TANG ShaoTing

    2009-01-01

    Developing trusted software has become an important trend and a natural choice in the development of software technology and applications. At present, the method of measurement and assessment of software trustworthiness cannot guarantee safe and reliable operations of software systems completely and effectively. Based on the dynamical system study, this paper interprets the characteristics of behaviors of software systems and the basic scientific problems of software trustworthiness complexity, analyzes the characteristics of complexity of software trustworthiness, and proposes to study the software trustworthiness measurement in terms of the complexity of software trustworthiness. Using the dynamical statistical analysis methods, the paper advances an invariant-measure based assessment method of software trustworthiness by statistical indices, and thereby provides a dynamical criterion for the untrustworthiness of software systems. By an example, the feasibility of the proposed dynamical statistical analysis method in software trustworthiness measurement is demonstrated using numerical simulations and theoretical analysis.

  6. Statistical Methods for Quantitatively Detecting Fungal Disease from Fruits’ Images

    OpenAIRE

    Jagadeesh D. Pujari; Yakkundimath, Rajesh Siddaramayya; Byadgi, Abdulmunaf Syedhusain

    2013-01-01

    In this paper we have proposed statistical methods for detecting fungal disease and classifying based on disease severity levels. Most fruit diseases are caused by bacteria, fungi, viruses, etc., of which fungi are responsible for a large number of diseases in fruits. In this study, images of fruits affected by different fungal symptoms are collected and categorized based on disease severity. Statistical features like block wise, gray level co-occurrence matrix (GLCM), gray level runlength matr...

  7. Nonparametric Bayesian Modeling for Automated Database Schema Matching

    Energy Technology Data Exchange (ETDEWEB)

    Ferragut, Erik M [ORNL; Laska, Jason A [ORNL

    2015-01-01

    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.

  8. Hierarchical modelling for the environmental sciences statistical methods and applications

    CERN Document Server

    Clark, James S

    2006-01-01

    New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.

  9. The Metropolis Monte Carlo Method in Statistical Physics

    Science.gov (United States)

    Landau, David P.

    2003-11-01

    A brief overview is given of some of the advances in statistical physics that have been made using the Metropolis Monte Carlo method. By complementing theory and experiment, these have increased our understanding of phase transitions and other phenomena in condensed matter systems. A brief description of a new method, commonly known as "Wang-Landau sampling," will also be presented.
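
    For readers unfamiliar with the method, a minimal Metropolis sketch for a one-dimensional Ising chain is given below. The model choice, the coupling J = 1, periodic boundaries, single-spin-flip updates and all names are assumptions made for illustration; the overview itself covers far more general systems, and Wang-Landau sampling is not shown.

```python
import math
import random

def metropolis_ising_1d(n_spins=50, beta=1.0, n_sweeps=200, seed=0):
    """Sample spin configurations of a periodic 1D Ising chain (J = 1)."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    for _ in range(n_sweeps * n_spins):
        i = rng.randrange(n_spins)
        left, right = spins[i - 1], spins[(i + 1) % n_spins]
        # Energy change if spin i is flipped, for E = -sum_i s_i * s_(i+1)
        delta_e = 2.0 * spins[i] * (left + right)
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-beta * delta_e)
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i] = -spins[i]
    return spins

def energy_per_spin(spins):
    """Energy per spin of a periodic chain with J = 1."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n)) / n
```

    Averaging `energy_per_spin` over many configurations sampled after an equilibration period gives an estimate of the thermal mean energy at inverse temperature `beta`.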

  10. Descriptive and inferential statistical methods used in burns research.

    Science.gov (United States)

    Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars

    2010-05-01

    Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range) and survey the inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/co-variance (33%), chi-squared test (27%), Wilcoxon & Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals

  11. Measuring Dynamic Extreme ES Risk for Stock Markets Based on Nonparametric and L-Moment Method

    Institute of Scientific and Technical Information of China (English)

    林宇; 谭斌; 黄登仕; 魏宇

    2011-01-01

    This paper models the conditional mean and conditional volatility of returns with a bandwidth-based nonparametric method and with AR-GARCH in order to obtain standardized residuals. L-moment and maximum likelihood estimation (MLE) are then used to estimate the parameters of a generalized Pareto distribution (GPD) fitted to the tail of the standardized residuals, from which dynamic value at risk (VaR) and expected shortfall (ES) are estimated. Finally, back-testing is applied to check the accuracy of the VaR and ES measurement models. The results show that nonparametric estimation appears to be more accurate than the GARCH model for risk measurement, and that the risk measurement model based on nonparametric estimation and the L-moment method can effectively measure the dynamic risks of the Shanghai and Shenzhen stock markets.

  12. Academic Training Lecture: Statistical Methods for Particle Physics

    CERN Multimedia

    PH Department

    2012-01-01

    2, 3, 4 and 5 April 2012. Academic Training Lecture, Regular Programme, from 11:00 to 12:00, Bldg. 222-R-001 (Filtration Plant). Statistical Methods for Particle Physics, by Glen Cowan (Royal Holloway). The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.

  13. Three Methods for Occupation Coding Based on Statistical Learning

    Directory of Open Access Journals (Sweden)

    Gweon Hyukjun

    2017-03-01

    Occupation coding, an important task in official statistics, refers to coding a respondent’s text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.

  14. Glaucoma Monitoring in a Clinical Setting: Glaucoma Progression Analysis vs Nonparametric Progression Analysis in the Groningen Longitudinal Glaucoma Study

    NARCIS (Netherlands)

    Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.

    Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:

  15. Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

    Science.gov (United States)

    Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

    2016-03-01

    The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
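
    For orientation, the classical (unsmoothed) Good-Turing estimate mentioned in this record can be sketched in a few lines. The function name and the toy sample are illustrative assumptions; the smoothed estimators, the Poisson-Dirichlet prior and the credible intervals discussed in the article are not reproduced here.

```python
from collections import Counter


def good_turing_discovery(sample, r=0):
    """Unsmoothed Good-Turing estimate of the probability that the next
    observation belongs to a species seen exactly r times so far.

    With r = 0 this is the probability of discovering a new species, m_1 / n,
    where m_1 is the number of species observed exactly once.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    species_counts = Counter(sample)                   # species -> frequency
    freq_of_freq = Counter(species_counts.values())    # r -> number of species seen r times
    return (r + 1) * freq_of_freq.get(r + 1, 0) / n


# Toy sample: species 'a' seen 3 times, 'b' twice, 'c', 'd', 'e' once each.
toy_sample = list("aaabbcde")
new_species_probability = good_turing_discovery(toy_sample, r=0)
```

    In the toy sample three of the eight observations are singletons, so the estimated probability of a new species is 3/8. The raw estimate is known to be erratic when few species share a given frequency, which is one motivation for the smoothed versions the article studies.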

  16. Statistical concepts a second course

    CERN Document Server

    Lomax, Richard G

    2012-01-01

    Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes

  17. Statistical Methods for Particle Physics (4/4)

    CERN Document Server

    CERN. Geneva

    2012-01-01

    The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.

  18. Statistical Methods for Particle Physics (2/4)

    CERN Document Server

    CERN. Geneva

    2012-01-01

    The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.

  19. Statistical Methods for Particle Physics (1/4)

    CERN Document Server

    CERN. Geneva

    2012-01-01

    The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.

  20. Statistical Methods for Particle Physics (3/4)

    CERN Document Server

    CERN. Geneva

    2012-01-01

    The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.

  1. Understanding common statistical methods, Part I: descriptive methods, probability, and continuous data.

    Science.gov (United States)

    Skinner, Carl G; Patel, Manish M; Thomas, Jerry D; Miller, Michael A

    2011-01-01

    Statistical methods are pervasive in medical research and general medical literature. Understanding general statistical concepts will enhance our ability to critically appraise the current literature and ultimately improve the delivery of patient care. This article intends to provide an overview of the common statistical methods relevant to medicine.

  2. Non-Parametric Estimation of Correlation Functions

    DEFF Research Database (Denmark)

    Brincker, Rune; Rytter, Anders; Krenk, Steen

    In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise.
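
    A minimal sketch of the first of the three estimators, the direct method, is given below: lagged products are summed and divided either by the number of products (unbiased) or by the record length (biased). Subtracting the sample mean, the function name and the test signal are assumptions for illustration; the FFT and Random Decrement estimators are not shown.

```python
def autocorrelation_direct(x, max_lag, unbiased=True):
    """Direct estimate of the auto-correlation function for lags 0..max_lag.

    The sample mean is removed first (an assumption made here; a zero-mean
    response would not need it). With unbiased=True each lag is divided by
    the number of products, n - lag; otherwise by the record length n.
    """
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean = sum(x) / n
    centred = [value - mean for value in x]
    estimates = []
    for lag in range(max_lag + 1):
        lagged_sum = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        estimates.append(lagged_sum / ((n - lag) if unbiased else n))
    return estimates


# Alternating signal: the correlation flips sign at every lag.
signal = [1.0, -1.0, 1.0, -1.0]
unbiased_estimate = autocorrelation_direct(signal, max_lag=2)
biased_estimate = autocorrelation_direct(signal, max_lag=2, unbiased=False)
```

    The biased variant shrinks the estimate towards zero at large lags, which illustrates one of the bias sources an evaluation of such estimators has to account for.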

  3. A novel statistical method for classifying habitat generalists and specialists

    DEFF Research Database (Denmark)

    Chazdon, Robin L; Chao, Anne; Colwell, Robert K

    2011-01-01

    We develop a novel statistical approach for classifying generalists and specialists in two distinct habitats. Using a multinomial model based on estimated species relative abundance in two habitats, our method minimizes bias due to differences in sampling intensities between two habitat types...... as well as bias due to insufficient sampling within each habitat. The method permits a robust statistical classification of habitat specialists and generalists, without excluding rare species a priori. Based on a user-defined specialization threshold, the model classifies species into one of four groups...... fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest...

  4. Urban Fire Risk Clustering Method Based on Fire Statistics

    Institute of Scientific and Technical Information of China (English)

    WU Lizhi; REN Aizhu

    2008-01-01

    Fire statistics and fire analysis have become important ways for us to understand the law of fire, prevent the occurrence of fire, and improve the ability to control fire. According to existing fire statistics, a weighted fire risk calculation method characterized by the number of fire occurrences, direct economic losses, and fire casualties was put forward. On the basis of this method, and with an improved K-means clustering algorithm, this paper established a fire risk K-means clustering model, which could better resolve the problem of automatically classifying fire risk. Fire risk clusters should be classified by the absolute distance of the target instead of the relative distance used in the traditional clustering algorithm. Finally, to apply the established model, this paper carried out fire risk clustering on fire statistics from January 2000 to December 2004 for Shenyang in China. This research would provide technical support for urban fire management.

  5. Statistical methods with applications to demography and life insurance

    CERN Document Server

    Khmaladze, Estáte V

    2013-01-01

    Suitable for statisticians, mathematicians, actuaries, and students interested in the problems of insurance and analysis of lifetimes, Statistical Methods with Applications to Demography and Life Insurance presents contemporary statistical techniques for analyzing life distributions and life insurance problems. It not only contains traditional material but also incorporates new problems and techniques not discussed in existing actuarial literature. The book mainly focuses on the analysis of an individual life and describes statistical methods based on empirical and related processes. Coverage ranges from analyzing the tails of distributions of lifetimes to modeling population dynamics with migrations. To help readers understand the technical points, the text covers topics such as the Stieltjes, Wiener, and Itô integrals. It also introduces other themes of interest in demography, including mixtures of distributions, analysis of longevity and extreme value theory, and the age structure of a population. In addi...

  6. On the statistical analysis of trend in tropospheric ozone levels

    Science.gov (United States)

    Vaquera-Huerta, Humberto

    1997-11-01

    This paper is a study of methodology to investigate trends in ozone levels in urban areas. Three methods are studied; two parametric (NHPP, and GPD) and one nonparametric. The nonhomogeneous Poisson process (NHPP) approach: This method is based on the idea that the number of exceedances over a high threshold follows a Poisson distribution. In this method the detection of trend is approached by estimating the intensity function of the process. The intensity function is estimated using parametric and nonparametric methods. A general parametric function over time for the rate of a NHPP is proposed in order to test for nonexponential patterns. The Generalized Pareto Distribution approach (GPD): In this method the detection of trend in ozone is approached by considering that the magnitude of the exceedances over a high threshold follows a generalized Pareto distribution. The nonparametric statistical approach: A test for trend and a nonparametric estimator of a trend parameter were studied. The asymptotic distribution of the estimator is also provided. The nonparametric estimator is compared with the least squares estimator. The three methods are studied empirically using a Monte Carlo method. Some insight into how well these methods perform is obtained. Also, the use of each method is illustrated by examples with ozone data. The results of this study show that NHPP performs very well as a method to detect ozone trends. The GPD and the nonparametric approaches had low power for detecting trends in simulation experiments.

  7. Landslide Susceptibility Statistical Methods: A Critical and Systematic Literature Review

    Science.gov (United States)

    Mihir, Monika; Malamud, Bruce; Rossi, Mauro; Reichenbach, Paola; Ardizzone, Francesca

    2014-05-01

    Landslide susceptibility assessment, the subject of this systematic review, is aimed at understanding the spatial probability of slope failures under a set of geomorphological and environmental conditions. It is estimated that about 375 landslides that occur globally each year are fatal, with around 4600 people killed per year. Past studies have brought out the increasing cost of landslide damages, which can be attributed primarily to human occupation and increased human activities in vulnerable environments. To evaluate and reduce landslide risk, many scientists have made an effort to map landslide susceptibility efficiently using different statistical methods. In this paper, we carry out a critical and systematic review of the landslide susceptibility literature in terms of the statistical methods used. For each of a broad set of studies reviewed we note: (i) study geographic region and areal extent, (ii) landslide types, (iii) inventory type and temporal period covered, (iv) mapping technique, (v) thematic variables used, (vi) statistical models, (vii) assessment of model skill, (viii) uncertainty assessment methods, (ix) validation methods. We then identify broad trends within our review of landslide susceptibility, particularly regarding the statistical methods. We found that the most common statistical methods used in the study of landslide susceptibility include logistic regression, artificial neural network, discriminant analysis and weight of evidence. Although most of the studies we reviewed assessed the model skill, very few assessed model uncertainty. In terms of geographic extent, the largest number of landslide susceptibility zonations were in Turkey, Korea, Spain, Italy and Malaysia. However, there are also many landslides and fatalities in other localities, particularly India, China, Philippines, Nepal and Indonesia, Guatemala, and Pakistan, where there are far fewer landslide susceptibility studies available in the peer-review literature. This

  8. Investigating salt frost scaling by using statistical methods

    DEFF Research Database (Denmark)

    Hasholt, Marianne Tange; Clemmensen, Line Katrine Harder

    2010-01-01

    A large data set comprising data for 118 concrete mixes on mix design, air void structure, and the outcome of freeze/thaw testing according to SS 13 72 44 has been analysed by use of statistical methods. The results show that with regard to mix composition, the most important parameter...

  9. Statistical methods for cosmological parameter selection and estimation

    CERN Document Server

    Liddle, Andrew R

    2009-01-01

    The estimation of cosmological parameters from precision observables is an important industry with crucial ramifications for particle physics. This article discusses the statistical methods presently used in cosmological data analysis, highlighting the main assumptions and uncertainties. The topics covered are parameter estimation, model selection, multi-model inference, and experimental design, all primarily from a Bayesian perspective.

  10. Kansas's forests, 2005: statistics, methods, and quality assurance

    Science.gov (United States)

    Patrick D. Miles; W. Keith Moser; Charles J. Barnett

    2011-01-01

    The first full annual inventory of Kansas's forests was completed in 2005 after 8,868 plots were selected and 468 forested plots were visited and measured. This report includes detailed information on forest inventory methods and data quality estimates. Important resource statistics are included in the tables. A detailed analysis of Kansas inventory is presented...

  11. Optimization of statistical methods impact on quantitative proteomics data

    NARCIS (Netherlands)

    Pursiheimo, A.; Vehmas, A.P.; Afzal, S.; Suomi, T.; Chand, T.; Strauss, L.; Poutanen, M.; Rokka, A.; Corthals, G.L.; Elo, L.L.

    2015-01-01

    As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about the best practices is not apparent. In the work described here we compared popular statistical methods for detecting differential protein expression from quantitative MS data using both controlled

  12. Application of statistical methods at copper wire manufacturing

    Directory of Open Access Journals (Sweden)

    Z. Hajduová

    2009-01-01

    Six Sigma is a method of management that strives for near perfection. The Six Sigma methodology uses data and rigorous statistical analysis to identify defects in a process or product, reduce variability and achieve as close to zero defects as possible. The paper presents the basic information on this methodology.

  13. Peer-Assisted Learning in Research Methods and Statistics

    Science.gov (United States)

    Stone, Anna; Meade, Claire; Watling, Rosamond

    2012-01-01

    Feedback from students on a Level 1 Research Methods and Statistics module, studied as a core part of a BSc Psychology programme, highlighted demand for additional tutorials to help them to understand basic concepts. Students in their final year of study commonly request work experience to enhance their employability. All students on the Level 1…

  15. Statistical process control methods for expert system performance monitoring.

    Science.gov (United States)

    Kahn, M G; Bailey, T C; Steib, S A; Fraser, V J; Dunagan, W C

    1996-01-01

    The literature on the performance evaluation of medical expert systems is extensive, yet most of the techniques used in the early stages of system development are inappropriate for deployed expert systems. Because extensive clinical and informatics expertise and resources are required to perform evaluations, efficient yet effective methods of monitoring performance during the long-term maintenance phase of the expert system life cycle must be devised. Statistical process control techniques provide a well-established methodology that can be used to define policies and procedures for continuous, concurrent performance evaluation. Although the field of statistical process control was developed for monitoring industrial processes, its tools, techniques, and theory are easily transferred to the evaluation of expert systems. Statistical process tools provide convenient visual methods and heuristic guidelines for detecting meaningful changes in expert system performance. The underlying statistical theory provides estimates of the detection capabilities of alternative evaluation strategies. This paper describes a set of statistical process control tools that can be used to monitor the performance of a number of deployed medical expert systems. It describes how p-charts are used in practice to monitor the GermWatcher expert system. The case volume and error rate of GermWatcher are then used to demonstrate how different inspection strategies would perform.
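
    The p-chart mentioned in this record can be sketched as follows: the centre line is the pooled error proportion, and each inspection period gets control limits at a chosen number of binomial standard errors around it. The function names, the three-sigma default and the example counts are assumptions for illustration; they are not GermWatcher data or the authors' inspection strategy.

```python
import math


def p_chart_limits(errors, cases, sigma=3.0):
    """Return the centre line and per-period (lower, upper) control limits.

    errors[i] is the number of erroneous outputs among cases[i] inspected
    cases in period i. Limits are clipped to the interval [0, 1].
    """
    p_bar = sum(errors) / sum(cases)  # pooled error proportion (centre line)
    limits = []
    for n in cases:
        half_width = sigma * math.sqrt(p_bar * (1.0 - p_bar) / n)
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits


def out_of_control_periods(errors, cases, sigma=3.0):
    """Indices of periods whose error proportion falls outside its limits."""
    _, limits = p_chart_limits(errors, cases, sigma)
    return [
        i
        for i, (e, n, (lower, upper)) in enumerate(zip(errors, cases, limits))
        if not lower <= e / n <= upper
    ]


# Hypothetical weekly inspection counts, invented for this sketch.
weekly_errors = [5, 5, 5, 5, 25]
weekly_cases = [100, 100, 100, 100, 100]
flagged = out_of_control_periods(weekly_errors, weekly_cases)
```

    With these invented counts the pooled error rate is 9%, the upper limit is about 17.6%, and only the last week (25%) is flagged. Smaller inspection samples widen the limits, which is the trade-off between inspection effort and detection capability that the abstract refers to.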

  16. Nonparametric predictive inference for combining diagnostic tests with parametric copula

    Science.gov (United States)

    Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.

    2017-09-01

    Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results while taking their dependence structure into account using a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. The copula is a well-known statistical concept for modelling the dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed, and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely the maximum likelihood estimator (MLE). We investigate the performance of the proposed method on data sets from the literature and discuss the results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
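
    To make the AUC concrete, the sketch below computes its usual empirical estimate: the proportion of (diseased, healthy) pairs in which the diseased subject has the higher test result, with ties counted as one half. This is only the standard point estimate; the NPI lower and upper probabilities and the copula-based combination of two tests described in the record are not implemented, and the function name and scores are illustrative assumptions.

```python
def empirical_auc(diseased_scores, healthy_scores):
    """Empirical area under the ROC curve.

    Assumes higher scores indicate disease. Equivalent to the Mann-Whitney
    statistic divided by the number of (diseased, healthy) pairs.
    """
    if not diseased_scores or not healthy_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                concordant += 1.0   # correctly ordered pair
            elif d == h:
                concordant += 0.5   # tie counts as half
    return concordant / (len(diseased_scores) * len(healthy_scores))


# Hypothetical test results, invented for this sketch.
diseased = [3, 4, 5]
healthy = [1, 2, 3]
auc = empirical_auc(diseased, healthy)
```

    Here 8 of the 9 pairs are correctly ordered and one is tied, so the estimate is 8.5/9, roughly 0.94. An AUC of 0.5 corresponds to a test that does no better than chance.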

  17. Nonparametric Cointegration Analysis of Fractional Systems With Unknown Integration Orders

    DEFF Research Database (Denmark)

    Nielsen, Morten Ørregaard

    2009-01-01

    In this paper a nonparametric variance ratio testing approach is proposed for determining the number of cointegrating relations in fractionally integrated systems. The test statistic is easily calculated without prior knowledge of the integration order of the data, the strength of the cointegrating...

  18. Recent development on statistical methods for personalized medicine discovery.

    Science.gov (United States)

    Zhao, Yingqi; Zeng, Donglin

    2013-03-01

    It is well documented that patients can show significant heterogeneous responses to treatments so the best treatment strategies may require adaptation over individuals and time. Recently, a number of new statistical methods have been developed to tackle the important problem of estimating personalized treatment rules using single-stage or multiple-stage clinical data. In this paper, we provide an overview of these methods and list a number of challenges.

  19. A new statistical method for mapping QTLs underlying endosperm traits

    Institute of Scientific and Technical Information of China (English)

    HU Zhiqiu; XU Chenwu

    2005-01-01

    Genetic expression for an endosperm trait in seeds of cereal crops may be controlled simultaneously by the triploid endosperm genotypes and the diploid maternal genotypes. However, current statistical methods for mapping quantitative trait loci (QTLs) underlying endosperm traits have not been effective in dealing with the putative maternal genetic effects. Combining the quantitative genetic model for diploid maternal traits with that for triploid endosperm traits, here we propose a new statistical method for mapping QTLs controlling endosperm traits with maternal genetic effects. To map QTLs, this method uses both the DNA molecular marker genotypes of each plant in a segregating population and the quantitative observations of single endosperms in each plant. The maximum likelihood method implemented via the expectation-maximization algorithm was used to estimate the parameters of a putative QTL. Since this method accounts for the maternal effect that may contribute to endosperm traits, it might be more congruent with the genetics of endosperm traits and more helpful in increasing the precision of QTL mapping. The simulation results show that the proposed method provides accurate estimates of the QTL effects and locations with high statistical power.

  20. Using statistical methods of quality management in logistics processes

    Directory of Open Access Journals (Sweden)

    Tkachenko Alla

    2016-04-01

    The purpose of the paper is to study the application of statistical methods of logistics process quality management at a large industrial enterprise and to test the theoretical studies. The analysis of the publications shows that a significant number of works by both Ukrainian and foreign authors have been dedicated to the research of quality management, while statistical methods of quality management have only been thoroughly analyzed by a small number of researchers, since these methods are referred to as classical, that is, those that are considered well-known and do not require special attention of modern scholars. In the authors’ opinion, the logistics process is a process of transformation and movement of material and accompanying flows by ensuring management freedom under the conditions of sequential interdependencies; standardization; synchronization; sharing information, and consistency of incentives, using innovative methods and models. In our study, we have shown that the management of logistics processes should use such statistical methods of quality management as descriptive statistics, experiment planning, hypotheses testing, measurement analysis, process opportunities analysis, regression analysis, reliability analysis, sampling, modeling, maps of statistical process control, specification of statistical tolerance, and time series analysis. The proposed statistical methods of logistics processes quality management have been tested at the large industrial enterprise JSC "Dniepropetrovsk Aggregate Plant", which specializes in manufacturing hydraulic control valves. The findings suggest that the main purpose in the sphere of logistics processes quality is the continuous improvement of the mining equipment production quality through the use of innovative processes, advanced management systems and information technology. This will enable the enterprise to meet the requirements and expectations of its customers. It has been proved that the

  1. Statistical Properties of Fluctuations: A Method to Check Market Behavior

    CERN Document Server

    Panigrahi, Prasanta K; Manimaran, P; Ahalpara, Dilip P

    2009-01-01

    We analyze the Bombay Stock Exchange (BSE) price index over the last 12 years. Keeping in mind the large fluctuations of the last few years, we carefully identify the transient, non-statistical and locally structured variations. For that purpose, we make use of the Daubechies wavelet and characterize the fractal behavior of the returns using a recently developed wavelet-based fluctuation analysis method. The returns show a fat-tailed distribution as well as weak non-statistical behavior. We have also carried out continuous wavelet as well as Fourier power spectral analysis to characterize the periodic nature and correlation properties of the time series.

  2. System and method for statistically monitoring and analyzing sensed conditions

    Science.gov (United States)

    Pebay, Philippe P.; Brandt, James M.; Gentile, Ann C.; Marzouk, Youssef M.; Hale, Darrian J.; Thompson, David C.

    2010-07-13

    A system and method of monitoring and analyzing a plurality of attributes for an alarm condition is disclosed. The attributes are processed and/or unprocessed values of sensed conditions of a collection of a statistically significant number of statistically similar components subjected to varying environmental conditions. The attribute values are used to compute the normal behaviors of some of the attributes and also used to infer parameters of a set of models. Relative probabilities of some attribute values are then computed and used along with the set of models to determine whether an alarm condition is met. The alarm conditions are used to prevent or reduce the impact of impending failure.

  3. From Microphysics to Macrophysics Methods and Applications of Statistical Physics

    CERN Document Server

    Balian, Roger

    2007-01-01

    This text not only provides a thorough introduction to statistical physics and thermodynamics but also exhibits the universality of the chain of ideas that leads from the laws of microphysics to the macroscopic behaviour of matter. A wide range of applications teaches students how to make use of the concepts, and many exercises will help to deepen their understanding. Drawing on both quantum mechanics and classical physics, the book follows modern research in statistical physics. Volume I discusses in detail the probabilistic description of quantum or classical systems, the Boltzmann-Gibbs distributions, the conservation laws, and the interpretation of entropy as missing information. Thermodynamics and electromagnetism in matter are dealt with, as well as applications to gases, both dilute and condensed, and to phase transitions. Volume II applies statistical methods to systems governed by quantum effects, in particular to solid state physics, explaining properties due to the crystal structure or to the latti...

  4. Applied statistical methods in agriculture, health and life sciences

    CERN Document Server

    Lawal, Bayo

    2014-01-01

    This textbook teaches crucial statistical methods to answer research questions using a unique range of statistical software programs, including MINITAB and R. This textbook is developed for undergraduate students in agriculture, nursing, biology and biomedical research. Graduate students will also find it to be a useful way to refresh their statistics skills and to reference software options. The unique combination of examples is approached using MINITAB and R for their individual strengths. Subjects covered include among others data description, probability distributions, experimental design, regression analysis, randomized design and biological assay. Unlike other biostatistics textbooks, this text also includes outliers, influential observations in regression and an introduction to survival analysis. Material is taken from the author's extensive teaching and research in Africa, USA and the UK. Sample problems, references and electronic supplementary material accompany each chapter.

  5. Predicting recreational water quality advisories: A comparison of statistical methods

    Science.gov (United States)

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to “nowcast” the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.

  6. Statistical disclosure control for microdata methods and applications in R

    CERN Document Server

    Templ, Matthias

    2017-01-01

    This book on statistical disclosure control presents the theory, applications and software implementation of the traditional approach to (micro)data anonymization, including data perturbation methods, disclosure risk, data utility, information loss and methods for simulating synthetic data. Introducing readers to the R packages sdcMicro and simPop, the book also features numerous examples and exercises with solutions, as well as case studies with real-world data, accompanied by the underlying R code to allow readers to reproduce all results. The demand for and volume of data from surveys, registers or other sources containing sensitive information on persons or enterprises have increased significantly over the last several years. At the same time, privacy protection principles and regulations have imposed restrictions on the access and use of individual data. Proper and secure microdata dissemination calls for the application of statistical disclosure control methods to the data before release. This book is in...

  7. Mathematical statistics

    CERN Document Server

    Pestman, Wiebe R

    2009-01-01

    This textbook provides a broad and solid introduction to mathematical statistics, including the classical subjects hypothesis testing, normal regression analysis, and normal analysis of variance. In addition, non-parametric statistics and vectorial statistics are considered, as well as applications of stochastic analysis in modern statistics, e.g., Kolmogorov-Smirnov testing, smoothing techniques, robustness and density estimation. For students with some elementary mathematical background. With many exercises. Prerequisites from measure theory and linear algebra are presented.

  8. Alternative statistical methods for cytogenetic radiation biological dosimetry

    CERN Document Server

    Fornalski, Krzysztof Wojciech

    2014-01-01

    The paper presents alternative statistical methods for biological dosimetry, such as the Bayesian and Monte Carlo method. The classical Gaussian and robust Bayesian fit algorithms for the linear, linear-quadratic as well as saturated and critical calibration curves are described. The Bayesian model selection algorithm for those curves is also presented. In addition, five methods of dose estimation for a mixed neutron and gamma irradiation field were described: two classical methods, two Bayesian methods and one Monte Carlo method. Bayesian methods were also enhanced and generalized for situations with many types of mixed radiation. All algorithms were presented in easy-to-use form, which can be applied to any computational programming language. The presented algorithm is universal, although it was originally dedicated to cytogenetic biological dosimetry of victims of a nuclear reactor accident.

  9. Non-Parametric Inference in Astrophysics

    CERN Document Server

    Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; the PICA group

    2001-01-01

    We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.

  10. Transit Timing Observations From Kepler: II. Confirmation of Two Multiplanet Systems via a Non-Parametric Correlation Analysis

    OpenAIRE

    Ford, Eric B.; Fabrycky, Daniel C.; Steffen, Jason H.; Carter, Joshua A.; Fressin, Francois; Holman, Matthew Jon; Lissauer, Jack J.; Moorhead, Althea V.; Morehead, Robert C.; Ragozzine, Darin; Rowe, Jason F.; Welsh, William F.; Allen, Christopher; Batalha, Natalie M.; Borucki, William J.

    2012-01-01

    We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data se...

  11. Statistical methods for assessing agreement between continuous measurements

    DEFF Research Database (Denmark)

    Sokolowski, Ineta; Hansen, Rikke Pilegaard; Vedsted, Peter

    ), concordance coefficient, Bland-Altman limits of agreement and percentage of agreement to assess the agreement between patient reported delay and doctor reported delay in diagnosis of cancer in general practice. Key messages: The correct statistical approach is not obvious. Many studies give the product......-moment correlation coefficient (r) between the results of the two measurements methods as an indicator of agreement, which is wrong. There have been proposed several alternative methods, which we will describe together with preconditions for use of the methods....

  12. Statistical methods of SNP data analysis with applications

    CERN Document Server

    Bulinski, Alexander; Shashkin, Alexey; Yaskov, Pavel

    2011-01-01

    Various statistical methods important for genetic analysis are considered and developed. Namely, we concentrate on the multifactor dimensionality reduction, logic regression, random forests and stochastic gradient boosting. These methods and their new modifications, e.g., the MDR method with "independent rule", are used to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and external risk factors are examined. To perform the data analysis concerning the ischemic heart disease and myocardial infarction the supercomputer SKIF "Chebyshev" of the Lomonosov Moscow State University was employed.

  13. Identifying Reflectors in Seismic Images via Statistic and Syntactic Methods

    Directory of Open Access Journals (Sweden)

    Carlos A. Perez

    2010-04-01

    In geologic interpretation of seismic reflection data, accurate identification of reflectors is the foremost step to ensure proper subsurface structural definition. Reflector information, along with other data sets, is a key factor to predict the presence of hydrocarbons. In this work, mathematical and pattern recognition theory was adapted to design two statistical and two syntactic algorithms which constitute a tool in semiautomatic reflector identification. The interpretive power of these four schemes was evaluated in terms of prediction accuracy and computational speed. Among these, the semblance method was confirmed to render the greatest accuracy and speed. Syntactic methods offer an interesting alternative due to their inherently structural search method.

  14. A sequential nonparametric pattern classification algorithm based on the Wald SPRT. [Sequential Probability Ratio Test]

    Science.gov (United States)

    Poage, J. L.

    1975-01-01

    A sequential nonparametric pattern classification procedure is presented. The method presented is an estimated version of the Wald sequential probability ratio test (SPRT). This method utilizes density function estimates, and the density estimate used is discussed, including a proof of convergence in probability of the estimate to the true density function. The classification procedure proposed makes use of the theory of order statistics, and estimates of the probabilities of misclassification are given. The procedure was tested on discriminating between two classes of Gaussian samples and on discriminating between two kinds of electroencephalogram (EEG) responses.
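
    The sequential decision rule behind the Wald SPRT can be illustrated with a minimal Python sketch. It is not the procedure from this record: the abstract describes an estimated version built on nonparametric density estimates, whereas the sketch assumes the two class densities are known Gaussians. The function names, the error rates alpha and beta, and the class parameters are all hypothetical choices made for illustration.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution (assumed known here)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def sprt_classify(samples, logpdf0, logpdf1, alpha=0.05, beta=0.05):
    """Wald SPRT between class 0 and class 1.

    alpha: target probability of choosing class 1 when class 0 is true.
    beta:  target probability of choosing class 0 when class 1 is true.
    Returns (decision, n_used); decision is None if the samples run out
    before either threshold is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # decide class 1 at or above this
    lower = math.log(beta / (1 - alpha))   # decide class 0 at or below this
    llr = 0.0                              # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(samples, start=1):
        llr += logpdf1(x) - logpdf0(x)
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 0, n
    return None, n


# Hypothetical classes: N(0, 1) versus N(1, 1).
class0 = lambda x: gaussian_logpdf(x, 0.0, 1.0)
class1 = lambda x: gaussian_logpdf(x, 1.0, 1.0)
decision, n_used = sprt_classify([1.0] * 20, class0, class1)
```

    In a nonparametric variant, the two log-density callables would be replaced by estimates built from training samples; the stopping rule itself stays the same.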

  15. On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression

    KAUST Repository

    Dai, Wenlin

    2017-09-01

    Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that combines the linear regression method with the higher-order difference estimators systematically. The unified framework greatly enriches the existing literature on variance estimation and includes most existing estimators as special cases. More importantly, the unified framework also provides a smart way to solve the challenging difference sequence selection problem, which has remained a controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend using the ordinary difference sequence in the unified framework, no matter if the sample size is small or if the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.
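
    To make the idea of a difference-based estimator concrete, here is a minimal Python sketch of the simplest case, the first-order difference estimator usually attributed to Rice (1984). It is not the unified framework proposed in this record and does not reproduce the VarED package; the function name is hypothetical. The sketch assumes the responses are ordered by an equally informative design and that the mean function is smooth, so that neighbouring differences are dominated by noise.

```python
def first_order_difference_variance(y):
    """Estimate the noise variance from successive differences.

    For y_i = m(x_i) + e_i with smooth m and observations sorted by x,
    y_{i+1} - y_i is approximately e_{i+1} - e_i, whose variance is
    2 * sigma^2. Averaging the squared differences and halving gives
    an estimate of sigma^2 without fitting m.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    squared_diffs = ((y[i + 1] - y[i]) ** 2 for i in range(n - 1))
    return sum(squared_diffs) / (2 * (n - 1))
```

    A steep mean function inflates the successive differences and biases this estimator upward, which is one motivation for the higher-order difference sequences discussed in the abstract.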

  16. Applied systems ecology: models, data, and statistical methods

    Energy Technology Data Exchange (ETDEWEB)

    Eberhardt, L L

    1976-01-01

    In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.

  17. Statistical methods for detecting differentially methylated loci and regions

    Directory of Open Access Journals (Sweden)

    Mark D Robinson

    2014-09-01

    DNA methylation, the reversible addition of methyl groups at CpG dinucleotides, represents an important regulatory layer associated with gene expression. Changed methylation status has been noted across diverse pathological states, including cancer. The rapid development and uptake of microarrays and large scale DNA sequencing has prompted an explosion of data analytic methods for processing and discovering changes in DNA methylation across varied data types. In this mini-review, we present a compact and accessible discussion of many of the salient challenges, such as experimental design, statistical methods for differential methylation detection, critical considerations such as cell type composition and the potential confounding that can arise from batch effects. From a statistical perspective, our main interests include the use of empirical Bayes or hierarchical models, which have proved immensely powerful in genomics, and the procedures by which false discovery control is achieved.
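
    As an illustration of false discovery control, the following Python sketch implements the Benjamini-Hochberg step-up procedure, a standard choice for per-locus p-values. Whether the review discusses this particular procedure is an assumption; the function name and the target level q are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            largest_passing_rank = rank
    rejected = [False] * m
    for idx in order[:largest_passing_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four loci.
flags = benjamini_hochberg([0.205, 0.001, 0.039, 0.008], q=0.05)
```

    The procedure controls the false discovery rate at level q under independence of the tests; correlated loci may call for a more conservative variant.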

  18. An Alternating Iterative Method and Its Application in Statistical Inference

    Institute of Scientific and Technical Information of China (English)

    Ning Zhong SHI; Guo Rong HU; Qing CUI

    2008-01-01

    This paper studies non-convex programming problems. It is known that, in statistical inference, many constrained estimation problems may be expressed as convex programming problems. However, in many practical problems, the objective functions are not convex. In this paper, we give a definition of a semi-convex objective function and discuss the corresponding non-convex programming problems. A two-step iterative algorithm called the alternating iterative method is proposed for finding solutions for such problems. The method is illustrated by three examples in constrained estimation problems given in Sasabuchi et al. (Biometrika, 72, 465–472 (1983)), Shi N. Z. (J. Multivariate Anal., 50, 282–293 (1994)) and El Barmi H. and Dykstra R. (Ann. Statist., 26, 1878–1893 (1998)).

  19. New Graphical Methods and Test Statistics for Testing Composite Normality

    Directory of Open Access Journals (Sweden)

    Marc S. Paolella

    2015-07-01

    Several graphical methods for testing univariate composite normality from an i.i.d. sample are presented. They are endowed with correct simultaneous error bounds and yield size-correct tests. As all are based on the empirical CDF, they are also consistent for all alternatives. For one test, called the modified stabilized probability test, or MSP, a highly simplified computational method is derived, which delivers the test statistic and also a highly accurate p-value approximation, essentially instantaneously. The MSP test is demonstrated to have higher power against asymmetric alternatives than the well-known and powerful Jarque-Bera test. A further size-correct test, based on combining two test statistics, is shown to have yet higher power. The methodology employed is fully general and can be applied to any i.i.d. univariate continuous distribution setting.
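
    For reference, the Jarque-Bera benchmark named in this abstract can be sketched in a few lines of Python. This shows only the comparison test, not the MSP test of the record, whose construction is not given here. The function name is hypothetical, and the p-value uses the asymptotic chi-square approximation with two degrees of freedom, which is known to be unreliable in small samples.

```python
import math


def jarque_bera(x):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera test.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), where S and K are the sample
    skewness and kurtosis computed from biased central moments.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # The survival function of a chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value
```

    Because the statistic depends only on skewness and kurtosis, it can miss departures from normality that leave those two moments unchanged, which is the kind of gap that tests based on the empirical CDF aim to close.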

  20. Multivariate Statistical Process Control Process Monitoring Methods and Applications

    CERN Document Server

    Ge, Zhiqiang

    2013-01-01

    Given their key position in the process control industry, process monitoring techniques have been extensively investigated by industrial practitioners and academic control researchers. Multivariate statistical process control (MSPC) is one of the most popular data-based methods for process monitoring and is widely used in various industrial areas. Effective routines for process monitoring can help operators run industrial processes efficiently at the same time as maintaining high product quality. Multivariate Statistical Process Control reviews the developments and improvements that have been made to MSPC over the last decade, and goes on to propose a series of new MSPC-based approaches for complex process monitoring. These new methods are demonstrated in several case studies from the chemical, biological, and semiconductor industrial areas. Control and process engineers, and academic researchers in the process monitoring, process control and fault detection and isolation (FDI) disciplines will be inter...

  1. Multivariate methods and forecasting with IBM SPSS statistics

    CERN Document Server

    Aljandali, Abdulkader

    2017-01-01

    This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...

  2. Statistical Methods for Thermonuclear Reaction Rates and Nucleosynthesis Simulations

    CERN Document Server

    Iliadis, Christian; Coc, Alain; Timmes, F X; Champagne, Art E

    2014-01-01

    Rigorous statistical methods for estimating thermonuclear reaction rates and nucleosynthesis are becoming increasingly established in nuclear astrophysics. The main challenge being faced is that experimental reaction rates are highly complex quantities derived from a multitude of different measured nuclear parameters (e.g., astrophysical S-factors, resonance energies and strengths, particle and gamma-ray partial widths). We discuss the application of the Monte Carlo method to two distinct, but related, questions. First, given a set of measured nuclear parameters, how can one best estimate the resulting thermonuclear reaction rates and associated uncertainties? Second, given a set of appropriate reaction rates, how can one best estimate the abundances from nucleosynthesis (i.e., reaction network) calculations? The techniques described here provide probability density functions that can be used to derive statistically meaningful reaction rates and final abundances for any desired coverage probability. Examples ...

  3. A novel non-parametric method for uncertainty evaluation of correlation-based molecular signatures: its application on PAM50 algorithm.

    Science.gov (United States)

    Fresno, Cristóbal; González, Germán Alexis; Merino, Gabriela Alejandra; Flesia, Ana Georgina; Podhajcer, Osvaldo Luis; Llera, Andrea Sabina; Fernández, Elmer Andrés

    2017-03-01

    The PAM50 classifier is used to assign patients to the highest correlated breast cancer subtype irrespective of the obtained value. Nonetheless, all subtype correlations are required to build the risk of recurrence (ROR) score, currently used in therapeutic decisions. Present subtype uncertainty estimations are not accurate, seldom considered or require a population-based approach for this context. Here we present a novel single-subject non-parametric uncertainty estimation based on PAM50's gene label permutations. Simulation results (n = 5228) showed that only 61% of subjects can be reliably 'Assigned' to the PAM50 subtype, whereas 33% should be 'Not Assigned' (NA), leaving the rest to tight 'Ambiguous' correlations between subtypes. Excluding the NA subjects from the analysis improved the discrimination of subtype survival curves, yielding a higher proportion of low and high ROR values. Conversely, all NA subjects showed similar survival behaviour regardless of the original PAM50 assignment. We propose to incorporate our PAM50 uncertainty estimation to support therapeutic decisions. Source code can be found in the 'pbcmc' R package at Bioconductor. Contact: cristobalfresno@gmail.com or efernandez@bdmg.com.ar. Supplementary data are available at Bioinformatics online.

  4. Lottery spending: a non-parametric analysis.

    Science.gov (United States)

    Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody

    2015-01-01

    We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.

  5. Lottery spending: a non-parametric analysis.

    Directory of Open Access Journals (Sweden)

    Skip Garibaldi

    We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.

  6. Nonparametric inferences for kurtosis and conditional kurtosis

    Institute of Scientific and Technical Information of China (English)

    XIE Xiao-heng; HE You-hua

    2009-01-01

    Under the assumption of a strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis for risk time series. We apply this method to the daily returns of the S&P500 index and the Shanghai Composite Index, and simulate GARCH data for verifying the efficiency of the presented model. Our results indicate that the risk series distribution is heavy-tailed, but the historical information can make its future distribution light-tailed. However, the far future distribution's tails are little affected by the historical data.

  7. Statistical methods for longitudinal data with agricultural applications

    DEFF Research Database (Denmark)

    Anantharama Ankinakatte, Smitha

    The PhD study focuses on modeling two kinds of longitudinal data arising in agricultural applications: continuous time series data and discrete longitudinal data. Firstly, two statistical methods, neural networks and generalized additive models, are applied to predict mastitis using multivariate...... algorithm. This was found to compare favourably with the algorithm implemented in the well-known Beagle software. Finally, an R package to apply APFA models developed as part of the PhD project is described...

  8. Diametral creep prediction of pressure tube using statistical regression methods

    Energy Technology Data Exchange (ETDEWEB)

    Kim, D. [Korea Advanced Inst. of Science and Technology, Daejeon (Korea, Republic of); Lee, J.Y. [Korea Electric Power Research Inst., Daejeon (Korea, Republic of); Na, M.G. [Chosun Univ., Gwangju (Korea, Republic of); Jang, C. [Korea Advanced Inst. of Science and Technology, Daejeon (Korea, Republic of)

    2010-07-01

    Diametral creep prediction of pressure tubes in a CANDU reactor is an important factor for ROPT calculation. In this study, pressure tube diametral creep prediction models were developed using statistical regression methods such as linear mixed models for longitudinal data analysis. Inspection and operating condition data of Wolsong unit 1 and 2 reactors were used. A serial correlation model and a random coefficient model were developed for pressure tube diameter prediction. The random coefficient model provided more accurate results than the serial correlation model. (author)

  9. Nonparametric test for detecting change in distribution with panel data

    CERN Document Server

    Pommeret, Denys; Ghattas, Badih

    2011-01-01

    This paper considers the problem of comparing two processes with panel data. A nonparametric test is proposed for detecting a monotone change in the link between the two process distributions. The test statistic is of CUSUM type, based on the empirical distribution functions. The asymptotic distribution of the proposed statistic is derived and its finite sample property is examined by bootstrap procedures through Monte Carlo simulations.

  10. Statistical method for detecting structural change in the growth process.

    Science.gov (United States)

    Ninomiya, Yoshiyuki; Yoshimoto, Atsushi

    2008-03-01

    Due to competition among individual trees and other exogenous factors that change the growth environment, each tree grows following its own growth trend with some structural changes in growth over time. In the present article, a new method is proposed to detect a structural change in the growth process. We formulate the method as a simple statistical test for signal detection without constructing any specific model for the structural change. To evaluate the p-value of the test, the tube method is developed because the regular distribution theory is insufficient. Using two sets of tree diameter growth data sampled from planted forest stands of Cryptomeria japonica in Japan, we conduct an analysis of identifying the effect of thinning on the growth process as a structural change. Our results demonstrate that the proposed method is useful to identify the structural change caused by thinning. We also provide the properties of the method in terms of the size and power of the test.

  11. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  12. Literature in Focus: Statistical Methods in Experimental Physics

    CERN Multimedia

    2007-01-01

    Frederick James was a high-energy physicist who became the CERN "expert" on statistics and is now well-known around the world, in part for this famous text. The first edition of Statistical Methods in Experimental Physics was originally co-written with four other authors and was published in 1971 by North Holland (now an imprint of Elsevier). It became such an important text that demand for it has continued for more than 30 years. Fred has updated it and it was released in a second edition by World Scientific in 2006. It is still a top seller and there is no exaggeration in calling it «the» reference on the subject. A full review of the title appeared in the October CERN Courier. Come and meet the author to hear more about how this book has flourished during its 35-year lifetime. Frederick James, Statistical Methods in Experimental Physics. Monday, 26th of November, 4 p.m., Council Chamber (Bldg. 503-1-001). The author will be introduced...

  13. Fragment Identification and Statistics Method of Hypervelocity Impact SPH Simulation

    Institute of Scientific and Technical Information of China (English)

    ZHANG Xiaotian; JIA Guanghui; HUANG Hai

    2011-01-01

    A comprehensive treatment of fragment identification and statistics for the smoothed particle hydrodynamics (SPH) simulation of hypervelocity impact is presented. The computation is performed with the SPH method combined with the finite element method (FEM). The fragments are identified by a new pre- and post-processing algorithm and then converted into a binary graph. The number of fragments and the attached SPH particles are determined by counting the quantity of connected domains on the binary graph. The size, velocity vector and mass of each fragment are calculated by the particles' summation and weighted average. The dependence of this method on finite element edge length and simulation terminal time is discussed. An example of tungsten rods impacting steel plates is given for calibration. The computation results match experiments well and demonstrate the effectiveness of this method.
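
    The counting of connected domains and the per-fragment summation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' algorithm: particles are treated as graph nodes and two particles are linked when they lie within a hypothetical cutoff `link_distance`, whereas the record builds its binary graph through its own pre- and post-processing steps. The function names are hypothetical, and the neighbour search is a brute-force O(n^2) scan chosen for brevity.

```python
from collections import deque


def find_fragments(positions, link_distance):
    """Group particle indices into fragments (connected components)."""
    n = len(positions)
    cutoff_sq = link_distance ** 2
    label = [-1] * n            # fragment id per particle, -1 means unvisited
    fragments = []
    for start in range(n):
        if label[start] != -1:
            continue
        fragment_id = len(fragments)
        label[start] = fragment_id
        members = []
        queue = deque([start])
        while queue:            # breadth-first search over linked particles
            i = queue.popleft()
            members.append(i)
            for j in range(n):
                if label[j] != -1:
                    continue
                dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                if dist_sq <= cutoff_sq:
                    label[j] = fragment_id
                    queue.append(j)
        fragments.append(members)
    return fragments


def fragment_properties(members, masses, velocities):
    """Total mass and mass-weighted mean velocity of one fragment."""
    total_mass = sum(masses[i] for i in members)
    dims = len(velocities[members[0]])
    velocity = tuple(
        sum(masses[i] * velocities[i][k] for i in members) / total_mass
        for k in range(dims)
    )
    return total_mass, velocity
```

    For realistic particle counts, a spatial grid or k-d tree would replace the brute-force neighbour scan; the component labelling itself is unchanged.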

  14. Quantitative EEG Applying the Statistical Recognition Pattern Method

    DEFF Research Database (Denmark)

    Engedal, Knut; Snaedal, Jon; Hoegh, Peter

    2015-01-01

    BACKGROUND/AIM: The aim of this study was to examine the discriminatory power of quantitative EEG (qEEG) applying the statistical pattern recognition (SPR) method to separate Alzheimer's disease (AD) patients from elderly individuals without dementia and from other dementia patients. METHODS...... accepted criteria by at least 2 clinicians. EEGs were recorded in a standardized way and analyzed independently of the clinical diagnoses, using the SPR method. RESULTS: In receiver operating characteristic curve analyses, the qEEGs separated AD patients from healthy elderly individuals with an area under...... the curve (AUC) of 0.90, representing a sensitivity of 84% and a specificity of 81%. The qEEGs further separated patients with Lewy body dementia or Parkinson's disease dementia from AD patients with an AUC of 0.9, a sensitivity of 85% and a specificity of 87%. CONCLUSION: qEEG using the SPR method could...
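
    The three figures reported above (AUC, sensitivity, specificity) can be computed from classifier scores as in the sketch below. It is a generic illustration, not the SPR method itself; the function names and the convention that a higher score means "more likely AD" are assumptions.

```python
def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in pos_scores)
    true_neg = sum(s < threshold for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def auc(pos_scores, neg_scores):
    """Area under the ROC curve via its rank interpretation:
    the probability that a random positive outscores a random negative
    (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

    Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises performance over all thresholds, which is why studies of this kind usually report all three.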

  15. A review of statistical methods for preprocessing oligonucleotide microarrays.

    Science.gov (United States)

    Wu, Zhijin

    2009-12-01

    Microarrays have become an indispensable tool in biomedical research. This powerful technology not only makes it possible to quantify a large number of nucleic acid molecules simultaneously, but also produces data with many sources of noise. A number of preprocessing steps are therefore necessary to convert the raw data, usually in the form of hybridisation images, to measures of biological meaning that can be used in further statistical analysis. Preprocessing of oligonucleotide arrays includes image processing, background adjustment, data normalisation/transformation and sometimes summarisation when multiple probes are used to target one genomic unit. In this article, we review the issues encountered in each preprocessing step and introduce the statistical models and methods in preprocessing.

  16. Local Component Analysis for Nonparametric Bayes Classifier

    CERN Document Server

    Khademi, Mahmoud; safayani, Meharn

    2010-01-01

    The decision boundaries of the Bayes classifier are optimal because they lead to the maximum probability of a correct decision. This means that if we knew the prior probabilities and the class-conditional densities, we could design a classifier that gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as the window width. Moreover, these methods suffer from the curse of dimensionality of the feature space and from the small-sample-size problem, which severely restrict their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
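
    To make the dependence on the window width concrete, here is a minimal sketch of a Bayes classifier built on Parzen-window density estimates. It shows only the baseline technique the abstract starts from, not the proposed local component analysis; the Gaussian kernel, the function names and the use of class frequencies as priors are assumptions of this example.

```python
import math


def parzen_density(x, samples, h):
    """Parzen-window density estimate at x with an isotropic Gaussian
    kernel of width h."""
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * h ** d
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / (len(samples) * norm)


def parzen_classify(x, class_samples, h):
    """Pick the class maximising prior * estimated class-conditional density.
    Priors are taken as the class frequencies in the training data."""
    n_total = sum(len(s) for s in class_samples.values())
    scores = {
        label: (len(s) / n_total) * parzen_density(x, s, h)
        for label, s in class_samples.items()
    }
    return max(scores, key=scores.get)
```

    Changing `h` changes the estimated densities and therefore the decision regions, which is the parameter sensitivity the abstract refers to.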

  17. Mathematical and statistical methods for actuarial sciences and finance

    CERN Document Server

    Sibillo, Marilena

    2014-01-01

    The interaction between mathematicians and statisticians working in the actuarial and financial fields is producing numerous meaningful scientific results. This volume, comprising a series of four-page papers, gathers new ideas relating to mathematical and statistical methods in the actuarial sciences and finance. The book covers a variety of topics of interest from both theoretical and applied perspectives, including: actuarial models; alternative testing approaches; behavioral finance; clustering techniques; coherent and non-coherent risk measures; credit-scoring approaches; data envelopment analysis; dynamic stochastic programming; financial contagion models; financial ratios; intelligent financial trading systems; mixture normality approaches; Monte Carlo-based methodologies; multicriteria methods; nonlinear parameter estimation techniques; nonlinear threshold models; particle swarm optimization; performance measures; portfolio optimization; pricing methods for structured and non-structured derivatives; r...

  18. Evolutionary Computation Methods and their applications in Statistics

    Directory of Open Access Journals (Sweden)

    Francesco Battaglia

    2013-05-01

    A brief discussion of the genesis of evolutionary computation methods, their relationship to artificial intelligence, and the contribution of genetics and Darwin’s theory of natural evolution is provided. Then, the main evolutionary computation methods are illustrated: evolution strategies, genetic algorithms, estimation of distribution algorithms, differential evolution, and a brief description of some evolutionary behavior methods such as ant colony and particle swarm optimization. We also discuss the role of the genetic algorithm for multivariate probability distribution random generation, rather than as a function optimizer. Finally, some relevant applications of genetic algorithm to statistical problems are reviewed: selection of variables in regression, time series model building, outlier identification, cluster analysis, design of experiments.

  19. Statistics

    Science.gov (United States)

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  20. Bayesian statistic methods and their application in probabilistic simulation models

    Directory of Open Access Journals (Sweden)

    Sergio Iannazzo

    2007-03-01

    Bayesian statistical methods are attracting rapidly growing interest and acceptance in the field of health economics. The reasons for this success probably lie in the theoretical foundations of the discipline, which make these techniques more appealing for decision analysis. Progress in modern IT has also contributed by producing several flexible and powerful statistical software frameworks. Among them, one of the most notable is probably the BUGS language project and its standalone application for MS Windows, WinBUGS. The aim of this paper is to introduce the subject and to show some interesting applications of WinBUGS in developing complex economic models based on Markov chains. The advantages of this approach lie in the elegance of the code produced and in its capability to easily develop probabilistic simulations. Moreover, an example of the integration of Bayesian inference models in a Markov model is shown. This last feature lets the analyst conduct statistical analyses on the available sources of evidence and exploit them directly as inputs in the economic model.
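
    The idea of a probabilistic simulation over a Markov model can be sketched as below. This is written in Python rather than BUGS/WinBUGS, and everything specific in it is hypothetical: the three states (well, sick, dead), the Beta distributions and their parameters, and the function names are illustrative assumptions, not taken from the paper.

```python
import random


def run_cohort(p_sick, p_die_well, p_die_sick, cycles):
    """Deterministic Markov cohort trace for one set of transition
    probabilities. Returns (cycles lived per person, final state shares)."""
    well, sick, dead = 1.0, 0.0, 0.0
    alive_cycles = 0.0
    for _ in range(cycles):
        new_well = well * (1.0 - p_sick - p_die_well)
        new_sick = well * p_sick + sick * (1.0 - p_die_sick)
        new_dead = dead + well * p_die_well + sick * p_die_sick
        well, sick, dead = new_well, new_sick, new_dead
        alive_cycles += well + sick
    return alive_cycles, (well, sick, dead)


def probabilistic_analysis(n_draws, cycles, seed=0):
    """Repeat the cohort run with uncertain inputs drawn from Beta
    distributions (hypothetical parameters), one draw per simulation."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_draws):
        p_sick = rng.betavariate(20, 80)      # mean 0.20, illustrative only
        p_die_sick = rng.betavariate(10, 90)  # mean 0.10, illustrative only
        outcome, _ = run_cohort(p_sick, 0.01, p_die_sick, cycles)
        results.append(outcome)
    return results
```

    In a fully Bayesian workflow the Beta draws would be replaced by posterior samples obtained from the evidence, which is the integration of inference and economic model that the abstract describes.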

  1. Application of Statistical Process Control Methods for IDS

    Directory of Open Access Journals (Sweden)

    Muhammad Sadiq Ali Khan

    2012-11-01

    As technology improves, attackers try to gain access to network system resources by many means. Open loopholes in the network allow them to penetrate it more easily; statistical methods are of great importance in computer and network security for detecting malfunctioning of the network system. Internet security solutions need to be developed to protect the system and to withstand prolonged and diverse attacks. In this paper a statistical approach is used: statistical control charts have conventionally been applied to quality characteristics, but in an IDS they allow abnormal access to be detected easily and appropriate control limits to be established. Two different charts are investigated, and the Shewhart chart based on the average produced better accuracy. In the approach used here for intrusion detection, a data packet that differs drastically from normal variation is classified as an attack. In other words, a system variation may be due to some special cause. If these causes are investigated, natural variation and abnormal variation can be distinguished, which in turn can be used to distinguish behaviors of the system.
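
    A control-chart check of this kind can be sketched in a few lines. The example below is a simplified Shewhart-style chart on individual values with limits at the baseline mean plus or minus three standard deviations; the monitored quantity (for instance, an average packet size per interval), the function names and the 3-sigma multiplier are assumptions, and the paper's exact chart construction is not reproduced.

```python
import statistics


def shewhart_limits(baseline, k=3.0):
    """Lower limit, centre line and upper limit from attack-free baseline data."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def flag_anomalies(observations, limits):
    """Indices of observations falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(observations) if x < lower or x > upper]
```

    Points inside the limits are treated as natural variation; points outside are candidates for a special cause, which in the intrusion-detection setting means traffic worth investigating as a possible attack.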

  2. Statistical and Mathematical Methods for Synoptic Time Domain Surveys

    Science.gov (United States)

    Mahabal, Ashish A.; SAMSI Synoptic Surveys Time Domain Working Group

    2017-01-01

    Recent advances in detector technology, electronics, data storage, and computation have enabled astronomers to collect larger and larger datasets, and moreover, pose interesting questions to answer with those data. The complexity of the data allows data science techniques to be used. These have to be grounded in sound techniques. Identifying interesting mathematical and statistical challenges and working on their solutions is one of the aims of the year-long ‘Statistical, Mathematical and Computational Methods for Astronomy (ASTRO)’ program of SAMSI. Of the many working groups that have been formed, one is on Synoptic Time Domain Surveys. Within this we have various subgroups discussing topics such as Designing Statistical Features for Optimal Classification, Scheduling Observations, Incorporating Unstructured Information, Detecting Outliers, Lightcurve Decomposition and Interpolation, Domain Adaptation, and also Designing a Data Challenge. We will briefly highlight some of the work going on in these subgroups along with their interconnections, and the plans for the near future. We will also highlight the overlaps with the other SAMSI working groups and also indicate how the wider astronomy community can both participate and benefit from the activities.

  3. Statistical analysis of the precision of the Match method

    Directory of Open Access Journals (Sweden)

    R. Lehmann

    2005-05-01

    The Match method quantifies chemical ozone loss in the polar stratosphere. The basic idea consists in calculating the forward trajectory of an air parcel that has been probed by an ozone measurement (e.g., by an ozone sonde or satellite) and finding a second ozone measurement close to this trajectory. Such an event is called a "match". A rate of chemical ozone destruction can be obtained by a statistical analysis of several tens of such match events. Information on the uncertainty of the calculated rate can be inferred from the scatter of the ozone mixing ratio difference (second measurement minus first measurement) associated with individual matches. A standard analysis would assume that the errors of these differences are statistically independent. However, this assumption may be violated because different matches can share a common ozone measurement, so that the errors associated with these match events become statistically dependent. Taking this effect into account, we present an analysis of the uncertainty of the final Match result. It has been applied to Match data from the Arctic winters 1995, 1996, 2000, and 2003. For these ozone-sonde Match studies the effect of the error correlation on the uncertainty estimates is rather small: compared to a standard error analysis, the uncertainty estimates increase by 15% on average. However, the effect is more pronounced for typical satellite Match analyses: for an Antarctic satellite Match study (2003), the uncertainty estimates increase by 60% on average.

  4. Hybrid perturbation methods based on statistical time series models

    Science.gov (United States)

    San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario

    2016-04-01

    In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination improves the precision of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem and first-order and second-order analytical theories, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
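
    For readers unfamiliar with the prediction technique named above, the following is a minimal sketch of additive Holt-Winters smoothing. In a hybrid propagator the input series would be the error of the integrated approximation, but how the authors build and fit that series is not shown here; the initialisation scheme, the function name and the parameter names are assumptions of this example.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast for `horizon` steps beyond the series.

    alpha, beta and gamma are the smoothing constants for level, trend and
    seasonal components; `period` is the season length in samples.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")

    # Simple initialisation from the first two seasons (an assumption).
    first = series[:period]
    second = series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]  # indexed by t % period

    for t in range(period, len(series)):
        y = series[t]
        s_old = seasonal[t % period]
        new_level = alpha * (y - s_old) + (1.0 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1.0 - beta) * trend
        seasonal[t % period] = gamma * (y - new_level) + (1.0 - gamma) * s_old
        level = new_level

    n = len(series)
    return [
        level + h * trend + seasonal[(n - 1 + h) % period]
        for h in range(1, horizon + 1)
    ]
```

    Adding the forecast returned by this function to the output of the analytical theory is the sense in which the two components are combined.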

  5. Classification of Specialized Farms Applying Multivariate Statistical Methods

    Directory of Open Access Journals (Sweden)

    Zuzana Hloušková

    2017-01-01

    The paper is aimed at the application of advanced multivariate statistical methods when classifying cattle breeding farming enterprises by their economic size. An advantage of the model is its ability to use a few selected indicators, compared to the complex methodology of the current classification model, which requires knowledge of the detailed structure of the herd turnover and the structure of cultivated crops. The output of the paper is intended to be applied within farm structure research focused on the future development of Czech agriculture. As the data source, the farming enterprises database for 2014 from the FADN CZ system has been used. The proposed predictive model exploits knowledge of the actual size classes of the farms tested. Outcomes of the linear discriminant analysis multifactor classification method have supported the chance of filing farming enterprises in the group of Small farms (98 % filed correctly) and the Large and Very Large enterprises (100 % filed correctly). The Medium Size farms have been correctly filed at 58.11 % only. Partial shortcomings of the process presented have been found when discriminating Medium and Small farms.

  6. Optimization of Statistical Methods Impact on Quantitative Proteomics Data.

    Science.gov (United States)

    Pursiheimo, Anna; Vehmas, Anni P; Afzal, Saira; Suomi, Tomi; Chand, Thaman; Strauss, Leena; Poutanen, Matti; Rokka, Anne; Corthals, Garry L; Elo, Laura L

    2015-10-02

    As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about the best practices is not apparent. In the work described here we compared popular statistical methods for detecting differential protein expression from quantitative MS data using both controlled experiments with known quantitative differences for specific proteins used as standards as well as "real" experiments where differences in protein abundance are not known a priori. Our results suggest that data-driven reproducibility-optimization can consistently produce reliable differential expression rankings for label-free proteome tools and are straightforward in their application.

  7. Estimated Accuracy of Three Common Trajectory Statistical Methods

    Science.gov (United States)

    Kabashnikov, Vitaliy P.; Chaikovsky, Anatoli P.; Kucsera, Tom L.; Metelskaya, Natalia S.

    2011-01-01

    Three well-known trajectory statistical methods (TSMs), namely concentration field (CF), concentration weighted trajectory (CWT), and potential source contribution function (PSCF) methods were tested using known sources and artificially generated data sets to determine the ability of TSMs to reproduce spatial distribution of the sources. In the works by other authors, the accuracy of the trajectory statistical methods was estimated for particular species and at specified receptor locations. We have obtained a more general statistical estimation of the accuracy of source reconstruction and have found optimum conditions to reconstruct source distributions of atmospheric trace substances. Only virtual pollutants of the primary type were considered. In real world experiments, TSMs are intended for application to a priori unknown sources. Therefore, the accuracy of TSMs has to be tested with all possible spatial distributions of sources. An ensemble of geographical distributions of virtual sources was generated. Spearman's rank-order correlation coefficient between spatial distributions of the known virtual and the reconstructed sources was taken to be a quantitative measure of the accuracy. Statistical estimates of the mean correlation coefficient and a range of the most probable values of correlation coefficients were obtained. All the TSMs that were considered here showed similar close results. The maximum of the ratio of the mean correlation to the width of the correlation interval containing the most probable correlation values determines the optimum conditions for reconstruction. An optimal geographical domain roughly coincides with the area supplying most of the substance to the receptor. The optimal domain's size is dependent on the substance decay time. Under optimum reconstruction conditions, the mean correlation coefficients can reach 0.70-0.75. The boundaries of the interval with the most probable correlation values are 0.6-0.9 for the decay time of 240 h
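
    The accuracy measure used above, Spearman's rank-order correlation between a known and a reconstructed source field, can be computed as in this sketch. Each field is assumed to be flattened into a list of grid-cell values; the function names are illustrative.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

    Because only ranks enter the calculation, the coefficient rewards a reconstruction that orders the grid cells correctly even when the absolute source strengths are off.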

  8. Concepts and methods in modern theoretical chemistry statistical mechanics

    CERN Document Server

    Ghosh, Swapan Kumar

    2013-01-01

    Concepts and Methods in Modern Theoretical Chemistry: Statistical Mechanics, the second book in a two-volume set, focuses on the dynamics of systems and phenomena. A new addition to the series Atoms, Molecules, and Clusters, this book offers chapters written by experts in their fields. It enables readers to learn how concepts from ab initio quantum chemistry and density functional theory (DFT) can be used to describe, understand, and predict chemical dynamics. This book covers a wide range of subjects, including discussions on the following topics: Time-dependent DFT Quantum fluid dynamics (QF

  9. Methods in probability and statistical inference. Progress report, June 15, 1976--June 14, 1977. [Dept. of Statistics, Univ. of Chicago

    Energy Technology Data Exchange (ETDEWEB)

    Perlman, M D

    1977-03-01

    Research activities of the Department of Statistics, University of Chicago, during the period 15 June 1976 to 14 June 1977 are reviewed. Individual projects were carried out in the following eight areas: statistical computing--approximations to statistical tables and functions; numerical computation of boundary-crossing probabilities for Brownian motion and related stochastic processes; probabilistic methods in statistical mechanics; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; and comparison of several populations. Brief summaries of these projects are given, along with other administrative information. (RWR)

  10. Visualization methods for statistical analysis of microarray clusters

    Directory of Open Access Journals (Sweden)

    Li Kai

    2005-05-01

    Background: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold-standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. Results: We present several visualization techniques that incorporate meaningful statistics that are noise-robust for the purpose of analyzing the results of clustering algorithms on microarray data. This includes a rank-based visualization method that is more robust to noise, a difference display method to aid assessments of cluster quality and detection of outliers, and a projection of high dimensional data into a three dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at http://function.princeton.edu/GeneVAnD. Conclusion: Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold-standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.

  11. A method for statistically comparing spatial distribution maps

    Directory of Open Access Journals (Sweden)

    Reynolds Mary G

    2009-01-01

    Background: Ecological niche modeling is a method for estimation of species distributions based on certain ecological parameters. Thus far, empirical determination of significant differences between independently generated distribution maps for a single species (maps which are created through equivalent processes, but with different ecological input parameters) has been challenging. Results: We describe a method for comparing model outcomes, which allows a statistical evaluation of whether the strength of prediction and breadth of predicted areas is measurably different between projected distributions. To create ecological niche models for statistical comparison, we utilized GARP (Genetic Algorithm for Rule-Set Production) software to generate ecological niche models of human monkeypox in Africa. We created several models, keeping constant the case location input records for each model but varying the ecological input data. In order to assess the relative importance of each ecological parameter included in the development of the individual predicted distributions, we performed pixel-to-pixel comparisons between model outcomes and calculated the mean difference in pixel scores. We used a two-sample Student's t-test (assuming as null hypothesis that both maps were identical to each other regardless of which input parameters were used) to examine whether the mean difference in corresponding pixel scores from one map to another was greater than would be expected by chance alone. We also utilized weighted kappa statistics, frequency distributions, and percent difference to look at the disparities in pixel scores. Multiple independent statistical tests indicated precipitation as the single most important independent ecological parameter in the niche model for human monkeypox disease. Conclusion: In addition to improving our understanding of the natural factors influencing the distribution of human monkeypox disease, such pixel-to-pixel comparison
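
    The pixel-to-pixel comparison and the two-sample Student's t statistic mentioned in the abstract can be sketched as follows. Each map is assumed to be flattened into a list of pixel scores of equal length; the function name is hypothetical, the pooled-variance form of the test is assumed, and the p-value lookup against the t distribution is left out.

```python
import math
import statistics


def compare_maps(scores_a, scores_b):
    """Mean pixel-wise difference plus the pooled-variance two-sample
    t statistic and its degrees of freedom."""
    if len(scores_a) != len(scores_b):
        raise ValueError("maps must cover the same pixels")
    n_a, n_b = len(scores_a), len(scores_b)

    # Mean difference in corresponding pixel scores.
    mean_diff = statistics.mean(a - b for a, b in zip(scores_a, scores_b))

    # Student's t with a pooled variance estimate.
    var_a = statistics.variance(scores_a)
    var_b = statistics.variance(scores_b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    t_stat = (statistics.mean(scores_a) - statistics.mean(scores_b)) / std_err
    return mean_diff, t_stat, dof
```

    Neighbouring pixels in a predicted distribution are usually spatially correlated, so a statistic of this kind is best read alongside the other measures the abstract lists rather than on its own.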

  12. FOREWORD: Special issue on Statistical and Probabilistic Methods for Metrology

    Science.gov (United States)

    Bich, Walter; Cox, Maurice G.

    2006-08-01

    This special issue of Metrologia is the first that is not devoted to units, or constants, or measurement techniques in some specific field of metrology, but to the generic topic of statistical and probabilistic methods for metrology. The number of papers on this subject in measurement journals, and in Metrologia in particular, has continued to increase over the years, driven by the publication of the Guide to the Expression of Uncertainty in Measurement (GUM) [1] and the Mutual Recognition Arrangement (MRA) of the CIPM [2]. The former stimulated metrologists to think in greater depth about the appropriate modelling of their measurements, in order to provide uncertainty evaluations associated with measurement results. The latter obliged the metrological community to investigate reliable measures for assessing the calibration and measurement capabilities declared by the national metrology institutes (NMIs). Furthermore, statistical analysis of measurement data became even more important than hitherto, with the need, on the one hand, to treat the greater quantities of data provided by sophisticated measurement systems, and, on the other, to deal appropriately with relatively small sets of data that are difficult or expensive to obtain. The importance of supporting the GUM and extending its provisions was recognized by the formation in the year 2000 of Working Group 1, Measurement uncertainty, of the Joint Committee for Guides in Metrology. The need to provide guidance on key comparison data evaluation was recognized by the formation in the year 2001 of the BIPM Director's Advisory Group on Uncertainty. A further international initiative was the revision, in the year 2004, of the remit and title of a working group of ISO/TC 69, Application of Statistical Methods, to reflect the need to concentrate more on statistical methods to support measurement uncertainty evaluation. 
These international activities are supplemented by national programmes such as the Software Support

  13. Hybrid Perturbation methods based on Statistical Time Series models

    CERN Document Server

    San-Juan, Juan Félix; Pérez, Iván; López, Rosario

    2016-01-01

    In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination improves the precision of conventional numerical, analytical and semianalytical theories for determining the position and velocity of a...

  14. Nonlinear diffusion methods based on robust statistics for noise removal

    Institute of Scientific and Technical Information of China (English)

    JIA Di-ye; HUANG Feng-gang; SU Han

    2007-01-01

    A novel smoothness term for the Bayesian regularization framework, based on M-estimation from robust statistics, is proposed, and from this term a class of fourth-order nonlinear diffusion methods is derived. These methods attempt to approximate an observed image with a piecewise linear image, which looks more natural than the piecewise constant image used by the P-M model [1]. It is known that M-estimators and W-estimators are essentially equivalent and solve the same minimization problem. We then propose a PL bilateral filter from the equivalent W-estimator. This new model is designed for piecewise linear image filtering, which is more effective than the normal bilateral filter.

  15. A test statistic for the affected-sib-set method.

    Science.gov (United States)

    Lange, K

    1986-07-01

    This paper discusses generalizations of the affected-sib-pair method. First, the requirement that sib identity-by-descent relations be known unambiguously is relaxed by substituting sib identity-by-state relations. This permits affected sibs to be used even when their parents are unavailable for typing. In the limit of an infinite number of marker alleles each of infinitesimal population frequency, the identity-by-state relations coincide with the usual identity-by-descent relations. Second, a weighted pairs test statistic is proposed that covers affected sib sets of size greater than two. These generalizations make the affected-sib-pair method a more powerful technique for detecting departures from independent segregation of disease and marker phenotypes. A sample calculation suggests such a departure for tuberculoid leprosy and the HLA D locus.

  16. Statistical Inference Methods for Sparse Biological Time Series Data

    Directory of Open Access Journals (Sweden)

    Voit Eberhard O

    2011-04-01

    Background: Comparing metabolic profiles under different biological perturbations has become a powerful approach to investigating the functioning of cells. The profiles can be taken as single snapshots of a system, but more information is gained if they are measured longitudinally over time. The results are short time series consisting of relatively sparse data that cannot be analyzed effectively with standard time series techniques, such as autocorrelation and frequency domain methods. In this work, we study longitudinal time series profiles of glucose consumption in the yeast Saccharomyces cerevisiae under different temperatures and preconditioning regimens, which we obtained with methods of in vivo nuclear magnetic resonance (NMR) spectroscopy. For the statistical analysis we first fit several nonlinear mixed effect regression models to the longitudinal profiles and then used an ANOVA likelihood ratio method in order to test for significant differences between the profiles. Results: The proposed methods are capable of distinguishing metabolic time trends resulting from different treatments and associate significance levels to these differences. Among several nonlinear mixed-effects regression models tested, a three-parameter logistic function represents the data with highest accuracy. ANOVA and likelihood ratio tests suggest that there are significant differences between the glucose consumption rate profiles for cells that had been--or had not been--preconditioned by heat during growth. Furthermore, pair-wise t-tests reveal significant differences in the longitudinal profiles for glucose consumption rates between optimal conditions and heat stress, optimal and recovery conditions, and heat stress and recovery conditions (p-values [...]). Conclusion: We have developed a nonlinear mixed effects model that is appropriate for the analysis of sparse metabolic and physiological time profiles. The model permits sound statistical inference procedures

  17. Are Statistics Labs Worth the Effort?--Comparison of Introductory Statistics Courses Using Different Teaching Methods

    Directory of Open Access Journals (Sweden)

    Jose H. Guardiola

    2010-01-01

    This paper compares the academic performance of students in three similar elementary statistics courses taught by the same instructor, but with the lab component differing among the three. One course is traditionally taught without a lab component; the second with a lab component using scenarios and an extensive use of technology, but without explicit coordination between lab and lecture; and the third using a lab component with an extensive use of technology that carefully coordinates the lab with the lecture. Extensive use of technology means, in this context, using Minitab software in the lab section, doing homework and quizzes using MyMathlab ©, and emphasizing interpretation of computer output during lectures. Initially, an online instrument based on Gardner’s multiple intelligences theory is given to students to try to identify students’ learning styles and intelligence types as covariates. An analysis of covariance is performed in order to compare differences in achievement. In this study there is no attempt to measure difference in student performance across the different treatments. The purpose of this study is to find indications of associations among variables that support the claim that statistics labs could be associated with superior academic achievement in one of these three instructional environments. Also, this study tries to identify individual student characteristics that could be associated with superior academic performance. This study did not find evidence of any individual student characteristics that could be associated with superior achievement. The response variable was computed as percentage of correct answers for the three exams during the semester added together. The results of this study indicate a significant difference across these three different instructional methods, showing significantly higher mean scores for the response variable on students taking the lab component that was carefully coordinated with

  18. Bayesian nonparametric meta-analysis using Polya tree mixture models.

    Science.gov (United States)

    Branscum, Adam J; Hanson, Timothy E

    2008-09-01

    Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically dispersed locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
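
    The two-level structure described in this abstract can be written compactly. The notation below is an assumption made for illustration (y_i for the summary statistic of study i, \sigma_i^2 for its sampling variance, \theta_i for the latent true effect, G for the random effects distribution); the abstract itself does not fix symbols, and the exact form of the PTM prior is not given there.

```latex
% Sampling level, shared by both approaches (justified asymptotically):
y_i \mid \theta_i \sim \mathcal{N}(\theta_i, \sigma_i^2), \qquad i = 1, \dots, k

% Parametric random effects (the convenient normal assumption):
\theta_i \mid \mu, \tau^2 \sim \mathcal{N}(\mu, \tau^2)

% Nonparametric alternative (G itself is unknown and receives a prior):
\theta_i \mid G \sim G, \qquad G \sim \text{Polya tree mixture prior}
```

    Testing the PTM against the normal random effects model then amounts to comparing the last two specifications while keeping the sampling level unchanged.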

  19. Comparison of prediction performance using statistical postprocessing methods

    Science.gov (United States)

    Han, Keunhee; Choi, JunTae; Kim, Chansoo

    2016-11-01

    As the 2018 Winter Olympics are to be held in Pyeongchang, both general weather information on Pyeongchang and specific weather information on this region, which can affect game operation and athletic performance, are required. An ensemble prediction system has been applied to provide more accurate weather information, but it has bias and dispersion due to the limitations and uncertainty of its model. In this study, homogeneous and nonhomogeneous regression models as well as Bayesian model averaging (BMA) were used to reduce the bias and dispersion existing in ensemble prediction and to provide probabilistic forecasts. Prior to applying the prediction methods, reliability of the ensemble forecasts was tested by using a rank histogram and a residual quantile-quantile plot to identify the ensemble forecasts and the corresponding verifications. The ensemble forecasts had a consistent positive bias, indicating over-forecasting, and were under-dispersed. To correct such biases, statistical post-processing methods were applied using fixed and sliding windows. The prediction skills of methods were compared by using the mean absolute error, root mean square error, continuous ranked probability score, and continuous ranked probability skill score. Under the fixed window, BMA exhibited better prediction skill than the other methods at most observation stations. Under the sliding window, on the other hand, homogeneous and non-homogeneous regression models with positive regression coefficients exhibited better prediction skill than BMA. In particular, the homogeneous regression model with positive regression coefficients exhibited the best prediction skill.
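
    Two of the diagnostics named in this abstract, the rank histogram and the continuous ranked probability score (CRPS), are simple to compute for a raw ensemble. The sketch below is a minimal illustration and not the authors' code: the function names are hypothetical, ties between members and observations are ignored, and the CRPS uses the standard empirical form E|X - y| - 0.5 E|X - X'|.

```python
def rank_histogram(ensembles, observations):
    """Count, for each case, how many ensemble members fall below the observation.

    A flat histogram suggests a reliable ensemble; a U shape suggests
    under-dispersion; a slope suggests bias.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


def ensemble_crps(members, obs):
    """Empirical CRPS of one ensemble forecast against one observation."""
    n = len(members)
    spread_to_obs = sum(abs(m - obs) for m in members) / n
    internal_spread = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return spread_to_obs - internal_spread


if __name__ == "__main__":
    forecasts = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
    observed = [2.5, 5.0]
    print(rank_histogram(forecasts, observed))
    print(ensemble_crps([0.0, 2.0], 1.0))
```

    Lower CRPS is better, and for a single-member ensemble it reduces to the absolute error, which is why it is often reported alongside the mean absolute error.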

  20. Statistical methods for the detection and analysis of radioactive sources

    Science.gov (United States)

    Klumpp, John

    We consider four topics from areas of radioactive statistical analysis in the present study: Bayesian methods for the analysis of count rate data, analysis of energy data, a model for non-constant background count rate distributions, and a zero-inflated model of the sample count rate. The study begins with a review of Bayesian statistics and techniques for analyzing count rate data. Next, we consider a novel system for incorporating energy information into count rate measurements which searches for elevated count rates in multiple energy regions simultaneously. The system analyzes time-interval data in real time to sequentially update a probability distribution for the sample count rate. We then consider a "moving target" model of background radiation in which the instantaneous background count rate is a function of time, rather than being fixed. Unlike the sequential update system, this model assumes a large body of pre-existing data which can be analyzed retrospectively. Finally, we propose a novel Bayesian technique which allows for simultaneous source detection and count rate analysis. This technique is fully compatible with, but independent of, the sequential update system and moving target model.
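
    The idea of sequentially updating a probability distribution for a count rate can be illustrated with the textbook conjugate case. The sketch below assumes Poisson counts and a gamma prior on the rate; the abstract does not state which prior or likelihood the study uses, so the function names and the prior values are purely illustrative.

```python
def update_gamma_posterior(shape, rate, counts, live_time):
    """Conjugate update: Gamma(shape, rate) prior + Poisson counts over live_time."""
    return shape + counts, rate + live_time


def posterior_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution, i.e. the estimated count rate."""
    return shape / rate


if __name__ == "__main__":
    # Hypothetical weak prior, then two successive measurement intervals.
    shape, rate = 1.0, 1.0
    for counts, seconds in [(3, 1.0), (2, 1.0)]:
        shape, rate = update_gamma_posterior(shape, rate, counts, seconds)
        print(posterior_mean(shape, rate))
```

    Because the update only adds counts and time, processing intervals one at a time gives the same posterior as processing all the data at once, which is what makes a real-time scheme of this kind cheap.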

  1. A Statistical Method to Distinguish Functional Brain Networks

    Science.gov (United States)

    Fujita, André; Vidal, Maciel C.; Takahashi, Daniel Y.

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulation results demonstrate that we can precisely control the rate of false positives and that the test is powerful in discriminating random graphs generated by different models and parameters. The method also proved robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001). PMID:28261045

  2. Jet Noise Diagnostics Supporting Statistical Noise Prediction Methods

    Science.gov (United States)

    Bridges, James E.

    2006-01-01

    compared against measurements of mean and rms velocity statistics over a range of jet speeds and temperatures. Models for flow parameters used in the acoustic analogy, most notably the space-time correlations of velocity, have been compared against direct measurements, and modified to better fit the observed data. These measurements have been extremely challenging for hot, high speed jets, and represent a sizeable investment in instrumentation development. As an intermediate check that the analysis is predicting the physics intended, phased arrays have been employed to measure source distributions for a wide range of jet cases. And finally, careful far-field spectral directivity measurements have been taken for final validation of the prediction code. Examples of each of these experimental efforts will be presented. The main result of these efforts is a noise prediction code, named JeNo, which is in mid-development. JeNo is able to consistently predict spectral directivity, including aft angle directivity, for subsonic cold jets of most geometries. Current development on JeNo is focused on extending its capability to hot jets, requiring inclusion of a previously neglected second source associated with thermal fluctuations. A secondary result of the intensive experimentation is the archiving of various flow statistics applicable to other acoustic analogies and to development of time-resolved prediction methods. These will be of lasting value as we look ahead at future challenges to the aeroacoustic experimentalist.

  3. Development and testing of improved statistical wind power forecasting methods.

    Energy Technology Data Exchange (ETDEWEB)

    Mendes, J.; Bessa, R.J.; Keko, H.; Sumaili, J.; Miranda, V.; Ferreira, C.; Gama, J.; Botterud, A.; Zhou, Z.; Wang, J. (Decision and Information Sciences); (INESC Porto)

    2011-12-06

    (with spatial and/or temporal dependence). Statistical approaches to uncertainty forecasting basically consist of estimating the uncertainty based on observed forecasting errors. Quantile regression (QR) is currently a commonly used approach in uncertainty forecasting. In Chapter 3, we propose new statistical approaches to the uncertainty estimation problem by employing kernel density forecast (KDF) methods. We use two estimators in both offline and time-adaptive modes, namely, the Nadaraya-Watson (NW) and Quantile-copula (QC) estimators. We conduct detailed tests of the new approaches using QR as a benchmark. One of the major issues in wind power generation is sudden and large changes of wind power output over a short period of time, namely ramping events. In Chapter 4, we perform a comparative study of existing definitions and methodologies for ramp forecasting. We also introduce a new probabilistic method for ramp event detection. The method starts with a stochastic algorithm that generates wind power scenarios, which are passed through a high-pass filter for ramp detection and estimation of the likelihood of ramp events to happen. The report is organized as follows: Chapter 2 presents the results of the application of ITL training criteria to deterministic WPF; Chapter 3 reports the study on probabilistic WPF, including new contributions to wind power uncertainty forecasting; Chapter 4 presents a new method to predict and visualize ramp events, comparing it with state-of-the-art methodologies; Chapter 5 briefly summarizes the main findings and contributions of this report.
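
    Quantile regression, used here as the benchmark, fits each quantile by minimizing the pinball (quantile) loss, and the same loss is a natural way to score any quantile forecast. The sketch below shows only that loss; it is not taken from the report, which does not say in this excerpt which scores were used, and the function name is hypothetical.

```python
def pinball_loss(observations, quantile_forecasts, tau):
    """Average pinball loss of forecasts for the tau-quantile, 0 < tau < 1.

    Under-forecasts are penalized with weight tau and over-forecasts with
    weight (1 - tau), so the loss is minimized by the true tau-quantile.
    """
    total = 0.0
    for y, q in zip(observations, quantile_forecasts):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observations)


if __name__ == "__main__":
    # Observed power above the forecast 90% quantile is penalized heavily.
    print(pinball_loss([1.0], [0.0], 0.9))
    # Observed power below it is penalized lightly.
    print(pinball_loss([0.0], [1.0], 0.9))
```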

  4. Axial electron channeling statistical method of site occupancy determination

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Multibeam dynamical theory of electron diffraction has been used to calculate the fast electron thickness-integrated probability density on Ti and Al sites in the γ-TiAl phase as a function of the incident electron beam orientation along the [100], [110] and [011] zone axes, with the effect of absorption considered. Both calculation and experiment show that the electron channeling effect differs greatly between zone axes, and between orientations about the same axis, so a proper zone axis and suitable incident beam tilt angles should be chosen when using the axial electron channeling statistical method to determine the site occupancies of impurities. It is suggested that the channeling effect map be calculated before the experiments.

  5. NEW METHOD FOR CALCULATION OF STATISTIC MISTAKE IN MARKETING INVESTIGATIONS

    Directory of Open Access Journals (Sweden)

    V. A. Koldachiov

    2008-01-01

    Full Text Available The idea of the new method is that, when the analysis sample is broken down into several sub-samples, the probability that the true value for the general population lies inside the interval between the highest and lowest sub-sample means is much higher than the probability that it lies outside this interval. In this case the size of the interval turns out to be smaller than the analogous quantity calculated with the Student formula. Thus, it is possible either to achieve higher accuracy in the results of marketing investigations while preserving the analysis sample size, or to reduce the necessary sample size while preserving the level of statistical error.
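
    The coverage claim can be checked by simulation. The sketch below is an illustration under stated assumptions rather than the author's procedure: the abstract gives no formula, so the code simply splits a sample into k equal sub-samples and uses the smallest and largest sub-sample means as the interval. If the sub-sample means are independent and symmetric about the true value, the true value falls outside the interval only when all k means land on the same side, so the coverage should be close to 1 - 2^(1-k), which is 0.9375 for k = 5. All function names are hypothetical.

```python
import random
import statistics


def subsample_range_interval(sample, k):
    """Split sample into k equal sub-samples; return (min, max) of their means."""
    size = len(sample) // k
    means = [statistics.fmean(sample[i * size:(i + 1) * size]) for i in range(k)]
    return min(means), max(means)


def coverage(k, n, trials, seed=0):
    """Fraction of simulated standard-normal samples whose interval contains 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        low, high = subsample_range_interval(sample, k)
        hits += low <= 0.0 <= high
    return hits / trials


if __name__ == "__main__":
    print(subsample_range_interval([1, 2, 3, 4, 5, 6], 3))
    print(coverage(k=5, n=100, trials=2000))
```

    The simulation only addresses coverage; whether the interval is narrower than a Student interval at the same confidence level, as the abstract states, is not tested here.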

  6. Statistical methods for determining the effect of mammography screening

    DEFF Research Database (Denmark)

    Lophaven, Søren

    2016-01-01

    In an overview of five randomised controlled trials from Sweden, a reduction of 29% was found in breast cancer mortality in women aged 50-69 at randomisation after a follow up of 5-13 years. Organised, population based, mammography service screening was introduced on the basis of these results in...... the municipality of Copenhagen in 1991, in the county of Fyn in 1993 and in the municipality of Frederiksberg in 1994, although reduced mortality in randomised controlled trials does not necessarily mean that screening also works in routine health care. In the rest of Denmark mammography screening was introduced...... in 2007-2008. Women aged 50-69 were invited to screening every second year. Taking advantage of the registers of population and health, we present statistical methods for evaluating the effect of mammography screening on breast cancer mortality (Olsen et al. 2005, Njor et al. 2015 and Weedon-Fekjær et al...

  7. Bayesian Analysis of Multiple Populations I: Statistical and Computational Methods

    CERN Document Server

    Stenning, D C; Robinson, E; van Dyk, D A; von Hippel, T; Sarajedini, A; Stein, N

    2016-01-01

    We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations (van Dyk et al. 2009, Stein et al. 2013). Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties---age, metallicity, helium abundance, distance, absorption, and initial mass---are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and al...

  9. Nonparametric Maximum Entropy Estimation on Information Diagrams

    CERN Document Server

    Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn

    2016-01-01

    Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...

  10. Quality in statistics education : Determinants of course outcomes in methods & statistics education at universities and colleges

    NARCIS (Netherlands)

    Verhoeven, P.S.

    2009-01-01

    Although Statistics is not a very popular course according to most students, a majority of students still take it, as it is mandatory at most Social Science departments. Therefore it takes special teacher’s skills to teach statistics. In order to do so it is essential for teachers to know what stude

  11. Assessment Methods in Statistical Education An International Perspective

    CERN Document Server

    Bidgood, Penelope; Jolliffe, Flavia

    2010-01-01

    This book is a collaboration from leading figures in statistical education and is designed primarily for academic audiences involved in teaching statistics and mathematics. The book is divided into four sections: (1) Assessment using real-world problems, (2) Assessing statistical thinking, (3) Individual assessment, (4) Successful assessment strategies.

  12. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  13. A non-parametric model for the cosmic velocity field

    NARCIS (Netherlands)

    Branchini, E; Teodoro, L; Frenk, CS; Schmoldt; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S

    1999-01-01

    We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.

  14. CATDAT : A Program for Parametric and Nonparametric Categorical Data Analysis : User's Manual Version 1.0, 1998-1999 Progress Report.

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, James T.

    1999-12-01

    Natural resource professionals are increasingly required to develop rigorous statistical models that relate environmental data to categorical response data. Recent advances in the statistical and computing sciences have led to the development of sophisticated methods for parametric and nonparametric analysis of data with categorical responses. The statistical software package CATDAT was designed to make some of these relatively new and powerful techniques available to scientists. The CATDAT statistical package includes 4 analytical techniques: generalized logit modeling; binary classification tree; extended K-nearest neighbor classification; and modular neural network.

  15. Empirical Laws in Economics Uncovered Using Methods in Statistical Mechanics

    Science.gov (United States)

    Stanley, H. Eugene

    2001-06-01

    In recent years, statistical physicists and computational physicists have determined that physical systems which consist of a large number of interacting particles obey universal "scaling laws" that serve to demonstrate an intrinsic self-similarity operating in such systems. Further, the parameters appearing in these scaling laws appear to be largely independent of the microscopic details. Since economic systems also consist of a large number of interacting units, it is plausible that scaling theory can be usefully applied to economics. To test this possibility using realistic data sets, a number of scientists have begun analyzing economic data using methods of statistical physics [1]. We have found evidence for scaling (and data collapse), as well as universality, in various quantities, and these recent results will be reviewed in this talk--starting with the most recent study [2]. We also propose models that may lead to some insight into these phenomena. These results will be discussed, as well as the overall rationale for why one might expect scaling principles to hold for complex economic systems. The work on which this talk is based is supported by BP, and was carried out in collaboration with L. A. N. Amaral, S. V. Buldyrev, D. Canning, P. Cizeau, X. Gabaix, P. Gopikrishnan, S. Havlin, Y. Lee, Y. Liu, R. N. Mantegna, K. Matia, M. Meyer, C.-K. Peng, V. Plerou, M. A. Salinger, and M. H. R. Stanley. [1.] See, e.g., R. N. Mantegna and H. E. Stanley, Introduction to Econophysics: Correlations & Complexity in Finance (Cambridge University Press, Cambridge, 1999). [2.] P. Gopikrishnan, B. Rosenow, V. Plerou, and H. E. Stanley, "Identifying Business Sectors from Stock Price Fluctuations," e-print cond-mat/0011145; V. Plerou, P. Gopikrishnan, L. A. N. Amaral, X. Gabaix, and H. E. Stanley, "Diffusion and Economic Fluctuations," Phys. Rev. E (Rapid Communications) 62, 3023-3026 (2000); P. Gopikrishnan, V. Plerou, X. Gabaix, and H. E. Stanley, "Statistical Properties of

  16. Statistical methods for detecting periodic fragments in DNA sequence data

    Directory of Open Access Journals (Sweden)

    Ying Hua

    2011-04-01

    Full Text Available Abstract Background Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of

  17. System Synthesis in Preliminary Aircraft Design Using Statistical Methods

    Science.gov (United States)

    DeLaurentis, Daniel; Mavris, Dimitri N.; Schrage, Daniel P.

    1996-01-01

    This paper documents an approach to conceptual and early preliminary aircraft design in which system synthesis is achieved using statistical methods, specifically Design of Experiments (DOE) and Response Surface Methodology (RSM). These methods are employed in order to more efficiently search the design space for optimum configurations. In particular, a methodology incorporating three uses of these techniques is presented. First, response surface equations are formed which represent aerodynamic analyses, in the form of regression polynomials, which are more sophisticated than generally available in early design stages. Next, a regression equation for an Overall Evaluation Criterion is constructed for the purpose of constrained optimization at the system level. This optimization, though achieved in an innovative way, is still traditional in that it is a point design solution. The methodology put forward here remedies this by introducing uncertainty into the problem, resulting in solutions which are probabilistic in nature. DOE/RSM is used for the third time in this setting. The process is demonstrated through a detailed aero-propulsion optimization of a High Speed Civil Transport. Fundamental goals of the methodology, then, are to introduce higher fidelity disciplinary analyses to the conceptual aircraft synthesis and provide a roadmap for transitioning from point solutions to probabilistic designs (and eventually robust ones).
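
    A response surface equation of the kind described here is a low-order regression polynomial fitted to the outputs of a designed set of analysis runs. The sketch below is a generic illustration, not the paper's implementation: it fits a full quadratic in two hypothetical design variables by ordinary least squares over a three-level full factorial design, using only the standard library. The function names and the test function are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_quadratic_surface(points, responses):
    """Least-squares coefficients [b0, b1, b2, b11, b22, b12] of
    y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2."""
    rows = [[1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2] for x1, x2 in points]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Three-level full factorial design in coded units (-1, 0, +1).
    design = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
    # Stand-in for an expensive analysis code.
    runs = [1 + 2 * a - b + 0.5 * a * a + 3 * b * b - 2 * a * b for a, b in design]
    print(fit_quadratic_surface(design, runs))
```

    Once fitted, the polynomial is cheap to evaluate, which is what makes it practical to search the design space or to propagate input uncertainty through it by repeated sampling.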

  18. Nonparametric Bayesian inference in biostatistics

    CERN Document Server

    Müller, Peter

    2015-01-01

    As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...

  19. Nonparametric Regression with Common Shocks

    Directory of Open Access Journals (Sweden)

    Eduardo A. Souza-Rodrigues

    2016-09-01

    Full Text Available This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
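
    For readers unfamiliar with it, the Nadaraya-Watson estimator is a locally weighted average of the responses. The sketch below shows the basic estimator for a scalar regressor with a Gaussian kernel; the kernel choice, the fixed bandwidth, and the function name are assumptions for illustration, and nothing here models the common shocks studied in the paper.

```python
import math


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted average of y_train, with weights centred on x0.

    Uses a Gaussian kernel; the normalizing constant cancels in the ratio.
    Assumes x0 is close enough to the data that the weights do not all
    underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.1, 0.9, 2.1, 2.9, 4.2]
    print(nadaraya_watson(xs, ys, 2.0, bandwidth=0.5))
```

    A smaller bandwidth tracks the data more closely at the cost of higher variance, which is why the rate of convergence mentioned in the abstract depends on how the bandwidth shrinks with sample size.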

  20. Nonparametric TOA estimators for low-resolution IR-UWB digital receiver

    Institute of Scientific and Technical Information of China (English)

    Yanlong Zhang; Weidong Chen

    2015-01-01

    Nonparametric time-of-arrival (TOA) estimators for impulse radio ultra-wideband (IR-UWB) signals are proposed. Nonparametric detection is obviously useful in situations where detailed information about the statistics of the noise is unavailable or not accurate. Such TOA estimators are obtained based on conditional statistical tests with only a symmetry distribution assumption on the noise probability density function. The nonparametric estimators are attractive choices for low-resolution IR-UWB digital receivers which can be implemented by fast comparators or high sampling rate low resolution analog-to-digital converters (ADCs), in place of high sampling rate high resolution ADCs which may not be available in practice. Simulation results demonstrate that nonparametric TOA estimators provide more effective and robust performance than typical energy detection (ED) based estimators.

  1. Nonparametric Bayesian Modeling of Complex Networks

    DEFF Research Database (Denmark)

    Schmidt, Mikkel Nørgaard; Mørup, Morten

    2013-01-01

    Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....

  2. An asymptotically optimal nonparametric adaptive controller

    Institute of Scientific and Technical Information of China (English)

    郭雷; 谢亮亮

    2000-01-01

    For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on the truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.

  3. Nonparametric estimation of location and scale parameters

    KAUST Repository

    Potgieter, C.J.

    2012-12-01

    Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
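
    The location-scale relationship defined at the start of this abstract can be made concrete with a much simpler estimator than the one the paper develops. The sketch below uses moment matching as a stand-in technique: it is not the asymptotic-likelihood approach of the paper, it assumes σ > 0 and finite second moments, and the function name is hypothetical. If Y and μ + σX have the same distribution, then sd(Y) = σ sd(X) and mean(Y) = μ + σ mean(X), which gives both estimates directly.

```python
import statistics


def moment_location_scale(x_sample, y_sample):
    """Moment-matching estimates (mu, sigma) such that Y ~ mu + sigma * X.

    The two samples need not be paired or of equal length.
    """
    sigma = statistics.stdev(y_sample) / statistics.stdev(x_sample)
    mu = statistics.fmean(y_sample) - sigma * statistics.fmean(x_sample)
    return mu, sigma


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [3.0 + 2.0 * v for v in x]
    print(moment_location_scale(x, y))
```

    Moment matching is sensitive to heavy tails and outliers, which is one motivation for the more careful estimators the paper evaluates by Monte Carlo simulation.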

  4. Hydrologic extremes - an intercomparison of multiple gridded statistical downscaling methods

    Science.gov (United States)

    Werner, Arelia T.; Cannon, Alex J.

    2016-04-01

    Gridded statistical downscaling methods are the main means of preparing climate model data to drive distributed hydrological models. Past work on the validation of climate downscaling methods has focused on temperature and precipitation, with less attention paid to the ultimate outputs from hydrological models. Also, as attention shifts towards projections of extreme events, downscaling comparisons now commonly assess methods in terms of climate extremes, but hydrologic extremes are less well explored. Here, we test the ability of gridded downscaling models to replicate historical properties of climate and hydrologic extremes, as measured in terms of temporal sequencing (i.e. correlation tests) and distributional properties (i.e. tests for equality of probability distributions). Outputs from seven downscaling methods - bias correction constructed analogues (BCCA), double BCCA (DBCCA), BCCA with quantile mapping reordering (BCCAQ), bias correction spatial disaggregation (BCSD), BCSD using minimum/maximum temperature (BCSDX), the climate imprint delta method (CI), and bias corrected CI (BCCI) - are used to drive the Variable Infiltration Capacity (VIC) model over the snow-dominated Peace River basin, British Columbia. Outputs are tested using split-sample validation on 26 climate extremes indices (ClimDEX) and two hydrologic extremes indices (3-day peak flow and 7-day peak flow). To characterize observational uncertainty, four atmospheric reanalyses are used as climate model surrogates and two gridded observational data sets are used as downscaling target data. The skill of the downscaling methods generally depended on reanalysis and gridded observational data set. However, CI failed to reproduce the distribution and BCSD and BCSDX the timing of winter 7-day low-flow events, regardless of reanalysis or observational data set. Overall, DBCCA passed the greatest number of tests for the ClimDEX indices, while BCCAQ, which is designed to more accurately resolve event

  5. Improved statistical method for temperature and salinity quality control

    Science.gov (United States)

    Gourrion, Jérôme; Szekely, Tanguy

    2017-04-01

    Climate research and ocean monitoring have benefited from the continuous development of global in-situ hydrographic networks over the last decades. Apart from the increasing volume of observations available on a large range of temporal and spatial scales, a critical aspect concerns the ability to constantly improve the quality of the datasets. In the context of the Coriolis Dataset for ReAnalysis (CORA) version 4.2, a new quality control method based on a local comparison to the historical extreme values ever observed is developed, implemented and validated. Temperature, salinity and potential density validity intervals are directly estimated from minimum and maximum values from a historical reference dataset, rather than from traditional mean and standard deviation estimates. Such an approach avoids strong statistical assumptions on the data distributions such as unimodality, absence of skewness and spatially homogeneous kurtosis. As a new feature, it also allows the two main objectives of an automatic quality control strategy to be addressed simultaneously, i.e. maximizing the number of good detections while minimizing the number of false alarms. The reference dataset is presently built from the fusion of 1) all ARGO profiles up to late 2015, 2) 3 historical CTD datasets and 3) the Sea Mammals CTD profiles from the MEOP database. All datasets are extensively and manually quality controlled. In this communication, the latest method validation results are also presented. The method has already been implemented in the latest version of the delayed-time CMEMS in-situ dataset and will be deployed soon in the equivalent near-real time products.
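
    The contrast drawn in this abstract, between validity intervals taken from historical minima and maxima and intervals taken from a mean and standard deviation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CORA implementation: the function names, the factor k=3 and the temperature values are all hypothetical.

```python
from statistics import mean, stdev

def minmax_interval(reference):
    """Validity interval from the historical extremes of a reference sample."""
    return min(reference), max(reference)

def gaussian_interval(reference, k=3.0):
    """Classical interval: mean +/- k sample standard deviations (k is an assumption)."""
    m, s = mean(reference), stdev(reference)
    return m - k * s, m + k * s

def flag(values, interval):
    """Return True for each value that falls outside the interval."""
    lo, hi = interval
    return [not (lo <= v <= hi) for v in values]

# Hypothetical, right-skewed reference temperatures (deg C) for one local bin.
reference = [2.1, 2.2, 2.2, 2.3, 2.4, 2.5, 2.6, 3.0, 4.5, 7.9]
new_obs = [2.0, 5.0, 8.5, -1.0]

print(flag(new_obs, minmax_interval(reference)))    # min/max rule
print(flag(new_obs, gaussian_interval(reference)))  # mean/std rule
```

    With these skewed values the symmetric mean/std interval stretches below zero and accepts the -1.0 reading, while the min/max interval rejects it. An operational scheme would presumably add a tolerance around the extremes; that detail is not given in the abstract.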

  6. Determination of Reference Catalogs for Meridian Observations Using Statistical Method

    Science.gov (United States)

    Li, Z. Y.

    2014-09-01

    The meridian observational data are useful for developing high-precision planetary ephemerides of the solar system. These historical data are provided by the Jet Propulsion Laboratory (JPL) or the Institut de Mécanique Céleste et de Calcul des Éphémérides (IMCCE). However, we find that the reference systems (realized by the fundamental catalogs FK3 (Third Fundamental Catalogue), FK4 (Fourth Fundamental Catalogue), and FK5 (Fifth Fundamental Catalogue), or Hipparcos), to which the observations are referred, are not given explicitly for some sets of data. This incomplete information prevents us from eliminating the systematic effects due to the different fundamental catalogs. The purpose of this paper is to identify the reference catalogs of the observations whose records have this problem, using the JPL DE421 ephemeris. The data for the corresponding planets in the geocentric celestial reference system (GCRS) obtained from DE421 are transformed to apparent places under different hypotheses regarding the reference catalogs. The validity of each hypothesis is then tested with two kinds of statistical quantities that indicate the significance of the difference between the original and transformed data series. As a result, the method proves effective for specifying the reference catalogs, and the missing information is determined unambiguously. Finally, these meridian data are transformed to the GCRS for further applications in the development of planetary ephemerides.

  7. Methods in probability and statistical inference. Final report, June 15, 1975-June 30, 1979. [Dept. of Statistics, Univ. of Chicago

    Energy Technology Data Exchange (ETDEWEB)

    Wallace, D L; Perlman, M D

    1980-06-01

    This report describes the research activities of the Department of Statistics, University of Chicago, during the period June 15, 1975 to July 30, 1979. Nine research projects are briefly described on the following subjects: statistical computing and approximation techniques in statistics; numerical computation of first passage distributions; probabilities of large deviations; combining independent tests of significance; small-sample efficiencies of tests and estimates; improved procedures for simultaneous estimation and testing of many correlations; statistical computing and improved regression methods; comparison of several populations; and unbiasedness in multivariate statistics. A description of the statistical consultation activities of the Department that are of interest to DOE, in particular, the scientific interactions between the Department and the scientists at Argonne National Laboratory, is given. A list of publications issued during the term of the contract is included.

  8. Statistical methods for decision making in mine action

    DEFF Research Database (Denmark)

    Larsen, Jan

    The lecture discusses the basics of statistical decision making in connection with humanitarian mine action. There is special focus on: 1) requirements for mine detection; 2) design and evaluation of mine equipment; 3) performance improvement by statistical learning and information fusion; 4...

  9. Statistics a guide to the use of statistical methods in the physical sciences

    CERN Document Server

    Barlow, Roger J

    1989-01-01

    The Manchester Physics Series. General Editors: D. J. Sandiford; F. Mandl; A. C. Phillips (Department of Physics and Astronomy, University of Manchester). Titles in the series: Properties of Matter (B. H. Flowers and E. Mendoza); Optics, Second Edition (F. G. Smith and J. H. Thomson); Statistical Physics, Second Edition (F. Mandl); Electromagnetism, Second Edition (I. S. Grant and W. R. Phillips); Statistics (R. J. Barlow); Solid State Physics, Second Edition (J. R. Hook and H. E. Hall); Quantum Mechanics (F. Mandl); Particle Physics, Second Edition (B. R. Martin and G. Shaw); The Physics of Stars, Second Edition (A.C. Phillips); Computing for Scienti

  10. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data

    Directory of Open Access Journals (Sweden)

    Cox Juergen

    2012-11-01

    Quantitative proteomics now provides abundance ratios for thousands of proteins upon perturbations. These need to be functionally interpreted and correlated to other types of quantitative genome-wide data such as the corresponding transcriptome changes. We describe a new method, 2D annotation enrichment, which compares quantitative data from any two 'omics' types in the context of categorical annotation of the proteins or genes. Suitable genome-wide categories are membership of proteins in biochemical pathways, their annotation with gene ontology terms, sub-cellular localization, the presence of protein domains or the membership in protein complexes. 2D annotation enrichment detects annotation terms whose members show consistent behavior in one or both of the data dimensions. This consistent behavior can be a correlation between the two data types, such as simultaneous up- or down-regulation in both data dimensions, or a lack thereof, such as regulation in one dimension but no change in the other. For the statistical formulation of the test we introduce a two-dimensional generalization of the nonparametric two-sample test. The false discovery rate is stringently controlled by correcting for multiple hypothesis testing. We also describe one-dimensional annotation enrichment, which can be applied to single omics data. The 1D and 2D annotation enrichment algorithms are freely available as part of the Perseus software.

  11. Nonparametric Estimation of Mean and Variance and Pricing of Securities

    Directory of Open Access Journals (Sweden)

    Akhtar R. Siddique

    2000-03-01

    This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices.

  12. Robust Control Methods for On-Line Statistical Learning

    Directory of Open Access Journals (Sweden)

    Capobianco Enrico

    2001-01-01

    Ensuring that data processing in an experiment is not affected by the presence of outliers is relevant for statistical control and learning studies. Learning schemes should thus be tested for their capacity to handle outliers in the observed training set, so as to achieve reliable estimates with respect to the crucial bias and variance aspects. We describe possible ways of endowing neural networks with statistically robust properties by defining feasible error criteria. It is convenient to cast neural nets in state space representations and apply both Kalman filter and stochastic approximation procedures in order to suggest statistically robustified solutions for on-line learning.

  13. Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.

    Science.gov (United States)

    Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F

    2013-04-01

    In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.

  14. Statistical methods in joint modeling of longitudinal and survival data

    Science.gov (United States)

    Dempsey, Walter

    Survival studies often generate not only a survival time for each patient but also a sequence of health measurements at annual or semi-annual check-ups while the patient remains alive. Such a sequence of random length accompanied by a survival time is called a survival process. Ordinarily robust health is associated with longer survival, so the two parts of a survival process cannot be assumed independent. The first part of the thesis is concerned with a general technique---reverse alignment---for constructing statistical models for survival processes. A revival model is a regression model in the sense that it incorporates covariate and treatment effects into both the distribution of survival times and the joint distribution of health outcomes. The revival model also determines a conditional survival distribution given the observed history, which describes how the subsequent survival distribution is determined by the observed progression of health outcomes. The second part of the thesis explores the concept of a consistent exchangeable survival process---a joint distribution of survival times in which the risk set evolves as a continuous-time Markov process with homogeneous transition rates. A correspondence with the de Finetti approach of constructing an exchangeable survival process by generating iid survival times conditional on a completely independent hazard measure is shown. Several specific processes are detailed, showing how the number of blocks of tied failure times grows asymptotically with the number of individuals in each case. In particular, we show that the set of Markov survival processes with weakly continuous predictive distributions can be characterized by a two-dimensional family called the harmonic process. The outlined methods are then applied to data, showing how they can be easily extended to handle censoring and inhomogeneity among patients.

  15. A comparative assessment of statistical methods for extreme weather analysis

    Science.gov (United States)

    Schlögl, Matthias; Laaha, Gregor

    2017-04-01

    Extreme weather exposure assessment is of major importance for scientists and practitioners alike. We compare different extreme value approaches and fitting methods with respect to their value for assessing extreme precipitation and temperature impacts. Based on an Austrian data set from 25 meteorological stations representing diverse meteorological conditions, we assess the added value of partial duration series over the standardly used annual maxima series in order to give recommendations for performing extreme value statistics of meteorological hazards. Results show the merits of the robust L-moment estimation, which yielded better results than maximum likelihood estimation in 62 % of all cases. At the same time, results question the general assumption of the threshold excess approach (employing partial duration series, PDS) being superior to the block maxima approach (employing annual maxima series, AMS) due to information gain. For low return periods (non-extreme events) the PDS approach tends to overestimate return levels as compared to the AMS approach, whereas an opposite behavior was found for high return levels (extreme events). In extreme cases, an inappropriate threshold was shown to lead to considerable biases that may outweigh the possible gain of information from including additional extreme events by far. This effect was neither visible from the square-root criterion, nor from standardly used graphical diagnosis (mean residual life plot), but from a direct comparison of AMS and PDS in synoptic quantile plots. We therefore recommend performing AMS and PDS approaches simultaneously in order to select the best suited approach. This will make the analyses more robust, in cases where threshold selection and dependency introduce biases to the PDS approach, but also in cases where the AMS contains non-extreme events that may introduce similar biases. For assessing the performance of extreme events we recommend conditional performance measures that focus
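
    The L-moment estimation and return levels mentioned in this abstract can be illustrated with a short sketch for an annual maxima series. The abstract does not say which distribution was fitted; a Gumbel distribution is assumed here purely for simplicity, and the function names and the precipitation values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sample_l_moments(data):
    """First two sample L-moments from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum over ranks i=1..n of (i-1)/(n-1) * x_(i)
    b1 = sum(i * v for i, v in enumerate(x)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0  # (l1, l2)

def fit_gumbel_lmom(data):
    """Gumbel location and scale from L-moments (distribution choice is an assumption)."""
    l1, l2 = sample_l_moments(data)
    scale = l2 / math.log(2.0)
    loc = l1 - EULER_GAMMA * scale
    return loc, scale

def return_level(loc, scale, period):
    """Level exceeded on average once every `period` years under the Gumbel fit."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / period))

# Hypothetical annual maximum daily precipitation (mm) at one station.
annual_maxima = [41.0, 55.2, 38.7, 62.9, 47.3, 71.5, 44.8, 52.1, 58.6, 49.9]
loc, scale = fit_gumbel_lmom(annual_maxima)
for period in (10, 50, 100):
    print(period, round(return_level(loc, scale, period), 1))
```

    A partial duration series analysis would instead fit a threshold-excess model to all peaks above a chosen threshold, which is where the threshold-selection bias discussed in the abstract enters.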

  16. Statistical Models and Methods for Network Meta-Analysis.

    Science.gov (United States)

    Madden, L V; Piepho, H-P; Paul, P A

    2016-08-01

    Meta-analysis, the methodology for analyzing the results from multiple independent studies, has grown tremendously in popularity over the last four decades. Although most meta-analyses involve a single effect size (summary result, such as a treatment difference) from each study, there are often multiple treatments of interest across the network of studies in the analysis. Multi-treatment (or network) meta-analysis can be used for simultaneously analyzing the results from all the treatments. However, the methodology is considerably more complicated than for the analysis of a single effect size, and there have not been adequate explanations of the approach for agricultural investigations. We review the methods and models for conducting a network meta-analysis based on frequentist statistical principles, and demonstrate the procedures using a published multi-treatment plant pathology data set. A major advantage of network meta-analysis is that correlations of estimated treatment effects are automatically taken into account when an appropriate model is used. Moreover, treatment comparisons may be possible in a network meta-analysis that are not possible in a single study because all treatments of interest may not be included in any given study. We review several models that consider the study effect as either fixed or random, and show how to interpret model-fitting output. We further show how to model the effect of moderator variables (study-level characteristics) on treatment effects, and present one approach to test for the consistency of treatment effects across the network. Online supplemental files give explanations on fitting the network meta-analytical models using SAS.

  17. Understanding data better with Bayesian and global statistical methods

    CERN Document Server

    Press, W H

    1996-01-01

    To understand their data better, astronomers need to use statistical tools that are more advanced than traditional "freshman lab" statistics. As an illustration, the problem of combining apparently incompatible measurements of a quantity is presented from both the traditional, and a more sophisticated Bayesian, perspective. Explicit formulas are given for both treatments. Results are shown for the value of the Hubble Constant, and a 95% confidence interval of 66 < H0 < 82 (km/s/Mpc) is obtained.
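
    As a reminder of what a traditional treatment of combined measurements typically looks like, the sketch below computes an inverse-variance weighted mean and a chi-square consistency statistic. It assumes independent Gaussian errors, uses made-up numbers rather than Hubble Constant data, and does not reproduce the Bayesian treatment or the formulas of the paper.

```python
import math

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

def chi_square(values, sigmas, mean):
    """Sum of squared standardized residuals about the combined mean."""
    return sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))

# Hypothetical measurements of one quantity with quoted 1-sigma errors.
values = [55.0, 80.0, 72.0]
sigmas = [5.0, 5.0, 10.0]

mean, err = weighted_mean(values, sigmas)
chi2 = chi_square(values, sigmas, mean)
print(round(mean, 2), round(err, 2), round(chi2, 2), "dof =", len(values) - 1)
```

    A chi-square much larger than the number of degrees of freedom signals that the measurements are mutually incompatible under their quoted errors, which is the situation the abstract addresses.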

  18. Teaching biology through statistics: application of statistical methods in genetics and zoology courses.

    Science.gov (United States)

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math-biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology.

  19. The Effects of Sample Size on Expected Value, Variance and Fraser Efficiency for Nonparametric Independent Two Sample Tests

    Directory of Open Access Journals (Sweden)

    Ismet DOGAN

    2015-10-01

    Objective: Choosing the most efficient statistical test is one of the essential problems of statistics. Asymptotic relative efficiency is a notion that enables the quantitative comparison, in large samples, of two different tests of the same statistical hypothesis. The notion of the asymptotic efficiency of tests is more complicated than that of the asymptotic efficiency of estimates. This paper discusses the effect of sample size on the expected values and variances of non-parametric tests for two independent samples and determines the most efficient test for different sample sizes using the Fraser efficiency value. Material and Methods: Since calculating the power value when comparing tests is not practical most of the time, using the asymptotic relative efficiency value is preferable. Asymptotic relative efficiency is an indispensable technique for comparing and ordering statistical tests in large samples. It is especially useful in nonparametric statistics, where there exist numerous heuristic tests such as the linear rank tests. In this study, the sample size is taken as 2 ≤ n ≤ 50. Results: In both balanced and unbalanced cases, it is found that as the sample size increases, the expected values and variances of all the tests discussed in this paper increase as well. Additionally, according to the Fraser efficiency, the Mann-Whitney U test is the most efficient of the non-parametric tests used to compare two independent samples, regardless of their sizes. Conclusion: According to the Fraser efficiency, the Mann-Whitney U test is the most efficient test.
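
    The growth of the expected value and variance with sample size can be checked directly for the Mann-Whitney U statistic, whose null moments (no ties) are the standard results E[U] = mn/2 and Var[U] = mn(m+n+1)/12. The sketch below compares these closed forms with an exhaustive enumeration for small samples. The function names are hypothetical, and the Fraser efficiency itself is not computed.

```python
from fractions import Fraction
from itertools import combinations

def u_null_moments(m, n):
    """Closed-form null mean and variance of Mann-Whitney U (no ties)."""
    return Fraction(m * n, 2), Fraction(m * n * (m + n + 1), 12)

def u_null_moments_exact(m, n):
    """Same moments by enumerating every equally likely rank assignment."""
    us = []
    for ranks in combinations(range(1, m + n + 1), m):
        # U = rank sum of the first sample minus its smallest possible value
        us.append(sum(ranks) - m * (m + 1) // 2)
    mean = Fraction(sum(us), len(us))
    var = Fraction(sum(u * u for u in us), len(us)) - mean ** 2
    return mean, var

print(u_null_moments(3, 4) == u_null_moments_exact(3, 4))

# Balanced designs: both moments grow with the per-group sample size.
for size in (2, 10, 50):
    mean, var = u_null_moments(size, size)
    print(size, float(mean), float(var))
```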

  20. Modern applied U-statistics

    CERN Document Server

    Kowalski, Jeanne

    2008-01-01

    A timely and applied approach to the newly discovered methods and applications of U-statistics. Built on years of collaborative research and academic experience, Modern Applied U-Statistics successfully presents a thorough introduction to the theory of U-statistics using in-depth examples and applications that address contemporary areas of study including biomedical and psychosocial research. Utilizing a "learn by example" approach, this book provides an accessible, yet in-depth, treatment of U-statistics, as well as addresses key concepts in asymptotic theory by integrating translational and cross-disciplinary research. The authors begin with an introduction of the essential and theoretical foundations of U-statistics such as the notion of convergence in probability and distribution, basic convergence results, stochastic Os, inference theory, generalized estimating equations, as well as the definition and asymptotic properties of U-statistics. With an emphasis on nonparametric applications when and where applic...

  1. Statistics and finance an introduction

    CERN Document Server

    Ruppert, David

    2004-01-01

    This textbook emphasizes the applications of statistics and probability to finance. Students are assumed to have had a prior course in statistics, but no background in finance or economics. The basics of probability and statistics are reviewed and more advanced topics in statistics, such as regression, ARMA and GARCH models, the bootstrap, and nonparametric regression using splines, are introduced as needed. The book covers the classical methods of finance such as portfolio theory, CAPM, and the Black-Scholes formula, and it introduces the somewhat newer area of behavioral finance. Applications and use of MATLAB and SAS software are stressed. The book will serve as a text in courses aimed at advanced undergraduates and masters students in statistics, engineering, and applied mathematics as well as quantitatively oriented MBA students. Those in the finance industry wishing to know more statistics could also use it for self-study. David Ruppert is the Andrew Schultz, Jr. Professor of Engineering, School of Oper...

  2. Nonparametric Detection of Geometric Structures Over Networks

    Science.gov (United States)

    Zou, Shaofeng; Liang, Yingbin; Poor, H. Vincent

    2017-10-01

    Nonparametric detection of existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests with probability of errors converging to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on maximum mean discrepancy that measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is first studied. Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent. It is also shown that certain necessary conditions must hold to guarantee any test to be universally consistent. Comparison of sufficient and necessary conditions yields that the proposed test is order-level optimal and nearly optimal respectively in terms of minimum and maximum sizes of candidate anomalous intervals. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
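
    The maximum mean discrepancy underlying the proposed tests can be sketched for two one-dimensional samples. This is the generic biased (V-statistic) estimate with a Gaussian kernel, not the interval-scanning test of the paper; the bandwidth, sample sizes and distributions are assumptions chosen for illustration.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two 1-D samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

rng = random.Random(0)
typical_a = [rng.gauss(0.0, 1.0) for _ in range(200)]  # samples from p
typical_b = [rng.gauss(0.0, 1.0) for _ in range(200)]  # more samples from p
anomalous = [rng.gauss(2.0, 1.0) for _ in range(200)]  # samples from q

print(mmd_squared(typical_a, typical_b))  # close to zero
print(mmd_squared(typical_a, anomalous))  # clearly larger
```

    A detection rule in this spirit would compare the statistic for each candidate interval against a threshold; how that threshold must scale with network size is the subject of the paper's consistency conditions.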

  3. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    Science.gov (United States)

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  5. Statistical methods and applications from a historical perspective selected issues

    CERN Document Server

    Mignani, Stefania

    2014-01-01

    The book showcases a selection of peer-reviewed papers, the preliminary versions of which were presented at a conference held 11-13 June 2011 in Bologna and organized jointly by the Italian Statistical Society (SIS), the National Institute of Statistics (ISTAT) and the Bank of Italy. The theme of the conference was "Statistics in the 150 years of the Unification of Italy." The celebration of the anniversary of Italian unification provided the opportunity to examine and discuss the methodological aspects and applications from a historical perspective and both from a national and international point of view. The critical discussion on the issues of the past has made it possible to focus on recent advances, considering the studies of socio-economic and demographic changes in European countries.

  6. Nonparametric estimation for hazard rate monotonously decreasing system

    Institute of Scientific and Technical Information of China (English)

    Han Fengyan; Li Weisong

    2005-01-01

    Estimation of the density and hazard rate is very important to the reliability analysis of a system. In order to estimate the density and hazard rate of a system whose hazard rate is monotonically decreasing, a new nonparametric estimator is put forward. The estimator is based on the kernel function method and an optimization algorithm. Numerical experiments show that the method is accurate enough and can be used in many cases.
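
    A plain kernel estimate of a hazard rate divides a kernel density estimate by a smoothed survival estimate. The sketch below shows only that generic step with a Gaussian kernel; it does not enforce the monotonically decreasing constraint and is not the optimization-based estimator of the paper. The function name and the fixed bandwidth are assumptions.

```python
import math


def kernel_hazard(t, samples, bandwidth):
    """Gaussian-kernel hazard estimate h(t) = f(t) / S(t).

    f is a kernel density estimate and S is the matching smoothed
    survival function. No monotonicity constraint is imposed.
    """
    n = len(samples)
    density = 0.0
    survival = 0.0
    for x in samples:
        u = (t - x) / bandwidth
        density += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        # 1 - Phi(u): probability mass of this kernel lying above t
        survival += 1.0 - 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    density /= n * bandwidth
    survival /= n
    return density / survival if survival > 0 else float("inf")
```

    For lifetimes that follow an exponential law with rate 1, the true hazard is constant and equal to 1, which makes a convenient sanity check for an estimator of this kind away from the boundary at zero.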

  7. Debating Curricular Strategies for Teaching Statistics and Research Methods: What Does the Current Evidence Suggest?

    Science.gov (United States)

    Barron, Kenneth E.; Apple, Kevin J.

    2014-01-01

    Coursework in statistics and research methods is a core requirement in most undergraduate psychology programs. However, is there an optimal way to structure and sequence methodology courses to facilitate student learning? For example, should statistics be required before research methods, should research methods be required before statistics, or…

  9. Critical Realism and Statistical Methods--A Response to Nash

    Science.gov (United States)

    Scott, David

    2007-01-01

    This article offers a defence of critical realism in the face of objections Nash (2005) makes to it in a recent edition of this journal. It is argued that critical and scientific realisms are closely related and that both are opposed to statistical positivism. However, the suggestion is made that scientific realism retains (from statistical…

  10. Statistical methods for decision making in mine action

    DEFF Research Database (Denmark)

    Larsen, Jan

    * The design and evaluation of mine clearance equipment – the problem of reliability
    * Detection probability – tossing a coin
    * Requirements in mine action
    * Detection probability and confidence in MA
    * Using statistics in area reduction
    * Improving performance by information fusion and combination...

  11. Bayesian nonparametric duration model with censorship

    Directory of Open Access Journals (Sweden)

    Joseph Hakizamungu

    2007-10-01

    This paper is concerned with nonparametric i.i.d. duration models with censored observations. We establish, by a simple and unified approach, the general structure of a Bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.

  12. Bootstrap Estimation for Nonparametric Efficiency Estimates

    OpenAIRE

    1995-01-01

    This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...

  13. Nonparametric analysis of the time structure of seismicity in a geographic region

    Directory of Open Access Journals (Sweden)

    A. Quintela-del-Río

    2002-06-01

    As an alternative to traditional parametric approaches, we suggest nonparametric methods for analyzing temporal data on earthquake occurrences. In particular, the kernel methods for estimating the hazard function and the intensity function are presented. One novelty of our approach is that we take into account the possible dependence of the data when estimating the distribution of time intervals between earthquakes, which has not been considered in most statistical studies on seismicity. Kernel estimation of the hazard function has been used to study the occurrence process of cluster centers (main shocks). Kernel intensity estimation, on the other hand, has helped to describe the occurrence process of cluster members (aftershocks). Similar studies in two geographic areas of Spain (Granada and Galicia) have been carried out to illustrate the estimation methods suggested.

  14. Non-parametric probabilistic forecasts of wind power: required properties and evaluation

    DEFF Research Database (Denmark)

    Pinson, Pierre; Nielsen, Henrik Aalborg; Møller, Jan Kloppenborg;

    2007-01-01

    Predictions of wind power production for horizons up to 48-72 hours ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the conditional expectation of future generation for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point...

  15. A non-parametric peak finder algorithm and its application in searches for new physics

    CERN Document Server

    Chekanov, S

    2011-01-01

    We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.

  16. Statistical methods for data analysis in particle physics

    CERN Document Server

    Lista, Luca

    2015-01-01

    This concise set of course-based notes provides the reader with the main concepts and tools to perform statistical analysis of experimental data, in particular in the field of high-energy physics (HEP). First, an introduction to probability theory and basic statistics is given, mainly as a reminder from advanced undergraduate studies, yet also with a view to clearly distinguishing the Frequentist versus Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on upper limits, as many applications in HEP concern hypothesis testing, where often the main goal is to provide better and better limits so as to be able to distinguish eventually between competing hypotheses or to rule out some of them altogether. Many worked examples will help newcomers to the field and graduate students to understand the pitfalls in applying theoretical concepts to actual data.

  17. METHODOLOGICAL PRINCIPLES AND METHODS OF TERMS OF TRADE STATISTICAL EVALUATION

    Directory of Open Access Journals (Sweden)

    N. Kovtun

    2014-09-01

    The paper studies the methodological principles and guidance for the statistical evaluation of terms of trade under the United Nations classification model, the Harmonized Commodity Description and Coding System (HS). The proposed three-stage model of index analysis and estimation of terms of trade is implemented in practice for Ukraine's commodity-members for the period 2011-2012.

  18. Estimation of Stochastic Volatility Models by Nonparametric Filtering

    DEFF Research Database (Denmark)

    Kanaya, Shin; Kristensen, Dennis

    2016-01-01

    A two-step estimation method of stochastic volatility models is proposed: In the first step, we nonparametrically estimate the (unobserved) instantaneous volatility process. In the second step, standard estimation methods for fully observed diffusion processes are employed, but with the filtered/estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties...

  19. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    This paper discusses nonparametric kernel regression with the regressor being a \(d\)-dimensional \(\beta\)-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \(\sqrt{n(T)h^{d}}\), where \(n(T)\) is the number of regenerations for a \(\beta\)-null recurrent process, and the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.

  20. Statistical Diagnosis of the Best Weibull Methods for Wind Power Assessment for Agricultural Applications

    OpenAIRE

    Abul Kalam Azad; Mohammad Golam Rasul; Talal Yusaf

    2014-01-01

    The best Weibull distribution methods for the assessment of wind energy potential at different altitudes in desired locations are statistically diagnosed in this study. Seven different methods, namely graphical method (GM), method of moments (MOM), standard deviation method (STDM), maximum likelihood method (MLM), power density method (PDM), modified maximum likelihood method (MMLM) and equivalent energy method (EEM) were used to estimate the Weibull parameters and six statistical tools, name...
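
    Two of the simpler estimation routes named in the abstract can be sketched from a sample mean and standard deviation of wind speed. The method of moments below solves for the shape k by bisection on the Weibull coefficient of variation; the second function uses the widely quoted empirical approximation k = (σ/μ)^(-1.086), which is assumed here to correspond to the "standard deviation method", since the abstract does not give its formula. Function names are hypothetical.

```python
import math


def weibull_cv(k):
    """Coefficient of variation (std / mean) of a Weibull law with shape k."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return math.sqrt(g2 / (g1 * g1) - 1.0)


def weibull_method_of_moments(mean, std, lo=0.2, hi=50.0):
    """Shape k and scale c matching the given mean and standard deviation.

    The coefficient of variation decreases as k grows, so bisection on
    k in [lo, hi] finds the matching shape.
    """
    target = std / mean
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > target:
            lo = mid  # variation still too large: k must be bigger
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    c = mean / math.gamma(1.0 + 1.0 / k)
    return k, c


def weibull_empirical(mean, std):
    """Closed-form approximation k = (std / mean) ** -1.086 (an assumption here)."""
    k = (std / mean) ** -1.086
    c = mean / math.gamma(1.0 + 1.0 / k)
    return k, c
```

    Both functions return the pair (k, c); comparing them on the same inputs shows how close the closed-form shortcut stays to the exact moment match over the range of shapes typical of wind data.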

  1. Bayesian nonparametric adaptive control using Gaussian processes.

    Science.gov (United States)

    Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A

    2015-03-01

    Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element is fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become ineffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.

  2. Statistical methods for segmentation and classification of images

    DEFF Research Database (Denmark)

    Rosholm, Anders

    1997-01-01

    The central matter of the present thesis is Bayesian statistical inference applied to classification of images. An initial review of Markov Random Fields relates to the modeling aspect of the indicated main subject. In that connection, emphasis is put on the relatively unknown sub-class of Pickard... with a Pickard Random Field modeling of a considered (categorical) image phenomenon. An extension of the fast PRF based classification technique is presented. The modification introduces auto-correlation into the model of an involved noise process, which previously has been assumed independent. The suitability of the extended model is documented by tests on controlled image data containing auto-correlated noise.

  3. Spatial Analysis Along Networks Statistical and Computational Methods

    CERN Document Server

    Okabe, Atsuyuki

    2012-01-01

    In the real world, there are numerous and various events that occur on and alongside networks, including the occurrence of traffic accidents on highways, the location of stores alongside roads, the incidence of crime on streets and the contamination along rivers. In order to carry out analyses of those events, the researcher needs to be familiar with a range of specific techniques. Spatial Analysis Along Networks provides a practical guide to the necessary statistical techniques and their computational implementation. Each chapter illustrates a specific technique, from Stochastic Point Process

  4. Convex Optimization Methods for Graphs and Statistical Modeling

    Science.gov (United States)

    2011-06-01

    ... requirements that the graph be triangle-free and square-free. Of course such graph reconstruction problems may be infeasible in general, as there may be ... over C1, C2 is motivated by a similar procedure in statistics and signal processing, which goes by the name of “matched filtering.” Of course other ... h is the height of the cap over the equator. Via elementary trigonometry, the solid angle that K subtends is given by \(\pi/2 - \sin^{-1}(h)\). Hence, if h(β ...

  5. Non-parametric star formation histories for 5 dwarf spheroidal galaxies of the local group

    CERN Document Server

    Hernández, X; Gilmore, Gerard; Valls-Gabaud, David

    2000-01-01

    We use recent HST colour-magnitude diagrams of the resolved stellar populations of a sample of local dSph galaxies (Carina, LeoI, LeoII, Ursa Minor and Draco) to infer the star formation histories of these systems, $SFR(t)$. Applying a new variational calculus maximum likelihood method which includes a full Bayesian analysis and allows a non-parametric estimate of the function one is solving for, we infer the star formation histories of the systems studied. This method has the advantage of yielding an objective answer, as one need not assume a priori the form of the function one is trying to recover. The results are checked independently using Saha's $W$ statistic. The total luminosities of the systems are used to normalize the results into physical units and derive SN type II rates. We derive the luminosity weighted mean star formation history of this sample of galaxies.

  6. Methods in probability and statistical inference. Progress report, June 1975--June 14, 1976. [Dept. of Statistics, Univ. of Chicago

    Energy Technology Data Exchange (ETDEWEB)

    Perlman, M D

    1976-03-01

    Efficient methods for approximating percentage points of the largest characteristic root of a Wishart matrix, and other statistical quantities of interest, were developed. Fitting of non-additive models to two-way and higher-way tables and the further development of the SNAP statistical computing system were reported. Numerical procedures for computing boundary-crossing probabilities for Brownian motion and other stochastic processes, such as Bessel diffusions, were implemented. Mathematical techniques from statistical mechanics were applied to obtain a unified treatment of probabilities of large deviations of the sample; in the setting of general topological vector spaces. The application of the Martin boundary to questions about infinite particle systems was studied. A comparative study of classical "omnibus" and Bayes procedures for combining several independent noncentral chi-square test statistics was completed. Work proceeds on the related problem of combining noncentral F-tests. A numerical study of the small-sample powers of the Pearson chi-square and likelihood ratio tests for multinomial goodness-of-fit was made. The relationship between asymptotic (large sample) efficiency of test statistics, as measured by Bahadur's concept of exact slope, and actual small-sample efficiency was studied. A promising new technique for the simultaneous estimation of all correlation coefficients in a multivariate population was developed. The method adapts the James-Stein "shrinking" estimator (for location parameters) to the estimation of correlations.

  7. M&M's "The Method," and Other Ideas about Teaching Elementary Statistics.

    Science.gov (United States)

    May, E. Lee Jr.

    2000-01-01

    Consists of a collection of observations about the teaching of the first course in elementary probability and statistics offered by many colleges and universities. Highlights the Goldberg Method for solving problems in probability and statistics. (Author/ASK)

  8. Modification of codes NUALGAM and BREMRAD. Volume 3: Statistical considerations of the Monte Carlo method

    Science.gov (United States)

    Firstenberg, H.

    1971-01-01

    The statistics of the Monte Carlo method are considered in relation to the interpretation of results from the NUGAM2 and NUGAM3 computer codes. A numerical experiment using the NUGAM2 code is presented and the results are statistically interpreted.
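
    The basic statistical reading of any Monte Carlo result is an estimate together with its standard error, which shrinks in proportion to one over the square root of the number of histories. The sketch below illustrates only that general point; it is unrelated to the NUGAM2 or NUGAM3 codes themselves, and the function name and the sampler interface are assumptions.

```python
import math
import random


def monte_carlo_mean(sample_fn, n, seed=0):
    """Estimate an expectation by simple Monte Carlo.

    sample_fn(rng) must return one random score; the result is the sample
    mean and its standard error s / sqrt(n).
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        value = sample_fn(rng)
        total += value
        total_sq += value * value
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(max(variance, 0.0) / n)


if __name__ == "__main__":
    # Example: E[U^2] for U uniform on (0, 1); the exact value is 1/3.
    estimate, std_err = monte_carlo_mean(lambda rng: rng.random() ** 2, 10_000)
    print(f"estimate = {estimate:.4f} +/- {std_err:.4f}")
```

    Quadrupling the number of histories roughly halves the standard error, which is the usual basis for deciding how long a run must be to reach a target precision.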

  9. Introducing Students to the Application of Statistics and Investigative Methods in Political Science

    Science.gov (United States)

    Wells, Dominic D.; Nemire, Nathan A.

    2017-01-01

    This exercise introduces students to the application of statistics and its investigative methods in political science. It helps students gain a better understanding and a greater appreciation of statistics through a real world application.

  10. [Diversity and frequency of scientific research design and statistical methods in the "Arquivos Brasileiros de Oftalmologia": a systematic review of the "Arquivos Brasileiros de Oftalmologia"--1993-2002].

    Science.gov (United States)

    Crosta, Fernando; Nishiwaki-Dantas, Maria Cristina; Silvino, Wilmar; Dantas, Paulo Elias Correa

    2005-01-01

    To verify the frequency of study design, applied statistical analysis and approval by institutional review offices (Ethics Committee) of articles published in the "Arquivos Brasileiros de Oftalmologia" during a 10-year interval, with later comparative and critical analysis by some of the main international journals in the field of Ophthalmology. A systematic review without meta-analysis was performed. Scientific papers published in the "Arquivos Brasileiros de Oftalmologia" between January 1993 and December 2002 were reviewed by two independent reviewers and classified according to the applied study design, statistical analysis and approval by the institutional review offices. To categorize those variables, a descriptive statistical analysis was used. After applying inclusion and exclusion criteria, 584 articles were reviewed for evaluation of statistical analysis and 725 articles for evaluation of study design. Contingency table (23.10%) was the most frequently applied statistical method, followed by non-parametric tests (18.19%), Student's t test (12.65%), central tendency measures (10.60%) and analysis of variance (9.81%). Of 584 reviewed articles, 291 (49.82%) presented no statistical analysis. Observational case series (26.48%) was the most frequently used type of study design, followed by interventional case series (18.48%), observational case description (13.37%), non-random clinical study (8.96%) and experimental study (8.55%). We found a higher frequency of observational clinical studies and a lack of statistical analysis in almost half of the published papers. An increase in studies with approval by an institutional review Ethics Committee was noted since it became mandatory in 1996.

  11. An Optimization Method for Simulator Using Probability Statistic Model

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    An optimization method is presented that can be easily applied in a retargetable simulator. The essence of the method is to reduce the redundant information in operation codes that arises from the differing execution frequencies of instructions. By recoding the operation codes in the loading part of the simulator, the number of bit comparisons needed to identify an instruction is reduced, and the performance of the simulator is thereby improved. Theoretical analysis and experimental results both confirm the validity of this method.

  12. Statistical Methods and Tools for Hanford Staged Feed Tank Sampling

    Energy Technology Data Exchange (ETDEWEB)

    Fountain, Matthew S.; Brigantic, Robert T.; Peterson, Reid A.

    2013-10-01

    This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).

  13. Dragon-kings: mechanisms, statistical methods and empirical evidence

    CERN Document Server

    Sornette, D; 10.1140/epjst/e2012-01559-5

    2012-01-01

    This introductory article presents the special Discussion and Debate volume "From black swans to dragon-kings, is there life beyond power laws?" published in Eur. Phys. J. Special Topics in May 2012. We summarize the contributions and put them in perspective under three main themes: (i) mechanisms for dragon-kings, (ii) detection of dragon-kings and statistical tests and (iii) empirical evidence in a large variety of natural and social systems. Overall, we are pleased to witness significant advances both in the introduction and clarification of underlying mechanisms and in the development of novel efficient tests that demonstrate clear evidence for the presence of dragon-kings in many systems. However, this positive view should be balanced by the fact that this remains a very delicate and difficult field, if only due to the scarcity of data as well as the extraordinarily important implications with respect to hazard assessment, risk control and predictability.

  14. Analogue Correction Method of Errors by Combining Statistical and Dynamical Methods

    Institute of Scientific and Technical Information of China (English)

    REN Hongli; CHOU Jifan

    2006-01-01

    Based on the atmospheric analogy principle, this paper poses the inverse problem of using information from historical analogue data to estimate model errors, and develops a method of analogue correction of errors (ACE) for models. The ACE can effectively combine statistical and dynamical methods and does not require changes to current numerical prediction models. The new method not only makes adequate use of dynamical achievements but can also reasonably absorb the information of a great many analogues in historical data in order to reduce model errors and improve forecast skill. Furthermore, the ACE may identify specific historical data for the solution of the inverse problem according to the particular features of the current forecast. Qualitative analyses show that the ACE is theoretically equivalent to the principle of the previous analogue-dynamical model but does not require rebuilding the complicated analogue-deviation model, so it is more feasible and has better operational prospects. Moreover, under ideal conditions, when numerical models or historical analogues are perfect, the ACE forecast reduces to the forecast of the dynamical or statistical method, respectively.

  15. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy

    Directory of Open Access Journals (Sweden)

    Archer Kellie J

    2008-02-01

    Background: With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. Results: The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. Conclusion: We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been

  16. A non-parametric meta-analysis approach for combining independent microarray datasets: application using two microarray datasets pertaining to chronic allograft nephropathy.

    Science.gov (United States)

    Kong, Xiangrong; Mas, Valeria; Archer, Kellie J

    2008-02-26

    With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the
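
    The pathway enrichment step mentioned in this abstract rests on Fisher's exact test, whose one-sided form is an upper tail of the hypergeometric distribution. The sketch below is a minimal illustration of that calculation only; the argument names and any counts passed to it are hypothetical, and nothing here reproduces the authors' pipeline or their gene lists.

```python
from math import comb


def enrichment_p(total_genes, pathway_genes, selected_genes, overlap):
    """One-sided Fisher exact p-value for pathway over-representation.

    Probability of seeing at least `overlap` pathway genes when
    `selected_genes` genes are drawn without replacement from
    `total_genes`, of which `pathway_genes` belong to the pathway.
    """
    upper = min(pathway_genes, selected_genes)
    tail = sum(
        comb(pathway_genes, k)
        * comb(total_genes - pathway_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, selected_genes)
```

    When many pathways are screened this way, the resulting p-values would normally also need a multiple-testing adjustment; the abstract does not say which one, if any, was used.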

  17. Right-Censored Nonparametric Regression: A Comparative Simulation Study

    Directory of Open Access Journals (Sweden)

    Dursun Aydın

    2016-11-01

    This paper studies how selection criteria operate for right-censored nonparametric regression using smoothing splines. In order to transform the response variable into a variable that accounts for the right-censorship, we used the Kaplan-Meier weights proposed by [1] and [2]. The major problem in the smoothing spline method is to determine a smoothing parameter to obtain nonparametric estimates of the regression function. In this study, this parameter is chosen based on censored data by means of criteria such as the improved Akaike information criterion (AICc), the Bayesian (or Schwarz) information criterion (BIC) and generalized cross-validation (GCV). For this purpose, a Monte Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.

  18. Statistical methods for damage detection applied to civil structures

    DEFF Research Database (Denmark)

    Gres, Szymon; Ulriksen, Martin Dalgaard; Döhler, Michael

    2017-01-01

    ... and compared to the well-known subspace-based damage detection algorithm in the context of two large case studies. Both methods are implemented in the modal analysis and structural health monitoring software ARTeMIS, in which the joint features of the methods are concluded in a control chart in an attempt... of the two damage detection methods is similar, thereby implying merit of the new Mahalanobis distance-based approach, as it is less computationally complex. The fusion of the damage indicators in the control chart provides the most accurate view on the progressively damaged systems.

  19. Statistics in science the foundations of statistical methods in biology, physics and economics

    CERN Document Server

    Costantini, Domenico

    1990-01-01

    An inference may be defined as a passage of thought according to some method. In the theory of knowledge it is customary to distinguish deductive and non-deductive inferences. Deductive inferences are truth preserving, that is, the truth of the premises is preserved in the conclusion. As a result, the conclusion of a deductive inference is already 'contained' in the premises, although we may not know this fact until the inference is performed. Standard examples of deductive inferences are taken from logic and mathematics. Non-deductive inferences need not preserve truth, that is, 'thought may pass' from true premises to false conclusions. Such inferences can be expansive, or ampliative, in the sense that the performance of such inferences actually increases our putative knowledge. Standard non-deductive inferences do not really exist, but one may think of elementary inductive inferences in which conclusions regarding the future are drawn from knowledge of the past. Since the body of scientific knowledge i...

  20. GROUNDWATER MONITORING: Statistical Methods for Testing Special Background Conditions

    Energy Technology Data Exchange (ETDEWEB)

    Chou, Charissa J.

    2004-04-28

    This chapter illustrates application of a powerful intra-well testing method referred to as the combined Shewhart-CUSUM control chart approach, which can detect abrupt and gradual changes in groundwater parameter concentrations. This method is broadly applicable to groundwater monitoring situations where there is no clearly defined upgradient well or wells, where spatial variability exists in parameter concentrations, or when groundwater flow rate is extremely slow. Procedures for determining the minimum time needed to acquire independent groundwater samples and useful transformations for obtaining normally distributed data are also provided. The control chart method will be insensitive to real changes if a preexisting trend is observed in the background data set. A method and a case study describing how a trend observed in a background data set can be removed using a transformation suggested by Gibbons (1994) are presented to illustrate treatment of a preexisting trend.
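
    The combined chart described above can be sketched as follows. This is a minimal illustration, not the chapter's procedure: the function name, the sample values, and the parameter defaults (reference value k = 1, decision interval h = 5, Shewhart limit SCL = 4.5, all in standard-deviation units) are assumptions chosen for the example. The Shewhart part flags a single large standardized value (abrupt change); the CUSUM part accumulates smaller exceedances (gradual change).

```python
from statistics import mean, stdev

def shewhart_cusum(background, new_samples, k=1.0, h=5.0, scl=4.5):
    """Return (z, cusum, flagged) for each new sample.

    background  : historical intra-well measurements (assumed independent
                  and roughly normal, with no preexisting trend)
    k, h, scl   : illustrative defaults, not values taken from the chapter
    """
    mu, sigma = mean(background), stdev(background)
    cusum = 0.0
    results = []
    for x in new_samples:
        z = (x - mu) / sigma                 # standardized measurement
        cusum = max(0.0, z - k + cusum)      # one-sided upper CUSUM
        results.append((z, cusum, z > scl or cusum > h))
    return results

# Hypothetical concentrations: a stable background, then a modest sustained shift.
background = [10, 11, 9, 10, 10, 11, 9, 10]
for z, s, flag in shewhart_cusum(background, [11.5] * 6):
    print(f"z={z:5.2f}  cusum={s:5.2f}  flagged={flag}")
```

    In this made-up series no single value exceeds the Shewhart limit, but the CUSUM crosses its decision interval on the sixth sample, which is the kind of gradual change the combined chart is meant to catch.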

  1. Climate time series analysis classical statistical and bootstrap methods

    CERN Document Server

    Mudelsee, Manfred

    2014-01-01

    Written for climatologists and applied statisticians, this book explains the bootstrap algorithms (including novel adaptions) and methods for confidence interval construction. The accuracy of the algorithms is tested by means of Monte Carlo experiments.

  2. Statistical evaluation of texture analysis from the biocrystallization method

    OpenAIRE

    Meelursarn, Aumaporn

    2007-01-01

    Consumers are becoming more concerned about food quality, especially regarding how, when and where foods are produced (Haglund et al., 1999; Kahl et al., 2004; Alföldi, et al., 2006). Therefore, in recent years there has been a growing interest in methods for food quality assessment, especially in picture-development methods as a complement to traditional chemical analysis of single compounds (Kahl et al., 2006). The biocrystallization as one of the picture-developin...

  3. Statistical Methods for Predicting Malaria Incidences Using Data from Sudan

    Science.gov (United States)

    Awadalla, Khidir E.

    2017-01-01

    Malaria is the leading cause of illness and death in Sudan. The entire population is at risk of malaria epidemics, which place a very high burden on the government and the population. Evidence that forecasting methods are useful for predicting future incidence is needed to motivate the development of a system that can predict future incidence. The objective of this paper is to develop applicable and easily understood time series models and to find out which method performs best in predicting future incidence levels. We used monthly incidence data collected from five states in Sudan with unstable malaria transmission. We tested four forecasting methods: (1) autoregressive integrated moving average (ARIMA); (2) exponential smoothing; (3) transformation model; and (4) moving average. The results showed that the transformation method performed significantly better than the other methods for Gadaref, Gazira, North Kordofan, and Northern, while the moving average model performed significantly better for Khartoum. Future research should combine a number of different and dissimilar time series methods to improve forecast accuracy, with the ultimate aim of developing a simple and useful model for producing reasonably reliable forecasts of malaria incidence in the study area.
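
    Two of the four methods named above, the moving average and exponential smoothing, are simple enough to sketch directly. The following is an illustration only: the monthly counts are invented, the window and smoothing constant are arbitrary, and mean absolute error is an assumed comparison metric (the abstract does not say which accuracy measure the authors used).

```python
def moving_average_forecast(series, window):
    """One-step-ahead forecasts: mean of the previous `window` observations."""
    return [sum(series[t - window:t]) / window
            for t in range(window, len(series))]

def exp_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; forecasts[i] predicts series[i + 1]."""
    level = series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)                  # forecast made before seeing y
        level = alpha * y + (1 - alpha) * level  # update the smoothed level
    return forecasts

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly case counts (not data from the study).
counts = [120, 135, 160, 210, 190, 150, 130, 125, 140, 170, 205, 180]
window = 3
ma = moving_average_forecast(counts, window)
ses = exp_smoothing_forecast(counts, alpha=0.5)

# Compare both methods on the same months (index `window` onward).
print("moving average MAE:       ", mae(counts[window:], ma))
print("exponential smoothing MAE:", mae(counts[window:], ses[window - 1:]))
```

    Which method scores better depends on the series, which is consistent with the abstract's finding that different states favoured different methods.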

  4. Data Analysis & Statistical Methods for Command File Errors

    Science.gov (United States)

    Meshkat, Leila; Waggoner, Bruce; Bryant, Larry

    2014-01-01

    This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained with these. We have also used goodness of fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics bore out as critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.

  5. Non-Statistical Methods of Analysing of Bankruptcy Risk

    Directory of Open Access Journals (Sweden)

    Pisula Tomasz

    2015-06-01

    The article focuses on assessing the effectiveness of a non-statistical approach to bankruptcy modelling in enterprises operating in the logistics sector. In order to describe the issue more comprehensively, the aforementioned prediction of the possible negative results of business operations was carried out for companies functioning in the Polish region of Podkarpacie, and in Slovakia. The bankruptcy predictors selected for the assessment of companies operating in the logistics sector included 28 financial indicators characterizing these enterprises in terms of their financial standing and management effectiveness. The purpose of the study was to identify factors (models) describing the bankruptcy risk in enterprises in the context of their forecasting effectiveness over one-year and two-year time horizons. In order to assess their practical applicability, the models were carefully analysed and validated. The usefulness of the models was assessed in terms of their classification properties, their capacity to accurately identify enterprises at risk of bankruptcy and healthy companies, and their proper calibration to the data from training sample sets.

  6. Comparison of Statistical Methods for Detector Testing Programs

    Energy Technology Data Exchange (ETDEWEB)

    Rennie, John Alan [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Abhold, Mark [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-10-14

    A typical goal for any detector testing program is to ascertain not only the performance of the detector systems under test, but also the confidence that systems accepted using that testing program’s acceptance criteria will exceed a minimum acceptable performance (which is usually expressed as the minimum acceptable success probability, p). A similar problem often arises in statistics, where we would like to ascertain the fraction, p, of a population of items that possess a property that may take one of two possible values. Typically, the problem is approached by drawing a fixed sample of size n, with the number of items out of n that possess the desired property, x, being termed successes. The sample mean gives an estimate of the population mean p ≈ x/n, although usually it is desirable to accompany such an estimate with a statement concerning the range within which p may fall and the confidence associated with that range. Procedures for establishing such ranges and confidence limits are described in detail by Clopper, Brown, and Agresti for two-sided symmetric confidence intervals.
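
    As an illustration of the kind of interval the abstract refers to, the following is a minimal sketch of the exact (Clopper-Pearson) two-sided interval for a binomial proportion, computed with the standard library only. The function names and the bisection approach are choices made for this example, not taken from the report; a statistics library's beta quantile function would normally be used instead.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of f on [lo, hi], assuming f > 0 near lo and f < 0 near hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Equal-tailed exact interval for p given x successes in n trials."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

# Hypothetical test campaign: 18 successful detections in 20 trials.
x, n = 18, 20
low, high = clopper_pearson(x, n)
print(f"point estimate {x / n:.2f}, 95% interval ({low:.3f}, {high:.3f})")
```

    For an acceptance criterion of the form "p exceeds a minimum acceptable value", the lower limit is the quantity of interest; a one-sided version would use alpha rather than alpha / 2 in the lower-limit equation.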

  7. An improved Bayesian matting method based on image statistic characteristics

    Science.gov (United States)

    Sun, Wei; Luo, Siwei; Wu, Lina

    2015-03-01

    Image matting is an important task in image and video editing and has been studied for more than 30 years. In this paper we propose an improved interactive matting method. Starting from a coarse user-guided trimap, we first perform a color estimation based on texture and color information and use the result to refine the original trimap. Then, with the new trimap, we apply a soft matting process, which is an improved Bayesian matting with smoothness constraints. Experimental results on natural images show that this method is useful, especially for images that have similar texture features in the background or images for which it is hard to give a precise trimap.

  8. Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis.

    Science.gov (United States)

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2015-05-01

    Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories.

  9. Modelación de episodios críticos de contaminación por material particulado (PM10) en Santiago de Chile: Comparación de la eficiencia predictiva de los modelos paramétricos y no paramétricos / Modeling critical episodes of air pollution by PM10 in Santiago, Chile: Comparison of the predictive efficiency of parametric and non-parametric statistical models

    Directory of Open Access Journals (Sweden)

    Sergio A. Alvarado

    2010-12-01

    Objective: To evaluate the predictive efficiency of two statistical models (one parametric and the other non-parametric) to predict critical episodes of air pollution exceeding daily air quality standards in Santiago, Chile, by using the next-day PM10 maximum 24 h value. Accurate prediction of such episodes would allow restrictive measures to be applied by health authorities to reduce their seriousness and protect the community's health. Methods: We used the PM10 concentrations registered by a station of the MACAM-2 Air Quality Monitoring Network (152 daily observations of 14 variables) and meteorological information gathered from 2001 to 2004. To construct predictive models, we fitted a parametric Gamma model using STATA v11 software and a non-parametric MARS model by using a demo version of the MARS v 2.0 statistical software distributed by Salford-Systems. Results: Both modeling methods show a high correlation between observed and predicted values. The Gamma models achieve better hit rates than MARS for PM10 concentrations with values...

  10. Statistical methods in interphase cytogenetics: an experimental approach.

    Science.gov (United States)

    Kibbelaar, R E; Kok, F; Dreef, E J; Kleiverda, J K; Cornelisse, C J; Raap, A K; Kluin, P M

    1993-10-01

    In situ hybridization (ISH) techniques on interphase cells, or interphase cytogenetics, have powerful potential clinical and biological applications, such as detection of minimal residual disease, early relapse, and the study of clonal evolution and expansion in neoplasia. Much attention has been paid to issues related to ISH data acquisition, i.e., the numbers, colors, intensities, and spatial relationships of hybridization signals. The methodology concerning data analysis, which is of prime importance for clinical applications, however, is less well investigated. We have studied the latter for the detection of small monosomic and trisomic cell populations using various mixtures of human female and male cells. With a chromosome X specific probe, the male cells simulated monosomic subpopulations of 0, 1, 5, 10, 50, 90, 95, 99, and 100%. Analogously, when a (7 + Y) specific probe combination was used, containing a mixture of chromosome No. 7 and Y-specific DNA, the male cells simulated trisomic cell populations. Probes specific for chromosomes Nos. 1, 7, 8, and 9 were used for estimation of ISH artifacts. Three statistical tests, the Kolmogorov-Smirnov test, the multiple-proportion test, and the z'-max test, were applied to the empirical data using the control data as a reference for ISH artifacts. The Kolmogorov-Smirnov test was found to be inferior for discrimination of small monosomic or trisomic cell populations. The other two tests showed that when 400 cells were evaluated, and using selected control probes, monosomy X could be detected at a frequency of 5% aberrant cells, and trisomy 7 + Y at a frequency of 1%. (ABSTRACT TRUNCATED AT 250 WORDS)

  11. Nonparametric correlation models for portfolio allocation

    DEFF Research Database (Denmark)

    Aslanidis, Nektarios; Casas, Isabel

    2013-01-01

    This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis.

  12. CAPABILITY ASSESSMENT OF MEASURING EQUIPMENT USING STATISTIC METHOD

    Directory of Open Access Journals (Sweden)

    Pavel POLÁK

    2014-10-01

    Capability assessment of the measurement device is one of the methods of process quality control. Only if the measurement device is capable can the capability of the measurement process, and consequently of the production process, be assessed. This paper deals with assessment of the capability of the measuring device using the indices Cg and Cgk.
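
    The abstract does not give the formulas for Cg and Cgk, and guidelines differ on the constants. The sketch below assumes one commonly cited convention: Cg = 0.2 T / (6 s) and Cgk = (0.1 T - |mean - reference|) / (3 s), where T is the tolerance width, s the standard deviation of repeated measurements of a reference part, and "reference" its known value. The function name, the data, and the constants are all assumptions for illustration; the paper may use a different variant.

```python
from statistics import mean, stdev

def gauge_capability(measurements, reference, tolerance,
                     fraction=0.2, spread=6.0):
    """Return (Cg, Cgk) under the assumed convention.

    measurements : repeated readings of one reference part
    reference    : known true value of that part
    tolerance    : width of the tolerance band (upper limit - lower limit)
    fraction     : share of the tolerance allotted to the gauge (assumed 20%)
    spread       : number of standard deviations in the gauge spread (assumed 6)
    """
    s = stdev(measurements)
    bias = abs(mean(measurements) - reference)
    cg = (fraction * tolerance) / (spread * s)                        # repeatability only
    cgk = (fraction / 2 * tolerance - bias) / (spread / 2 * s)        # repeatability + bias
    return cg, cgk

# Hypothetical readings (mm) of a 10.000 mm reference part, tolerance 0.1 mm.
readings = [10.001, 9.999, 10.002, 10.000, 9.998, 10.001, 10.000, 9.999]
cg, cgk = gauge_capability(readings, reference=10.000, tolerance=0.1)
print(f"Cg = {cg:.2f}, Cgk = {cgk:.2f}")
```

    Cgk can never exceed Cg: the two coincide when the mean reading equals the reference value, and any bias lowers Cgk. A device is typically judged capable when both indices exceed a threshold set by the applicable guideline.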

  13. Statistical tests for equal predictive ability across multiple forecasting methods

    DEFF Research Database (Denmark)

    Borup, Daniel; Thyrsgaard, Martin

    as non-stationarity of the data. We introduce two finite-sample corrections, leading to good size and power properties. We also provide a two-step Model Confidence Set-type decision rule for ranking the forecasting methods into sets of indistinguishable conditional predictive ability, particularly...

  14. Further Research into a Non-Parametric Statistical Screening System.

    Science.gov (United States)

    1979-12-14

    Let X1 = 0 if birth weight is low, 1 if birth weight is high; X2 = 0 if gestation length is short, 1 if gestation length is long. Normal babies have high birth weight and long... gestation length or low birth weight and short gestation length. Abnormal babies have either of the other two combinations ((0, 1) or (1, 0)). The LDF

  15. A Statistical, Nonparametric Methodology for Document Degradation Model Validation

    Science.gov (United States)

    1999-01-01


  16. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models

    NARCIS (Netherlands)

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.

    2012-01-01

    PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator

  17. A Multivariate Downscaling Model for Nonparametric Simulation of Daily Flows

    Science.gov (United States)

    Molina, J. M.; Ramirez, J. A.; Raff, D. A.

    2011-12-01

    A multivariate, stochastic nonparametric framework for stepwise disaggregation of seasonal runoff volumes to daily streamflow is presented. The downscaling process is conditional on volumes of spring runoff and large-scale ocean-atmosphere teleconnections and includes a two-level cascade scheme: seasonal-to-monthly disaggregation first followed by monthly-to-daily disaggregation. The non-parametric and assumption-free character of the framework allows consideration of the random nature and nonlinearities of daily flows, which parametric models are unable to account for adequately. This paper examines statistical links between decadal/interannual climatic variations in the Pacific Ocean and hydrologic variability in US northwest region, and includes a periodicity analysis of climate patterns to detect coherences of their cyclic behavior in the frequency domain. We explore the use of such relationships and selected signals (e.g., north Pacific gyre oscillation, southern oscillation, and Pacific decadal oscillation indices, NPGO, SOI and PDO, respectively) in the proposed data-driven framework by means of a combinatorial approach with the aim of simulating improved streamflow sequences when compared with disaggregated series generated from flows alone. A nearest neighbor time series bootstrapping approach is integrated with principal component analysis to resample from the empirical multivariate distribution. A volume-dependent scaling transformation is implemented to guarantee the summability condition. In addition, we present a new and simple algorithm, based on nonparametric resampling, that overcomes the common limitation of lack of preservation of historical correlation between daily flows across months. The downscaling framework presented here is parsimonious in parameters and model assumptions, does not generate negative values, and produces synthetic series that are statistically indistinguishable from the observations. We present evidence showing that both

  18. Students' Attitudes toward Statistics across the Disciplines: A Mixed-Methods Approach

    Science.gov (United States)

    Griffith, James D.; Adams, Lea T.; Gu, Lucy L.; Hart, Christian L.; Nichols-Whitehead, Penney

    2012-01-01

    Students' attitudes toward statistics were investigated using a mixed-methods approach including a discovery-oriented qualitative methodology among 684 undergraduate students across business, criminal justice, and psychology majors where at least one course in statistics was required. Students were asked about their attitudes toward statistics and…

  19. Counting Better? An Examination of the Impact of Quantitative Method Teaching on Statistical Anxiety and Confidence

    Science.gov (United States)

    Chamberlain, John Martyn; Hillier, John; Signoretta, Paola

    2015-01-01

    This article reports the results of research concerned with students' statistical anxiety and confidence to both complete and learn to complete statistical tasks. Data were collected at the beginning and end of a quantitative method statistics module. Students recognised the value of numeracy skills but felt they were not necessarily relevant for…

  1. Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models

    CERN Document Server

    Fan, Jianqing; Song, Rui

    2011-01-01

    A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under the nonparametric additive models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data a...

  2. Correlated Non-Parametric Latent Feature Models

    CERN Document Server

    Doshi-Velez, Finale

    2012-01-01

    We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.

  3. A Censored Nonparametric Software Reliability Model

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    This paper analyses the effect of censoring on the estimation of failure rate, and presents a framework for a censored nonparametric software reliability model. The model is based on a nonparametric test for a monotonically decreasing failure rate and on weighted kernel failure rate estimation under the constraint that the failure rate is monotonically decreasing. Not only does the model have the advantages of few assumptions and weak constraints, but the number of residual defects in the software system can also be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.

  4. Nonparametric correlation models for portfolio allocation

    DEFF Research Database (Denmark)

    Aslanidis, Nektarios; Casas, Isabel

    2013-01-01

    This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major...

  5. Statistical inference methods for two crossing survival curves: a comparison of methods.

    Science.gov (United States)

    Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng

    2015-01-01

    A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.
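
    For reference, the log-rank test that the abstract uses as its baseline can be sketched in a few lines. This is a minimal standard-library illustration with invented data; it does not implement the adaptive Neyman's smooth tests or the two-stage procedure that the study recommends. The function name and the event coding (1 = event observed, 0 = censored) are assumptions for the example.

```python
from math import erfc, sqrt

def logrank_test(times1, events1, times2, events2):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value)."""
    data = ([(t, e, 0) for t, e in zip(times1, events1)] +
            [(t, e, 1) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group 1
        n2 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 2
        d1 = sum(1 for u, e, g in data if u == t and e and g == 0)
        d2 = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n1 + n2, d1 + d2
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t.
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no event time has both groups at risk")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical follow-up times (months); 0 marks a censored observation.
chi2, p = logrank_test([6, 7, 10, 15, 19, 25], [1, 1, 0, 1, 1, 0],
                       [9, 13, 17, 22, 27, 30], [1, 1, 1, 0, 1, 0])
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

    Because the statistic sums observed-minus-expected counts with equal weight over time, early differences in one direction and late differences in the other can cancel, which is why the test loses power when survival curves cross.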

  6. Experimental Data Mining Techniques (Using Multiple Statistical Methods)

    Directory of Open Access Journals (Sweden)

    Mustafa Zaidi

    2012-05-01

    This paper discusses possible solutions to non-linear multivariable problems by experimental data mining techniques using an orthogonal array. The Taguchi method is a very useful technique for reducing the time and cost of experiments, but it ignores all kinds of interaction effects. Its results are not very encouraging, which motivates modeling the non-linear multivariable laser cutting process by one-way and two-way analysis of variance as well as linear and non-linear regression analysis. These techniques are used to explore better analysis techniques and to improve laser cutting quality by reducing process variations caused by controllable process parameters. The size of the data set causes difficulties in modeling and simulating the problem; for example, the decision tree is a useful technique, but it is not able to predict better results. The results of analysis of variance are encouraging. Taguchi and regression methods normally optimize input process parameters for a single characteristic.

  7. Refining developmental coordination disorder subtyping with multivariate statistical methods

    Directory of Open Access Journals (Sweden)

    Lalanne Christophe

    2012-07-01

    Background: With a large number of potentially relevant clinical indicators, penalization and ensemble learning methods are thought to provide better predictive performance than usual linear predictors. However, little is known about how they perform in clinical studies where few cases are available. We used Random Forests and Partial Least Squares Discriminant Analysis to select the most salient impairments in Developmental Coordination Disorder (DCD) and assess patient similarity. Methods: We considered a wide-ranging testing battery for various neuropsychological and visuo-motor impairments, which aimed at characterizing subtypes of DCD in a sample of 63 children. Classifiers were optimized on a training sample and subsequently used to rank the 49 items according to a permuted measure of variable importance. In addition, subtyping consistency was assessed with cluster analysis on the training sample. Clustering fitness and predictive accuracy were evaluated on the validation sample. Results: Both classifiers yielded a relevant subset of item impairments that altogether accounted for a sharp discrimination between three DCD subtypes: ideomotor, visual-spatial and constructional, and mixed dyspraxia. The main impairments found to characterize the three subtypes were: digital perception, imitation of gestures, digital praxia, Lego blocks, visual-spatial structuration, visual-motor integration, and coordination between upper and lower limbs. Classification accuracy was above 90% for all classifiers, and clustering fitness was found to be satisfactory. Conclusions: Random Forests and Partial Least Squares Discriminant Analysis are useful tools for extracting salient features from a large pool of correlated binary predictors, and also provide a way to assess individuals' proximities in a reduced factor space. Fewer than 15 neuro-visual, neuro-psychomotor and neuro-psychological tests might be required to provide a sensitive and

  8. Predicting sulphur and nitrogen deposition using a simple statistical method

    Science.gov (United States)

    Oulehle, Filip; Kopáček, Jiří; Chuman, Tomáš; Černohous, Vladimír; Hůnová, Iva; Hruška, Jakub; Krám, Pavel; Lachmanová, Zora; Navrátil, Tomáš; Štěpánek, Petr; Tesař, Miroslav; Evans, Christopher D.

    2016-09-01

    Data from 32 long-term (1994-2012) monitoring sites were used to assess temporal development and spatial variability of sulphur (S) and inorganic nitrogen (N) concentrations in bulk precipitation, and S in throughfall, for the Czech Republic. Despite large variance in absolute S and N concentration/deposition among sites, temporal coherence using standardised data (Z score) was demonstrated. Overall significant declines of SO4 concentration in bulk and throughfall precipitation, as well as NO3 and NH4 concentration in bulk precipitation, were observed. Median Z score values of bulk SO4, NO3 and NH4 and throughfall SO4 derived from observations and the respective emission rates of SO2, NOx and NH3 in the Czech Republic and Slovakia showed highly significant (p Z score values were calculated for the whole period 1900-2012 and then back-transformed to give estimates of concentration for the individual sites. Uncertainty associated with the concentration calculations was estimated as 20% for SO4 bulk precipitation, 22% for throughfall SO4, 18% for bulk NO3 and 28% for bulk NH4. The application of the method suggested that it is effective in the long-term reconstruction and prediction of S and N deposition at a variety of sites. Multiple regression modelling was used to extrapolate site characteristics (mean precipitation chemistry and its standard deviation) from monitored to unmonitored sites. Spatially distributed temporal development of S and N depositions were calculated since 1900. The method allows spatio-temporal estimation of the acid deposition in regions with extensive monitoring of precipitation chemistry.
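
    The standardisation and back-transformation steps mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of the sample standard deviation, and the concentration values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardise a series to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def back_transform(z_values, site_mean, site_sd):
    """Convert Z scores back to concentrations for a site with a known mean and SD."""
    return [z * site_sd + site_mean for z in z_values]

# Hypothetical annual SO4 concentrations at one site (illustrative numbers only).
site = [4.0, 3.5, 3.0, 2.5, 2.0]
z = z_scores(site)

# Back-transforming with the same site mean and SD recovers the original series.
restored = back_transform(z, mean(site), stdev(site))
```

    Because every site is reduced to the same dimensionless scale, a single modelled Z score series can be mapped onto any site once that site's mean and standard deviation are known or estimated.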

  9. Statistical methods used in the public health literature and implications for training of public health professionals.

    Science.gov (United States)

    Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L

    2017-01-01

    Statistical literacy and knowledge are needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form was completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference was reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as for making informed decisions about continuing education for public health professionals.

  10. Testing the rate isomorphy hypothesis using five statistical methods

    Institute of Scientific and Technical Information of China (English)

    Xian-Ju Kuang; Megha N. Parajulee; Pei-Jian Shi; Feng Ge; Fang-Sen Xue

    2012-01-01

    Organisms are said to be in developmental rate isomorphy when the proportions of developmental stage durations are unaffected by temperature. Comprehensive stage-specific developmental data were generated on the cabbage beetle, Colaphellus bowringi Baly (Coleoptera: Chrysomelidae), at eight temperatures ranging from 16℃ to 30℃ (in 2℃ increments), and five analytical methods were used to test the rate isomorphy hypothesis: (i) direct comparison of lower developmental thresholds with standard errors based on the traditional linear equation describing developmental rate as a linear function of temperature; (ii) analysis of covariance to compare the lower developmental thresholds of different stages based on the Ikemoto-Takai linear equation; (iii) testing the significance of the slope term in the regression line of arcsin(√p) versus temperature, where p is the ratio of the duration of a particular developmental stage to the entire pre-imaginal developmental duration for one insect or mite species; (iv) analysis of variance to test for significant differences between the ratios of developmental stage durations to that of pre-imaginal development; and (v) checking whether there is an element less than a given level of significance in the p-value matrix of rotating regression line. The results revealed no significant difference among the lower developmental thresholds or among the aforementioned ratios, and thus convincingly confirmed the rate isomorphy hypothesis.
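
    Method (iii) in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis: the proportions are invented, the helper name `slope_t_statistic` is hypothetical, and ordinary least squares with a t statistic for the slope is assumed as the significance test.

```python
import math

def slope_t_statistic(x, y):
    """Ordinary least-squares slope of y on x and its t statistic (df = n - 2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope

# Eight temperatures from 16 to 30 degrees C in 2-degree steps, as in the abstract.
temperatures = list(range(16, 31, 2))

# Hypothetical proportions p of one stage's duration (illustrative values only).
proportions = [0.30, 0.31, 0.29, 0.30, 0.32, 0.30, 0.29, 0.31]

# Arcsine square-root transform, then regress on temperature.
transformed = [math.asin(math.sqrt(p)) for p in proportions]
slope, t_stat = slope_t_statistic(temperatures, transformed)
```

    With eight temperatures the slope has 6 degrees of freedom, so a two-sided test at the 5% level compares |t| with roughly 2.447; a slope that is not significantly different from zero is consistent with rate isomorphy for that stage.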

  11. Statistical methods for the forensic analysis of striated tool marks

    Energy Technology Data Exchange (ETDEWEB)

    Hoeksema, Amy Beth [Iowa State Univ., Ames, IA (United States)

    2013-01-01

    In forensics, fingerprints can be used to uniquely identify suspects in a crime. Similarly, a tool mark left at a crime scene can be used to identify the tool that was used. However, the current practice of identifying matching tool marks involves visual inspection of marks by forensic experts which can be a very subjective process. As a result, declared matches are often successfully challenged in court, so law enforcement agencies are particularly interested in encouraging research in more objective approaches. Our analysis is based on comparisons of profilometry data, essentially depth contours of a tool mark surface taken along a linear path. In current practice, for stronger support of a match or non-match, multiple marks are made in the lab under the same conditions by the suspect tool. We propose the use of a likelihood ratio test to analyze the difference between a sample of comparisons of lab tool marks to a field tool mark, against a sample of comparisons of two lab tool marks. Chumbley et al. (2010) point out that the angle of incidence between the tool and the marked surface can have a substantial impact on the tool mark and on the effectiveness of both manual and algorithmic matching procedures. To better address this problem, we describe how the analysis can be enhanced to model the effect of tool angle and allow for angle estimation for a tool mark left at a crime scene. With sufficient development, such methods may lead to more defensible forensic analyses.

  12. Non-parametric frequency analysis of extreme values for integrated disaster management considering probable maximum events

    Science.gov (United States)

    Takara, K. T.

    2015-12-01

    This paper describes a non-parametric frequency analysis method for hydrological extreme-value samples with a size larger than 100, verifying the estimation accuracy with computer intensive statistics (CIS) resampling such as the bootstrap. Probable maximum values are also incorporated into the analysis for extreme events larger than a design level of flood control. Traditional parametric frequency analysis methods of extreme values include the following steps: Step 1: Collecting and checking extreme-value data; Step 2: Enumerating probability distributions that would fit the data well; Step 3: Parameter estimation; Step 4: Testing goodness of fit; Step 5: Checking the variability of quantile (T-year event) estimates by the jackknife resampling method; and Step 6: Selection of the best distribution (final model). The non-parametric method (NPM) proposed here can skip Steps 2, 3, 4 and 6. Comparing traditional parametric methods (PM) with the NPM, this paper shows that PM often underestimates 100-year quantiles for annual maximum rainfall samples with records of more than 100 years. Overestimation examples are also demonstrated. Bootstrap resampling can provide bias correction for the NPM and can also give the estimation accuracy as the bootstrap standard error. The NPM has the advantage of avoiding various difficulties in the above-mentioned steps of the traditional PM. Probable maximum events are also incorporated into the NPM as an upper bound of the hydrological variable. Probable maximum precipitation (PMP) and probable maximum flood (PMF) can be a new parameter value combined with the NPM. An idea of how to incorporate these values into frequency analysis is proposed for better management of disasters that exceed the design level. The idea stimulates a more integrated approach by geoscientists and statisticians and encourages practitioners to consider the worst cases of disasters in their disaster management planning and practices.
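
    The bootstrap bias correction and standard error described in this abstract can be sketched as follows. The paper's actual NPM estimator is not given in the record, so this sketch substitutes a plain linear-interpolation empirical quantile; the function names, the synthetic rainfall sample, and the number of resamples are all assumptions for illustration.

```python
import random
from statistics import mean, stdev

def empirical_quantile(sample, prob):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(sample)
    pos = prob * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def bootstrap_quantile(sample, prob, n_boot=2000, seed=0):
    """Return (bias-corrected quantile, bootstrap standard error)."""
    rng = random.Random(seed)
    estimate = empirical_quantile(sample, prob)
    # Resample with replacement and recompute the quantile each time.
    reps = [
        empirical_quantile(rng.choices(sample, k=len(sample)), prob)
        for _ in range(n_boot)
    ]
    bias = mean(reps) - estimate
    return estimate - bias, stdev(reps)

# Synthetic annual-maximum rainfall record of 120 years (hypothetical data).
data_rng = random.Random(1)
rainfall = [50 + data_rng.expovariate(1 / 30) for _ in range(120)]

# The 100-year event corresponds to a non-exceedance probability of 0.99.
corrected, std_error = bootstrap_quantile(rainfall, 0.99)
```

    No distribution is enumerated, fitted, tested or selected here, which is the sense in which a non-parametric approach can skip Steps 2, 3, 4 and 6 of the parametric procedure.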

  13. A new method of studying the statistical properties of speckle phase

    Institute of Scientific and Technical Information of China (English)

    Qiankai Wang

    2009-01-01

    A new theoretical method with generality is proposed to study the statistical properties of the speckle phase. The general expression of the standard deviation of the speckle phase about the first-order statistics is derived according to the relation between the phase and the complex speckle amplitude. The statistical properties of the speckle phase have been studied in the diffraction fields with this new theoretical method.

  14. Axial electron channeling statistical method of site occupancy determination

    Institute of Scientific and Technical Information of China (English)

    YE; Jia

    2001-01-01


  15. A Bayesian Nonparametric Approach to Test Equating

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2009-01-01

    A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…

  16. How Are Teachers Teaching? A Nonparametric Approach

    Science.gov (United States)

    De Witte, Kristof; Van Klaveren, Chris

    2014-01-01

    This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…

  17. Decompounding random sums: A nonparametric approach

    DEFF Research Database (Denmark)

    Hansen, Martin Bøgsted; Pitts, Susan M.

    review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...

  18. A Nonparametric Analogy of Analysis of Covariance

    Science.gov (United States)

    Burnett, Thomas D.; Barr, Donald R.

    1977-01-01

    A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)

  19. Panel data specifications in nonparametric kernel regression

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
