WorldWideScience

Sample records for nonparametric regression techniques

  1. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin

    2017-01-19

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  2. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun

    2017-01-01

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  3. Nonparametric Mixture of Regression Models.

    Science.gov (United States)

    Huang, Mian; Li, Runze; Wang, Shaoli

    2013-07-01

    Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.

  4. Panel data specifications in nonparametric kernel regression

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...

  5. Nonparametric regression using the concept of minimum energy

    International Nuclear Information System (INIS)

    Williams, Mike

    2011-01-01

    It has recently been shown that an unbinned distance-based statistic, the energy, can be used to construct an extremely powerful nonparametric multivariate two sample goodness-of-fit test. An extension to this method that makes it possible to perform nonparametric regression using multiple multivariate data sets is presented in this paper. The technique, which is based on the concept of minimizing the energy of the system, permits determination of parameters of interest without the need for parametric expressions of the parent distributions of the data sets. The application and performance of this new method is discussed in the context of some simple example analyses.

  6. Nonparametric additive regression for repeatedly measured data

    KAUST Repository

    Carroll, R. J.

    2009-05-20

    We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology. © 2009 Biometrika Trust.

  7. Comparing parametric and nonparametric regression methods for panel data

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs....... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...

  8. Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan; Zvárová, Jana

    2017-01-01

    Roč. 5, č. 1 (2017), s. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability

  9. On the robust nonparametric regression estimation for a functional regressor

    OpenAIRE

    Azzedine , Nadjia; Laksaci , Ali; Ould-Saïd , Elias

    2009-01-01

    On the robust nonparametric regression estimation for a functional regressor correspondance: Corresponding author. (Ould-Said, Elias) (Azzedine, Nadjia) (Laksaci, Ali) (Ould-Said, Elias) Departement de Mathematiques--> , Univ. Djillali Liabes--> , BP 89--> , 22000 Sidi Bel Abbes--> - ALGERIA (Azzedine, Nadjia) Departement de Mathema...

  10. Nonparametric instrumental regression with non-convex constraints

    International Nuclear Information System (INIS)

    Grasmair, M; Scherzer, O; Vanhems, A

    2013-01-01

    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition. (paper)

  11. Nonparametric instrumental regression with non-convex constraints

    Science.gov (United States)

    Grasmair, M.; Scherzer, O.; Vanhems, A.

    2013-03-01

    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.

  12. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    Full Text Available This paper discusses nonparametric kernel regression with the regressor being a \\(d\\-dimensional \\(\\beta\\-null recurrent process in presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \\(\\sqrt{n(Th^{d}}\\, where \\(n(T\\ is the number of regenerations for a \\(\\beta\\-null recurrent process and the limiting distribution (with proper normalization is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity of the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than the linear model.

  13. Multivariate nonparametric regression and visualization with R and applications to finance

    CERN Document Server

    Klemelä, Jussi

    2014-01-01

    A modern approach to statistical learning and its applications through visualization methods With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generatingmechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio

  14. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.

  15. Testing for constant nonparametric effects in general semiparametric regression models with interactions

    KAUST Repository

    Wei, Jiawei; Carroll, Raymond J.; Maity, Arnab

    2011-01-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work

  16. CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

    Science.gov (United States)

    Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.

  17. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...... within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric......This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...

  18. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  19. REGRES: A FORTRAN-77 program to calculate nonparametric and ``structural'' parametric solutions to bivariate regression equations

    Science.gov (United States)

    Rock, N. M. S.; Duffy, T. R.

    REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x)—including: (1) major axis ( λ = 1); (2) reduced major axis ( λ = variance of y/variance of x); (3) Y on Xλ = infinity; or (4) X on Y ( λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.

  20. Testing the equality of nonparametric regression curves based on ...

    African Journals Online (AJOL)

    Abstract. In this work we propose a new methodology for the comparison of two regression functions f1 and f2 in the case of homoscedastic error structure and a fixed design. Our approach is based on the empirical Fourier coefficients of the regression functions f1 and f2 respectively. As our main results we obtain the ...

  1. Nonparametric Methods in Astronomy: Think, Regress, Observe—Pick Any Three

    Science.gov (United States)

    Steinhardt, Charles L.; Jermyn, Adam S.

    2018-02-01

    Telescopes are much more expensive than astronomers, so it is essential to minimize required sample sizes by using the most data-efficient statistical methods possible. However, the most commonly used model-independent techniques for finding the relationship between two variables in astronomy are flawed. In the worst case they can lead without warning to subtly yet catastrophically wrong results, and even in the best case they require more data than necessary. Unfortunately, there is no single best technique for nonparametric regression. Instead, we provide a guide for how astronomers can choose the best method for their specific problem and provide a python library with both wrappers for the most useful existing algorithms and implementations of two new algorithms developed here.

  2. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.

  3. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  4. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor product of B-splines with normal basis coefficients as a

  5. Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error

    KAUST Repository

    Carroll, Raymond J.

    2011-03-01

    In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.

  6. Nonparametric Regression with Subfractional Brownian Motion via Malliavin Calculus

    Directory of Open Access Journals (Sweden)

    Yuquan Cang

    2014-01-01

    Full Text Available We study the asymptotic behavior of the sequence Sn=∑i=0n-1K(nαSiH1(Si+1H2-SiH2, as n tends to infinity, where SH1 and SH2 are two independent subfractional Brownian motions with indices H1 and H2, respectively. K is a kernel function and the bandwidth parameter α satisfies some hypotheses in terms of H1 and H2. Its limiting distribution is a mixed normal law involving the local time of the sub-fractional Brownian motion SH1. We mainly use the techniques of Malliavin calculus with respect to sub-fractional Brownian motion.

  7. Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors

    Directory of Open Access Journals (Sweden)

    Xibin Zhang

    2016-04-01

    Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP growth rates among the organisation for economic co-operation and development (OECD and non-OECD countries.

  8. On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression

    KAUST Repository

    Dai, Wenlin; Tong, Tiejun; Zhu, Lixing

    2017-01-01

    Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that combines the linear regression method with the higher-order difference estimators systematically. The unified framework has greatly enriched the existing literature on variance estimation that includes most existing estimators as special cases. More importantly, the unified framework has also provided a smart way to solve the challenging difference sequence selection problem that remains a long-standing controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend to use the ordinary difference sequence in the unified framework, no matter if the sample size is small or if the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.

  9. On the Choice of Difference Sequence in a Unified Framework for Variance Estimation in Nonparametric Regression

    KAUST Repository

    Dai, Wenlin

    2017-09-01

    Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that combines the linear regression method with the higher-order difference estimators systematically. The unified framework has greatly enriched the existing literature on variance estimation that includes most existing estimators as special cases. More importantly, the unified framework has also provided a smart way to solve the challenging difference sequence selection problem that remains a long-standing controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend to use the ordinary difference sequence in the unified framework, no matter if the sample size is small or if the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.

  10. Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection

    Science.gov (United States)

    Kumar, Sricharan; Srivistava, Ashok N.

    2012-01-01

    Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.

  11. Scalable Bayesian nonparametric regression via a Plackett-Luce model for conditional ranks

    Science.gov (United States)

    Gray-Davies, Tristan; Holmes, Chris C.; Caron, François

    2018-01-01

    We present a novel Bayesian nonparametric regression model for covariates X and continuous response variable Y ∈ ℝ. The model is parametrized in terms of marginal distributions for Y and X and a regression function which tunes the stochastic ordering of the conditional distributions F (y|x). By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows for the use of standard, existing, software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset, with over 1,300,000 data points and more than 100 covariates. PMID:29623150

  12. Bootstrap-based procedures for inference in nonparametric receiver-operating characteristic curve regression analysis.

    Science.gov (United States)

    Rodríguez-Álvarez, María Xosé; Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G

    2018-03-01

    Prior to using a diagnostic test in a routine clinical setting, the rigorous evaluation of its diagnostic accuracy is essential. The receiver-operating characteristic curve is the measure of accuracy most widely used for continuous diagnostic tests. However, the possible impact of extra information about the patient (or even the environment) on diagnostic accuracy also needs to be assessed. In this paper, we focus on an estimator for the covariate-specific receiver-operating characteristic curve based on direct regression modelling and nonparametric smoothing techniques. This approach defines the class of generalised additive models for the receiver-operating characteristic curve. The main aim of the paper is to offer new inferential procedures for testing the effect of covariates on the conditional receiver-operating characteristic curve within the above-mentioned class. Specifically, two different bootstrap-based tests are suggested to check (a) the possible effect of continuous covariates on the receiver-operating characteristic curve and (b) the presence of factor-by-curve interaction terms. The validity of the proposed bootstrap-based procedures is supported by simulations. To facilitate the application of these new procedures in practice, an R-package, known as npROCRegression, is provided and briefly described. Finally, data derived from a computer-aided diagnostic system for the automatic detection of tumour masses in breast cancer is analysed.

  13. Testing for constant nonparametric effects in general semiparametric regression models with interactions

    KAUST Repository

    Wei, Jiawei

    2011-07-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.

  14. Categorical and nonparametric data analysis choosing the best statistical technique

    CERN Document Server

    Nussbaum, E Michael

    2014-01-01

    Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain

  15. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)

    2004-02-01

    The application of a statistical method, the local polynomial regression method, (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and others methods were found. (orig.)

  16. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan

    2010-09-28

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  17. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  18. A ¤nonparametric dynamic additive regression model for longitudinal data

    DEFF Research Database (Denmark)

    Martinussen, T.; Scheike, T. H.

    2000-01-01

    dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...

  19. Subpixel Snow Cover Mapping from MODIS Data by Nonparametric Regression Splines

    Science.gov (United States)

    Akyurek, Z.; Kuter, S.; Weber, G. W.

    2016-12-01

    Spatial extent of snow cover is often considered as one of the key parameters in climatological, hydrological and ecological modeling due to its energy storage, high reflectance in the visible and NIR regions of the electromagnetic spectrum, significant heat capacity and insulating properties. A significant challenge in snow mapping by remote sensing (RS) is the trade-off between the temporal and spatial resolution of satellite imageries. In order to tackle this issue, machine learning-based subpixel snow mapping methods, like Artificial Neural Networks (ANNs), from low or moderate resolution images have been proposed. Multivariate Adaptive Regression Splines (MARS) is a nonparametric regression tool that can build flexible models for high dimensional and complex nonlinear data. Although MARS is not often employed in RS, it has various successful implementations such as estimation of vertical total electron content in ionosphere, atmospheric correction and classification of satellite images. This study is the first attempt in RS to evaluate the applicability of MARS for subpixel snow cover mapping from MODIS data. Total 16 MODIS-Landsat ETM+ image pairs taken over European Alps between March 2000 and April 2003 were used in the study. MODIS top-of-atmospheric reflectance, NDSI, NDVI and land cover classes were used as predictor variables. Cloud-covered, cloud shadow, water and bad-quality pixels were excluded from further analysis by a spatial mask. MARS models were trained and validated by using reference fractional snow cover (FSC) maps generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also developed. The mutual comparison of obtained MARS and ANN models was accomplished on independent test areas. The MARS model performed better than the ANN model with an average RMSE of 0.1288 over the independent test areas; whereas the average RMSE of the ANN model

  20. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    Science.gov (United States)

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected

  1. Dependence between fusion temperatures and chemical components of a certain type of coal using classical, non-parametric and bootstrap techniques

    Energy Technology Data Exchange (ETDEWEB)

    Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)

    1990-11-01

    A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.

  2. Does the high–tech industry consistently reduce CO{sub 2} emissions? Results from nonparametric additive regression model

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Bin [School of Statistics, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013 (China); Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013 (China); Lin, Boqiang, E-mail: bqlin@xmu.edu.cn [Collaborative Innovation Center for Energy Economics and Energy Policy, China Institute for Studies in Energy Policy, Xiamen University, Xiamen, Fujian 361005 (China)

    2017-03-15

    China is currently the world's largest carbon dioxide (CO{sub 2}) emitter. Moreover, total energy consumption and CO{sub 2} emissions in China will continue to increase due to the rapid growth of industrialization and urbanization. Therefore, vigorously developing the high–tech industry becomes an inevitable choice to reduce CO{sub 2} emissions at the moment or in the future. However, ignoring the existing nonlinear links between economic variables, most scholars use traditional linear models to explore the impact of the high–tech industry on CO{sub 2} emissions from an aggregate perspective. Few studies have focused on nonlinear relationships and regional differences in China. Based on panel data of 1998–2014, this study uses the nonparametric additive regression model to explore the nonlinear effect of the high–tech industry from a regional perspective. The estimated results show that the residual sum of squares (SSR) of the nonparametric additive regression model in the eastern, central and western regions are 0.693, 0.054 and 0.085 respectively, which are much less those that of the traditional linear regression model (3.158, 4.227 and 7.196). This verifies that the nonparametric additive regression model has a better fitting effect. Specifically, the high–tech industry produces an inverted “U–shaped” nonlinear impact on CO{sub 2} emissions in the eastern region, but a positive “U–shaped” nonlinear effect in the central and western regions. Therefore, the nonlinear impact of the high–tech industry on CO{sub 2} emissions in the three regions should be given adequate attention in developing effective abatement policies. - Highlights: • The nonlinear effect of the high–tech industry on CO{sub 2} emissions was investigated. • The high–tech industry yields an inverted “U–shaped” effect in the eastern region. • The high–tech industry has a positive “U–shaped” nonlinear effect in other regions. • The linear impact

  3. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan; Li, Erning; Mallick, Bani K.

    2010-01-01

    " approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  4. Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error

    KAUST Repository

    Carroll, Raymond J.; Delaigle, Aurore; Hall, Peter

    2011-01-01

    In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment

  5. Investigating the complex relationship between in situ Southern Ocean pCO2 and its ocean physics and biogeochemical drivers using a nonparametric regression approach

    CSIR Research Space (South Africa)

    Pretorius, W

    2014-01-01

    Full Text Available the relationship more accurately in terms of MSE, RMSE and MAE, than a standard parametric approach (multiple linear regression). These results provide a platform for using the developed nonparametric regression model based on in situ measurements to predict p...

  6. Flexible link functions in nonparametric binary regression with Gaussian process priors.

    Science.gov (United States)

    Li, Dan; Wang, Xia; Lin, Lizhen; Dey, Dipak K

    2016-09-01

    In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. © 2015, The International Biometric Society.

  7. Post-fire debris flow prediction in Western United States: Advancements based on a nonparametric statistical technique

    Science.gov (United States)

    Nikolopoulos, E. I.; Destro, E.; Bhuiyan, M. A. E.; Borga, M., Sr.; Anagnostou, E. N.

    2017-12-01

    Fire disasters affect modern societies at global scale inducing significant economic losses and human casualties. In addition to their direct impacts they have various adverse effects on hydrologic and geomorphologic processes of a region due to the tremendous alteration of the landscape characteristics (vegetation, soil properties etc). As a consequence, wildfires often initiate a cascade of hazards such as flash floods and debris flows that usually follow the occurrence of a wildfire thus magnifying the overall impact in a region. Post-fire debris flows (PFDF) is one such type of hazards frequently occurring in Western United States where wildfires are a common natural disaster. Prediction of PDFD is therefore of high importance in this region and over the last years a number of efforts from United States Geological Survey (USGS) and National Weather Service (NWS) have been focused on the development of early warning systems that will help mitigate PFDF risk. This work proposes a prediction framework that is based on a nonparametric statistical technique (random forests) that allows predicting the occurrence of PFDF at regional scale with a higher degree of accuracy than the commonly used approaches that are based on power-law thresholds and logistic regression procedures. The work presented is based on a recently released database from USGS that reports a total of 1500 storms that triggered and did not trigger PFDF in a number of fire affected catchments in Western United States. The database includes information on storm characteristics (duration, accumulation, max intensity etc) and other auxiliary information of land surface properties (soil erodibility index, local slope etc). Results show that the proposed model is able to achieve a satisfactory prediction accuracy (threat score > 0.6) superior of previously published prediction frameworks highlighting the potential of nonparametric statistical techniques for development of PFDF prediction systems.

  8. Assessing pupil and school performance by non-parametric and parametric techniques

    NARCIS (Netherlands)

    de Witte, K.; Thanassoulis, E.; Simpson, G.; Battisti, G.; Charlesworth-May, A.

    2010-01-01

    This paper discusses the use of the non-parametric free disposal hull (FDH) and the parametric multi-level model (MLM) as alternative methods for measuring pupil and school attainment where hierarchical structured data are available. Using robust FDH estimates, we show how to decompose the overall

  9. Adaptabilidade e estabilidade via regressão não paramétrica em genótipos de café Adaptability and stability based on nonparametric regression in coffee genotypes

    Directory of Open Access Journals (Sweden)

    Moysés Nascimento

    2010-01-01

    Full Text Available O objetivo deste trabalho foi avaliar uma metodologia de análise de adaptabilidade e estabilidade fenotípica de genótipos de café baseada em regressão não paramétrica. A técnica utilizada difere das demais, pois reduz a influência na estimação do parâmetro de adaptabilidade de algum ponto extremo, ocasionado pela presença de genótipos com respostas demasiadamente diferenciadas a determinado ambiente. Foram utilizados dados provenientes de um experimento sobre produtividade média de grãos de 40 genótipos de café (Coffea canephora, com delineamento em blocos ao acaso, com seis repetições. Os genótipos foram avaliados em cinco anos (1996, 1998, 1999, 2000 e 2001, em dois locais (Sooretama e Marilândia, ES no total de dez ambientes. A metodologia proposta demonstrou ser adequada e eficiente, pois extingue os efeitos impróprios induzidos pela presença de pontos extremos e evita a recomendação incorreta de genótipos quanto à adaptabilidade.The objective of this work was to evaluate a methodology of phenotypic adaptability and stability analyses of coffee genotypes based on nonparametric regression. The technique used differs from other techniques because it reduces the influence of extreme points resulting from the presence of genotypes whose answers to a certain environment are too different on the estimation of the adaptability parameter. Data from an experiment studying the average yield of 40 coffee (Coffea canephora genotypes in a randomized block design with six replicates were used to evaluate the method. The genotypes were evaluated along five years (1996, 1998, 1999, 2000 and 2001 in two locations (Sooretama and Marilândia, ES, Brazil, in a total of ten environments. The methodology proposed proved adequate and efficient, since it eliminates the disproportionate effects induced by the presence of extreme points and avoids misleading recommendations of genotypes in terms of adaptability.

  10. Measuring the Influence of Networks on Transaction Costs Using a Nonparametric Regression Technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H.C.A.

    All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs. One of the major factors in transaction costs theory is information. Firm networks can catalyse the interpersonal information exchange and hence, increase the access to non......-public information so that transaction costs are reduced.Many resources that are sacrificed for transaction costs are inputs that also enter the technical production process. As most production data do not distinguish between these two usages of inputs, high transaction costs result in reduced observed productivity...

  11. Measuring the Influence of Information Networks on Transaction Costs Using a Non-parametric Regression Technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian

    2011-01-01

    All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access...... to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs...

  12. Measuring the influence of networks on transaction costs using a non-parametric regression technique

    DEFF Research Database (Denmark)

    Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.

    All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs. One of the major factors in transaction costs theory is information. Firm networks can catalyse the interpersonal information exchange and hence, increase the access to non......-public information so that transaction costs are reduced. Many resources that are sacrificed for transaction costs are inputs that also enter the technical production process. As most production data do not distinguish between these two usages of inputs, high transaction costs result in reduced observed productivity...

  13. Measuring the Influence of Information Networks on Transaction Costs Using a Non-parametric Regression Technique

    DEFF Research Database (Denmark)

    Henningsen, Geraldine; Henningsen, Arne; Henning, Christian

    2011-01-01

    to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs...

  14. Examining the nonparametric effect of drivers' age in rear-end accidents through an additive logistic regression model.

    Science.gov (United States)

    Ma, Lu; Yan, Xuedong

    2014-06-01

    This study seeks to inspect the nonparametric characteristics connecting the age of the driver to the relative risk of being an at-fault vehicle, in order to discover a more precise and smooth pattern of age impact, which has commonly been neglected in past studies. Records of drivers in two-vehicle rear-end collisions are selected from the general estimates system (GES) 2011 dataset. These extracted observations in fact constitute inherently matched driver pairs under certain matching variables including weather conditions, pavement conditions and road geometry design characteristics that are shared by pairs of drivers in rear-end accidents. The introduced data structure is able to guarantee that the variance of the response variable will not depend on the matching variables and hence provides a high power of statistical modeling. The estimation results exhibit a smooth cubic spline function for examining the nonlinear relationship between the age of the driver and the log odds of being at fault in a rear-end accident. The results are presented with respect to the main effect of age, the interaction effect between age and sex, and the effects of age under different scenarios of pre-crash actions by the leading vehicle. Compared to the conventional specification in which age is categorized into several predefined groups, the proposed method is more flexible and able to produce quantitatively explicit results. First, it confirms the U-shaped pattern of the age effect, and further shows that the risks of young and old drivers change rapidly with age. Second, the interaction effects between age and sex show that female and male drivers behave differently in rear-end accidents. Third, it is found that the pattern of age impact varies according to the type of pre-crash actions exhibited by the leading vehicle. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Estimating technical efficiency in the hospital sector with panel data: a comparison of parametric and non-parametric techniques.

    Science.gov (United States)

    Siciliani, Luigi

    2006-01-01

    Policy makers are increasingly interested in developing performance indicators that measure hospital efficiency. These indicators may give the purchasers of health services an additional regulatory tool to contain health expenditure. Using panel data, this study compares different parametric (econometric) and non-parametric (linear programming) techniques for the measurement of a hospital's technical efficiency. This comparison was made using a sample of 17 Italian hospitals in the years 1996-9. Highest correlations are found in the efficiency scores between the non-parametric data envelopment analysis under the constant returns to scale assumption (DEA-CRS) and several parametric models. Correlation reduces markedly when using more flexible non-parametric specifications such as data envelopment analysis under the variable returns to scale assumption (DEA-VRS) and the free disposal hull (FDH) model. Correlation also generally reduces when moving from one output to two-output specifications. This analysis suggests that there is scope for developing performance indicators at hospital level using panel data, but it is important that extensive sensitivity analysis is carried out if purchasers wish to make use of these indicators in practice.

  16. Application of nonparametric regression methods to study the relationship between NO2 concentrations and local wind direction and speed at background sites.

    Science.gov (United States)

    Donnelly, Aoife; Misstear, Bruce; Broderick, Brian

    2011-02-15

    Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling

  17. Data analysis and approximate models model choice, location-scale, analysis of variance, nonparametric regression and image analysis

    CERN Document Server

    Davies, Patrick Laurie

    2014-01-01

    Introduction IntroductionApproximate Models Notation Two Modes of Statistical AnalysisTowards One Mode of Analysis Approximation, Randomness, Chaos, Determinism ApproximationA Concept of Approximation Approximation Approximating a Data Set by a Model Approximation Regions Functionals and EquivarianceRegularization and Optimality Metrics and DiscrepanciesStrong and Weak Topologies On Being (almost) Honest Simulations and Tables Degree of Approximation and p-values ScalesStability of Analysis The Choice of En(α, P) Independence Procedures, Approximation and VaguenessDiscrete Models The Empirical Density Metrics and Discrepancies The Total Variation Metric The Kullback-Leibler and Chi-Squared Discrepancies The Po(λ) ModelThe b(k, p) and nb(k, p) Models The Flying Bomb Data The Student Study Times Data OutliersOutliers, Data Analysis and Models Breakdown Points and Equivariance Identifying Outliers and Breakdown Outliers in Multivariate Data Outliers in Linear Regression Outliers in Structured Data The Location...

  18. Nonparametric Online Learning Control for Soft Continuum Robot: An Enabling Technique for Effective Endoscopic Navigation

    Science.gov (United States)

    Lee, Kit-Hang; Fu, Denny K.C.; Leong, Martin C.W.; Chow, Marco; Fu, Hing-Choi; Althoefer, Kaspar; Sze, Kam Yim; Yeung, Chung-Kwong

    2017-01-01

    Abstract Bioinspired robotic structures comprising soft actuation units have attracted increasing research interest. Taking advantage of its inherent compliance, soft robots can assure safe interaction with external environments, provided that precise and effective manipulation could be achieved. Endoscopy is a typical application. However, previous model-based control approaches often require simplified geometric assumptions on the soft manipulator, but which could be very inaccurate in the presence of unmodeled external interaction forces. In this study, we propose a generic control framework based on nonparametric and online, as well as local, training to learn the inverse model directly, without prior knowledge of the robot's structural parameters. Detailed experimental evaluation was conducted on a soft robot prototype with control redundancy, performing trajectory tracking in dynamically constrained environments. Advanced element formulation of finite element analysis is employed to initialize the control policy, hence eliminating the need for random exploration in the robot's workspace. The proposed control framework enabled a soft fluid-driven continuum robot to follow a 3D trajectory precisely, even under dynamic external disturbance. Such enhanced control accuracy and adaptability would facilitate effective endoscopic navigation in complex and changing environments. PMID:29251567

  19. Using multiple linear regression techniques to quantify carbon ...

    African Journals Online (AJOL)

    Fallow ecosystems provide a significant carbon stock that can be quantified for inclusion in the accounts of global carbon budgets. Process and statistical models of productivity, though useful, are often technically rigid as the conditions for their application are not easy to satisfy. Multiple regression techniques have been ...

  20. Two biased estimation techniques in linear regression: Application to aircraft

    Science.gov (United States)

    Klein, Vladislav

    1988-01-01

    Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.

  1. Nonparametric statistical techniques used in dose estimation for beagles exposed to inhaled plutonium nitrate

    International Nuclear Information System (INIS)

    Stevens, D.L.; Dagle, G.E.

    1986-01-01

    Retention and translocation of inhaled radionuclides are often estimated from the sacrifice of multiple animals at different time points. The data for each time point can be averaged and a smooth curve fitted to the mean values, or a smooth curve may be fitted to the entire data set. However, an analysis based on means may not be the most appropriate if there is substantial variation in the initial amount of the radionuclide inhaled or if the data are subject to outliers. A method has been developed that takes account of these problems. The body burden is viewed as a compartmental system, with the compartments identified with body organs. A median polish is applied to the multiple logistic transform of the compartmental fractions (compartment burden/total burden) at each time point. A smooth function is fitted to the results of the median polish. This technique was applied to data from beagles exposed to an aerosol of 239 Pu(NO 3 ) 4 . Models of retention and translocation for lungs, skeleton, liver, kidneys, and tracheobronchial lymph nodes were developed and used to estimate dose. 4 refs., 3 figs., 4 tabs

  2. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  3. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    Science.gov (United States)

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and

  4. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    Science.gov (United States)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  5. A contingency table approach to nonparametric testing

    CERN Document Server

    Rayner, JCW

    2000-01-01

    Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables.This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp

  6. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2010-01-01

    Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference.-Eugenia Stoimenova, Journal of Applied Statistics, June 2012… one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students.-Biometrics, 67, September 2011This excellently presente

  7. Nonparametric additive regression for repeatedly measured data

    KAUST Repository

    Carroll, R. J.; Maity, A.; Mammen, E.; Yu, K.

    2009-01-01

    We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements

  8. Short-term stream flow forecasting at Australian river sites using data-driven regression techniques

    CSIR Research Space (South Africa)

    Steyn, Melise

    2017-09-01

    Full Text Available This study proposes a computationally efficient solution to stream flow forecasting for river basins where historical time series data are available. Two data-driven modeling techniques are investigated, namely support vector regression...

  9. Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.

    Science.gov (United States)

    Ohring, G.

    1972-01-01

    Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.

  10. Nonparametric functional mapping of quantitative trait loci.

    Science.gov (United States)

    Yang, Jie; Wu, Rongling; Casella, George

    2009-03-01

    Functional mapping is a useful tool for mapping quantitative trait loci (QTL) that control dynamic traits. It incorporates mathematical aspects of biological processes into the mixture model-based likelihood setting for QTL mapping, thus increasing the power of QTL detection and the precision of parameter estimation. However, in many situations there is no obvious functional form and, in such cases, this strategy will not be optimal. Here we propose to use nonparametric function estimation, typically implemented with B-splines, to estimate the underlying functional form of phenotypic trajectories, and then construct a nonparametric test to find evidence of existing QTL. Using the representation of a nonparametric regression as a mixed model, the final test statistic is a likelihood ratio test. We consider two types of genetic maps: dense maps and general maps, and the power of nonparametric functional mapping is investigated through simulation studies and demonstrated by examples.

  11. Evaluation of syngas production unit cost of bio-gasification facility using regression analysis techniques

    Energy Technology Data Exchange (ETDEWEB)

    Deng, Yangyang; Parajuli, Prem B.

    2011-08-10

    Evaluation of economic feasibility of a bio-gasification facility needs understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800Nm 3/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that reciprocal regression analysis technique had the best fit curve between per unit cost and production capacity, with sum of error squares (SES) lower than 0.001 and coefficient of determination of (R 2) 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm 3, under the capacity of 2,880 Nm 3/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be the new categorical criterion to evaluate micro-scale bio-gasification facility from the perspective of economic analysis.

  12. On Cooper's Nonparametric Test.

    Science.gov (United States)

    Schmeidler, James

    1978-01-01

    The basic assumption of Cooper's nonparametric test for trend (EJ 125 069) is questioned. It is contended that the proper assumption alters the distribution of the statistic and reduces its usefulness. (JKS)

  13. The importance of the chosen technique to estimate diffuse solar radiation by means of regression

    Energy Technology Data Exchange (ETDEWEB)

    Arslan, Talha; Altyn Yavuz, Arzu [Department of Statistics. Science and Literature Faculty. Eskisehir Osmangazi University (Turkey)], email: mtarslan@ogu.edu.tr, email: aaltin@ogu.edu.tr; Acikkalp, Emin [Department of Mechanical and Manufacturing Engineering. Engineering Faculty. Bilecik University (Turkey)], email: acikkalp@gmail.com

    2011-07-01

    The Ordinary Least Squares (OLS) method is one of the most frequently used for estimation of diffuse solar radiation. The data set must provide certain assumptions for the OLS method to work. The most important is that the regression equation offered by OLS error terms must fit within the normal distribution. Utilizing an alternative robust estimator to get parameter estimations is highly effective in solving problems where there is a lack of normal distribution due to the presence of outliers or some other factor. The purpose of this study is to investigate the value of the chosen technique for the estimation of diffuse radiation. This study described alternative robust methods frequently used in applications and compared them with the OLS method. Making a comparison of the data set analysis of the OLS and that of the M Regression (Huber, Andrews and Tukey) techniques, it was study found that robust regression techniques are preferable to OLS because of the smoother explanation values.

  14. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    Directory of Open Access Journals (Sweden)

    C. Wu

    2018-03-01

    Full Text Available Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS, Deming regression (DR, orthogonal distance regression (ODR, weighted ODR (WODR, and York regression (YR. We first introduce a new data generation scheme that employs the Mersenne twister (MT pseudorandom number generator. The numerical simulations are also improved by (a refining the parameterization of nonlinear measurement uncertainties, (b inclusion of a linear measurement uncertainty, and (c inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot was developed to facilitate the implementation of error-in-variables regressions.

  15. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    Science.gov (United States)

    Wu, Cheng; Zhen Yu, Jian

    2018-03-01

    Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.

  16. Source apportionment of speciated PM2.5 and non-parametric regressions of PM2.5 and PM(coarse) mass concentrations from Denver and Greeley, Colorado, and construction and evaluation of dichotomous filter samplers

    Science.gov (United States)

    Piedrahita, Ricardo A.

    The Denver Aerosol Sources and Health study (DASH) was a long-term study of the relationship between the variability in fine particulate mass and chemical constituents (PM2.5, particulate matter less than 2.5mum) and adverse health effects such as cardio-respiratory illnesses and mortality. Daily filter samples were chemically analyzed for multiple species. We present findings based on 2.8 years of DASH data, from 2003 to 2005. Multilinear Engine 2 (ME-2), a receptor-based source apportionment model was applied to the data to estimate source contributions to PM2.5 mass concentrations. This study relied on two different ME-2 models: (1) a 2-way model that closely reflects PMF-2; and (2) an enhanced model with meteorological data that used additional temporal and meteorological factors. The Coarse Rural Urban Sources and Health study (CRUSH) is a long-term study of the relationship between the variability in coarse particulate mass (PMcoarse, particulate matter between 2.5 and 10mum) and adverse health effects such as cardio-respiratory illnesses, pre-term births, and mortality. Hourly mass concentrations of PMcoarse and fine particulate matter (PM2.5) are measured using tapered element oscillating microbalances (TEOMs) with Filter Dynamics Measurement Systems (FDMS), at two rural and two urban sites. We present findings based on nine months of mass concentration data, including temporal trends, and non-parametric regressions (NPR) results, which were used to characterize the wind speed and wind direction relationships that might point to sources. As part of CRUSH, 1-year coarse and fine mode particulate matter filter sampling network, will allow us to characterize the chemical composition of the particulate matter collected and perform spatial comparisons. This work describes the construction and validation testing of four dichotomous filter samplers for this purpose. The use of dichotomous splitters with an approximate 2.5mum cut point, coupled with a 10mum cut

  17. Adaptive metric kernel regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    2000-01-01

    Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...

  18. Adaptive Metric Kernel Regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    1998-01-01

    Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...

  19. Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression

    Science.gov (United States)

    Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.

    2013-01-01

    Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…

  20. Nonparametric conditional predictive regions for time series

    NARCIS (Netherlands)

    de Gooijer, J.G.; Zerom Godefay, D.

    2000-01-01

    Several nonparametric predictors based on the Nadaraya-Watson kernel regression estimator have been proposed in the literature. They include the conditional mean, the conditional median, and the conditional mode. In this paper, we consider three types of predictive regions for these predictors — the

  1. Nonparametric Transfer Function Models

    Science.gov (United States)

    Liu, Jun M.; Chen, Rong; Yao, Qiwei

    2009-01-01

    In this paper a class of nonparametric transfer function models is proposed to model nonlinear relationships between ‘input’ and ‘output’ time series. The transfer function is smooth with unknown functional forms, and the noise is assumed to be a stationary autoregressive-moving average (ARMA) process. The nonparametric transfer function is estimated jointly with the ARMA parameters. By modeling the correlation in the noise, the transfer function can be estimated more efficiently. The parsimonious ARMA structure improves the estimation efficiency in finite samples. The asymptotic properties of the estimators are investigated. The finite-sample properties are illustrated through simulations and one empirical example. PMID:20628584

  2. Theory of nonparametric tests

    CERN Document Server

    Dickhaus, Thorsten

    2018-01-01

    This textbook provides a self-contained presentation of the main concepts and methods of nonparametric statistical testing, with a particular focus on the theoretical foundations of goodness-of-fit tests, rank tests, resampling tests, and projection tests. The substitution principle is employed as a unified approach to the nonparametric test problems discussed. In addition to mathematical theory, it also includes numerous examples and computer implementations. The book is intended for advanced undergraduate, graduate, and postdoc students as well as young researchers. Readers should be familiar with the basic concepts of mathematical statistics typically covered in introductory statistics courses.

  3. Bayesian nonparametric data analysis

    CERN Document Server

    Müller, Peter; Jara, Alejandro; Hanson, Tim

    2015-01-01

    This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.

  4. A NONPARAMETRIC HYPOTHESIS TEST VIA THE BOOTSTRAP RESAMPLING

    OpenAIRE

    Temel, Tugrul T.

    2001-01-01

    This paper adapts an already existing nonparametric hypothesis test to the bootstrap framework. The test utilizes the nonparametric kernel regression method to estimate a measure of distance between the models stated under the null hypothesis. The bootstraped version of the test allows to approximate errors involved in the asymptotic hypothesis test. The paper also develops a Mathematica Code for the test algorithm.

  5. Wind Power Ramp Events Prediction with Hybrid Machine Learning Regression Techniques and Reanalysis Data

    Directory of Open Access Journals (Sweden)

    Laura Cornejo-Bueno

    2017-11-01

    Full Text Available Wind Power Ramp Events (WPREs are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Its accurate prediction is important in the effort of efficiently integrating wind energy in the electric system, without affecting considerably its stability, robustness and resilience. In this paper, we tackle the problem of predicting WPREs by applying Machine Learning (ML regression techniques. Our approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPREs prediction in real systems. Specifically, we have explored the feasibility of a number of state-of-the-art ML regression techniques, such as support vector regression, artificial neural networks (multi-layer perceptrons and extreme learning machines and Gaussian processes to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Center for Medium-Range Weather Forecasts is the one used in this paper because of its accuracy and high resolution (in both spatial and temporal domains. Aiming at validating the feasibility of our predicting approach, we have carried out an extensive experimental work using real data from three wind farms in Spain, discussing the performance of the different ML regression tested in this wind power ramp event prediction problem.

  6. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    Science.gov (United States)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.

  7. Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

    Directory of Open Access Journals (Sweden)

    Fatimah Khaleel Ibrahim

    2017-08-01

    Full Text Available The techniques of soft computing technique such as Artificial Neutral Network (ANN have improved the predicting capability and have actually discovered application in Geotechnical engineering. The aim of this research is to utilize the soft computing technique and Multiple Regression Models (MLR for forecasting the California bearing ratio CBR( of soil from its index properties. The indicator of CBR for soil could be predicted from various soils characterizing parameters with the assist of MLR and ANN methods. The data base that collected from the laboratory by conducting tests on 86 soil samples that gathered from different projects in Basrah districts. Data gained from the experimental result were used in the regression models and soft computing techniques by using artificial neural network. The liquid limit, plastic index , modified compaction test and the CBR test have been determined. In this work, different ANN and MLR models were formulated with the different collection of inputs to be able to recognize their significance in the prediction of CBR. The strengths of the models that were developed been examined in terms of regression coefficient (R2, relative error (RE% and mean square error (MSE values. From the results of this paper, it absolutely was noticed that all the proposed ANN models perform better than that of MLR model. In a specific ANN model with all input parameters reveals better outcomes than other ANN models.

  8. Non-Parametric Estimation of Correlation Functions

    DEFF Research Database (Denmark)

    Brincker, Rune; Rytter, Anders; Krenk, Steen

    In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are point...

  9. Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

    Science.gov (United States)

    Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

    2006-11-01

    We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.

  10. A regression technique for evaluation and quantification for water quality parameters from remote sensing data

    International Nuclear Information System (INIS)

    Whitlock, C.H.; Kuo, C.Y.

    1979-01-01

    The paper attempts to define optical physics and/or environmental conditions under which the linear multiple-regression should be applicable. It is reported that investigation of the signal response shows that the exact solution for a number of optical physics conditions is of the same form as a linearized multiple-regression equation, even if nonlinear contributions from surface reflections, atmospheric constituents, or other water pollutants are included. Limitations on achieving this type of solution are defined. Laboratory data are used to demonstrate that the technique is applicable to water mixtures which contain constituents with both linear and nonlinear radiance gradients. Finally, it is concluded that instrument noise, ground-truth placement, and time lapse between remote sensor overpass and water sample operations are serious barriers to successful use of the technique

  11. Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression

    Directory of Open Access Journals (Sweden)

    Land Walker H

    2011-01-01

    Full Text Available Abstract Background When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR modeling representing a parametric approach. The SL technique was comprised of a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.

  12. Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

    Science.gov (United States)

    Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

    2008-01-01

    This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.

  13. Multiple regression technique for Pth degree polynominals with and without linear cross products

    Science.gov (United States)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.

  14. Nonparametric statistics with applications to science and engineering

    CERN Document Server

    Kvam, Paul H

    2007-01-01

    A thorough and definitive book that fully addresses traditional and modern-day topics of nonparametric statistics This book presents a practical approach to nonparametric statistical analysis and provides comprehensive coverage of both established and newly developed methods. With the use of MATLAB, the authors present information on theorems and rank tests in an applied fashion, with an emphasis on modern methods in regression and curve fitting, bootstrap confidence intervals, splines, wavelets, empirical likelihood, and goodness-of-fit testing. Nonparametric Statistics with Applications to Science and Engineering begins with succinct coverage of basic results for order statistics, methods of categorical data analysis, nonparametric regression, and curve fitting methods. The authors then focus on nonparametric procedures that are becoming more relevant to engineering researchers and practitioners. The important fundamental materials needed to effectively learn and apply the discussed methods are also provide...

  15. Parametric and Non-Parametric System Modelling

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg

    1999-01-01

    the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...... considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how...... networks is included. In this paper, neural networks are used for predicting the electricity production of a wind farm. The results are compared with results obtained using an adaptively estimated ARX-model. Finally, two papers on stochastic differential equations are included. In the first paper, among...

  16. Bayesian nonparametric hierarchical modeling.

    Science.gov (United States)

    Dunson, David B

    2009-04-01

    In biomedical research, hierarchical models are very widely used to accommodate dependence in multivariate and longitudinal data and for borrowing of information across data from different sources. A primary concern in hierarchical modeling is sensitivity to parametric assumptions, such as linearity and normality of the random effects. Parametric assumptions on latent variable distributions can be challenging to check and are typically unwarranted, given available prior knowledge. This article reviews some recent developments in Bayesian nonparametric methods motivated by complex, multivariate and functional data collected in biomedical studies. The author provides a brief review of flexible parametric approaches relying on finite mixtures and latent class modeling. Dirichlet process mixture models are motivated by the need to generalize these approaches to avoid assuming a fixed finite number of classes. Focusing on an epidemiology application, the author illustrates the practical utility and potential of nonparametric Bayes methods.

  17. Early cost estimating for road construction projects using multiple regression techniques

    Directory of Open Access Journals (Sweden)

    Ibrahim Mahamid

    2011-12-01

    Full Text Available The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As the cost estimates are required at early stages of a project, considerations were given to the fact that the input data for the required regression model could be easily extracted from sketches or scope definition of the project. 11 regression models are developed to estimate the total cost of road construction project in US dollar; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r2 for the developed models is ranging from 0.92 to 0.98 which indicate that the predicted values from a forecast models fit with the real-life data. The values of the mean absolute percentage error (MAPE of the developed regression models are ranging from 13% to 31%, the results compare favorably with past researches which have shown that the estimate accuracy in the early stages of a project is between ±25% and ±50%.

  18. Quantal Response: Nonparametric Modeling

    Science.gov (United States)

    2017-01-01

    capture the behavior of observed phenomena. Higher-order polynomial and finite-dimensional spline basis models allow for more complicated responses as the...flexibility as these are nonparametric (not constrained to any particular functional form). These should be useful in identifying nonstandard behavior via... deviance ∆ = −2 log(Lreduced/Lfull) is defined in terms of the likelihood function L. For normal error, Lfull = 1, and based on Eq. A-2, we have log

  19. Nonparametric Bayesian inference in biostatistics

    CERN Document Server

    Müller, Peter

    2015-01-01

    As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...

  20. Regression Techniques for Determining the Effective Impervious Area in Southern California Watersheds

    Science.gov (United States)

    Sultana, R.; Mroczek, M.; Dallman, S.; Sengupta, A.; Stein, E. D.

    2016-12-01

    The portion of the Total Impervious Area (TIA) that is hydraulically connected to the storm drainage network is called the Effective Impervious Area (EIA). The remaining fraction of impervious area, called the non-effective impervious area, drains onto pervious surfaces which do not contribute to runoff for smaller events. Using the TIA instead of EIA in models and calculations can lead to overestimates of runoff volumes peak discharges and oversizing of drainage system since it is assumed all impervious areas produce urban runoff that is directly connected to storm drains. This makes EIA a better predictor of actual runoff from urban catchments for hydraulic design of storm drain systems and modeling non-point source pollution. Compared to TIA, determining the EIA is considerably more difficult to calculate since it cannot be found by using remote sensing techniques, readily available EIA datasets, or aerial imagery interpretation alone. For this study, EIA percentages were calculated by two successive regression methods for five watersheds (with areas of 8.38 - 158mi2) located in Southern California using rainfall-runoff event data for the years 2004 - 2007. Runoff generated from the smaller storm events are considered to be emanating only from the effective impervious areas. Therefore, larger events that were considered to have runoff from both impervious and pervious surfaces were successively removed in the regression methods using a criterion of (1) 1mm and (2) a max (2 , 1mm) above the regression line. MSE is calculated from actual runoff and runoff predicted by the regression. Analysis of standard deviations showed that criterion of max (2 , 1mm) better fit the regression line and is the preferred method in predicting the EIA percentage. The estimated EIAs have shown to be approximately 78% to 43% of the TIA which shows use of EIA instead of TIA can have significant impact on the cost building urban hydraulic systems and stormwater capture devices.

  1. A Structural Labor Supply Model with Nonparametric Preferences

    NARCIS (Netherlands)

    van Soest, A.H.O.; Das, J.W.M.; Gong, X.

    2000-01-01

    Nonparametric techniques are usually seen as a statistic device for data description and exploration, and not as a tool for estimating models with a richer economic structure, which are often required for policy analysis.This paper presents an example where nonparametric flexibility can be attained

  2. Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

    Science.gov (United States)

    Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing

    2014-01-01

    Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas. PMID:24936948

  3. Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

    Directory of Open Access Journals (Sweden)

    Wen-Yuan Chen

    2014-06-01

    Full Text Available Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1 we use a terrain drop compensation (TDC technique to solve the problem of the concavity of railway crossings; (2 we use a linear regression technique to predict the position and length of an object from image processing; (3 we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.

  4. Scaling model for prediction of radionuclide activity in cooling water using a regression triplet technique

    International Nuclear Information System (INIS)

    Silvia Dulanska; Lubomir Matel; Milan Meloun

    2010-01-01

    The decommissioning of the nuclear power plant (NPP) A1 Jaslovske Bohunice (Slovakia) is a complicated set of problems that is highly demanding both technically and financially. The basic goal of the decommissioning process is the total elimination of radioactive materials from the nuclear power plant area, and radwaste treatment to a form suitable for its safe disposal. The initial conditions of decommissioning also include elimination of the operational events, preparation and transport of the fuel from the plant territory, radiochemical and physical-chemical characterization of the radioactive wastes. One of the problems was and still is the processing of the liquid radioactive wastes. Such media is also the cooling water of the long-term storage of spent fuel. A suitable scaling model for predicting the activity of hard-to-detect radionuclides 239,240 Pu, 90 Sr and summary beta in cooling water using a regression triplet technique has been built using the regression triplet analysis and regression diagnostics. (author)

  5. Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

    Science.gov (United States)

    Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

    2009-11-01

    Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.

  6. Short term load forecasting technique based on the seasonal exponential adjustment method and the regression model

    International Nuclear Information System (INIS)

    Wu, Jie; Wang, Jianzhou; Lu, Haiyan; Dong, Yao; Lu, Xiaoxiao

    2013-01-01

    Highlights: ► The seasonal and trend items of the data series are forecasted separately. ► Seasonal item in the data series is verified by the Kendall τ correlation testing. ► Different regression models are applied to the trend item forecasting. ► We examine the superiority of the combined models by the quartile value comparison. ► Paired-sample T test is utilized to confirm the superiority of the combined models. - Abstract: For an energy-limited economy system, it is crucial to forecast load demand accurately. This paper devotes to 1-week-ahead daily load forecasting approach in which load demand series are predicted by employing the information of days before being similar to that of the forecast day. As well as in many nonlinear systems, seasonal item and trend item are coexisting in load demand datasets. In this paper, the existing of the seasonal item in the load demand data series is firstly verified according to the Kendall τ correlation testing method. Then in the belief of the separate forecasting to the seasonal item and the trend item would improve the forecasting accuracy, hybrid models by combining seasonal exponential adjustment method (SEAM) with the regression methods are proposed in this paper, where SEAM and the regression models are employed to seasonal and trend items forecasting respectively. Comparisons of the quartile values as well as the mean absolute percentage error values demonstrate this forecasting technique can significantly improve the accuracy though models applied to the trend item forecasting are eleven different ones. This superior performance of this separate forecasting technique is further confirmed by the paired-sample T tests

  7. Nonparametric statistical inference

    CERN Document Server

    Gibbons, Jean Dickinson

    2014-01-01

    Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.

  8. Empirical Methods for Detecting Regional Trends and Other Spatial Expressions in Antrim Shale Gas Productivity, with Implications for Improving Resource Projections Using Local Nonparametric Estimation Techniques

    Science.gov (United States)

    Coburn, T.C.; Freeman, P.A.; Attanasi, E.D.

    2012-01-01

    The primary objectives of this research were to (1) investigate empirical methods for establishing regional trends in unconventional gas resources as exhibited by historical production data and (2) determine whether or not incorporating additional knowledge of a regional trend in a suite of previously established local nonparametric resource prediction algorithms influences assessment results. Three different trend detection methods were applied to publicly available production data (well EUR aggregated to 80-acre cells) from the Devonian Antrim Shale gas play in the Michigan Basin. This effort led to the identification of a southeast-northwest trend in cell EUR values across the play that, in a very general sense, conforms to the primary fracture and structural orientations of the province. However, including this trend in the resource prediction algorithms did not lead to improved results. Further analysis indicated the existence of clustering among cell EUR values that likely dampens the contribution of the regional trend. The reason for the clustering, a somewhat unexpected result, is not completely understood, although the geological literature provides some possible explanations. With appropriate data, a better understanding of this clustering phenomenon may lead to important information about the factors and their interactions that control Antrim Shale gas production, which may, in turn, help establish a more general protocol for better estimating resources in this and other shale gas plays. ?? 2011 International Association for Mathematical Geology (outside the USA).

  9. Nonparametric combinatorial sequence models.

    Science.gov (United States)

    Wauthier, Fabian L; Jordan, Michael I; Jojic, Nebojsa

    2011-11-01

    This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are "linked" and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology but methodologies for analyzing them are still being developed. This article presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three biological sequence families which indicate that combinatorial structures are indeed present and that combinatorial sequence models can more succinctly describe them than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution over sequence representations induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.

  10. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2017-08-01

    The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm ( Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  11. Nonparametric tests for censored data

    CERN Document Server

    Bagdonavicus, Vilijandas; Nikulin, Mikhail

    2013-01-01

    This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests applying most statistical software is highlighted and discussed.

  12. Near infrared spectrometric technique for testing fruit quality: optimisation of regression models using genetic algorithms

    Science.gov (United States)

    Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.

    2016-02-01

    Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of food stuff, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development, for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm-1 were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, and compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of GA method to improve speed and accuracy of fruit quality prediction.

  13. A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy

    Directory of Open Access Journals (Sweden)

    Jibo Yue

    2018-01-01

    Full Text Available Above-ground biomass (AGB provides a vital link between solar energy consumption and yield, so its correct estimation is crucial to accurately monitor crop growth and predict yield. In this work, we estimate AGB by using 54 vegetation indexes (e.g., Normalized Difference Vegetation Index, Soil-Adjusted Vegetation Index and eight statistical regression techniques: artificial neural network (ANN, multivariable linear regression (MLR, decision-tree regression (DT, boosted binary regression tree (BBRT, partial least squares regression (PLSR, random forest regression (RF, support vector machine regression (SVM, and principal component regression (PCR, which are used to analyze hyperspectral data acquired by using a field spectrophotometer. The vegetation indexes (VIs determined from the spectra were first used to train regression techniques for modeling and validation to select the best VI input, and then summed with white Gaussian noise to study how remote sensing errors affect the regression techniques. Next, the VIs were divided into groups of different sizes by using various sampling methods for modeling and validation to test the stability of the techniques. Finally, the AGB was estimated by using a leave-one-out cross validation with these powerful techniques. The results of the study demonstrate that, of the eight techniques investigated, PLSR and MLR perform best in terms of stability and are most suitable when high-accuracy and stable estimates are required from relatively few samples. In addition, RF is extremely robust against noise and is best suited to deal with repeated observations involving remote-sensing data (i.e., data affected by atmosphere, clouds, observation times, and/or sensor noise. Finally, the leave-one-out cross-validation method indicates that PLSR provides the highest accuracy (R2 = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV (RMSE = 0.18; thus, PLSR is best suited for works requiring high

  14. A nonparametric mixture model for cure rate estimation.

    Science.gov (United States)

    Peng, Y; Dear, K B

    2000-03-01

    Nonparametric methods have attracted less attention than their parametric counterparts for cure rate analysis. In this paper, we study a general nonparametric mixture model. The proportional hazards assumption is employed in modeling the effect of covariates on the failure time of patients who are not cured. The EM algorithm, the marginal likelihood approach, and multiple imputations are employed to estimate parameters of interest in the model. This model extends models and improves estimation methods proposed by other researchers. It also extends Cox's proportional hazards regression model by allowing a proportion of event-free patients and investigating covariate effects on that proportion. The model and its estimation method are investigated by simulations. An application to breast cancer data, including comparisons with previous analyses using a parametric model and an existing nonparametric model by other researchers, confirms the conclusions from the parametric model but not those from the existing nonparametric model.

  15. Piecewise linear regression techniques to analyze the timing of head coach dismissals in Dutch soccer clubs

    NARCIS (Netherlands)

    Schryver, T. de; Eisinga, R.

    2010-01-01

    The key question in research on dismissals of head coaches in sports clubs is not whether they should happen but when they will happen. This paper applies piecewise linear regression to advance our understanding of the timing of head coach dismissals. Essentially, the regression sacrifices degrees

  16. Nonparametric identification of copula structures

    KAUST Repository

    Li, Bo; Genton, Marc G.

    2013-01-01

    We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric

  17. Thermoluminescence dating of chinese porcelain using a regression method of saturating exponential in pre-dose technique

    International Nuclear Information System (INIS)

    Wang Weida; Xia Junding; Zhou Zhixin; Leung, P.L.

    2001-01-01

    Thermoluminescence (TL) dating using a regression method of saturating exponential in pre-dose technique was described. 23 porcelain samples from past dynasties of China were dated by this method. The results show that the TL ages are in reasonable agreement with archaeological dates within a standard deviation of 27%. Such error can be accepted in porcelain dating

  18. Decision support using nonparametric statistics

    CERN Document Server

    Beatty, Warren

    2018-01-01

    This concise volume covers nonparametric statistics topics that most are most likely to be seen and used from a practical decision support perspective. While many degree programs require a course in parametric statistics, these methods are often inadequate for real-world decision making in business environments. Much of the data collected today by business executives (for example, customer satisfaction opinions) requires nonparametric statistics for valid analysis, and this book provides the reader with a set of tools that can be used to validly analyze all data, regardless of type. Through numerous examples and exercises, this book explains why nonparametric statistics will lead to better decisions and how they are used to reach a decision, with a wide array of business applications. Online resources include exercise data, spreadsheets, and solutions.

  19. Adaptive nonparametric Bayesian inference using location-scale mixture priors

    NARCIS (Netherlands)

    Jonge, de R.; Zanten, van J.H.

    2010-01-01

    We study location-scale mixture priors for nonparametric statistical problems, including multivariate regression, density estimation and classification. We show that a rate-adaptive procedure can be obtained if the prior is properly constructed. In particular, we show that adaptation is achieved if

  20. A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

    Science.gov (United States)

    Nose, Takashi; Kobayashi, Takao

    In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.

  1. Non-parametric smoothing of experimental data

    International Nuclear Information System (INIS)

    Kuketayev, A.T.; Pen'kov, F.M.

    2007-01-01

    Full text: Rapid processing of experimental data samples in nuclear physics often requires differentiation in order to find extrema. Therefore, even at the preliminary stage of data analysis, a range of noise reduction methods are used to smooth experimental data. There are many non-parametric smoothing techniques: interval averages, moving averages, exponential smoothing, etc. Nevertheless, it is more common to use a priori information about the behavior of the experimental curve in order to construct smoothing schemes based on the least squares techniques. The latter methodology's advantage is that the area under the curve can be preserved, which is equivalent to conservation of total speed of counting. The disadvantages of this approach include the lack of a priori information. For example, very often the sums of undifferentiated (by a detector) peaks are replaced with one peak during the processing of data, introducing uncontrolled errors in the determination of the physical quantities. The problem is solvable only by having experienced personnel, whose skills are much greater than the challenge. We propose a set of non-parametric techniques, which allows the use of any additional information on the nature of experimental dependence. The method is based on a construction of a functional, which includes both experimental data and a priori information. Minimum of this functional is reached on a non-parametric smoothed curve. Euler (Lagrange) differential equations are constructed for these curves; then their solutions are obtained analytically or numerically. The proposed approach allows for automated processing of nuclear physics data, eliminating the need for highly skilled laboratory personnel. Pursuant to the proposed approach is the possibility to obtain smoothing curves in a given confidence interval, e.g. according to the χ 2 distribution. This approach is applicable when constructing smooth solutions of ill-posed problems, in particular when solving

  2. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    Science.gov (United States)

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider trying to predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  3. Prediction of Five Softwood Paper Properties from its Density using Support Vector Machine Regression Techniques

    Directory of Open Access Journals (Sweden)

    Esperanza García-Gonzalo

    2016-01-01

    Full Text Available Predicting paper properties based on a limited number of measured variables can be an important tool for the industry. Mathematical models were developed to predict mechanical and optical properties from the corresponding paper density for some softwood papers using support vector machine regression with the Radial Basis Function Kernel. A dataset of different properties of paper handsheets produced from pulps of pine (Pinus pinaster and P. sylvestris and cypress species (Cupressus lusitanica, C. sempervirens, and C. arizonica beaten at 1000, 4000, and 7000 revolutions was used. The results show that it is possible to obtain good models (with high coefficient of determination with two variables: the numerical variable density and the categorical variable species.

  4. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    Directory of Open Access Journals (Sweden)

    Saerom Park

    Full Text Available Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.

  5. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    Science.gov (United States)

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.

  6. Nonparametric Estimation of Regression Parameters in Measurement Error Models

    Czech Academy of Sciences Publication Activity Database

    Ehsanes Saleh, A.K.M.D.; Picek, J.; Kalina, Jan

    2009-01-01

    Roč. 67, č. 2 (2009), s. 177-200 ISSN 0026-1424 Grant - others:GA AV ČR(CZ) IAA101120801; GA MŠk(CZ) LC06024 Institutional research plan: CEZ:AV0Z10300504 Keywords : asymptotic relative efficiency(ARE) * asymptotic theory * emaculate mode * Me model * R-estimation * Reliabilty ratio(RR) Subject RIV: BB - Applied Statistics, Operational Research

  7. Nonparametric factor analysis of time series

    OpenAIRE

    Rodríguez-Poo, Juan M.; Linton, Oliver Bruce

    1998-01-01

    We introduce a nonparametric smoothing procedure for nonparametric factor analaysis of multivariate time series. The asymptotic properties of the proposed procedures are derived. We present an application based on the residuals from the Fair macromodel.

  8. Statistical Analysis of Reactor Pressure Vessel Fluence Calculation Benchmark Data Using Multiple Regression Techniques

    International Nuclear Information System (INIS)

    Carew, John F.; Finch, Stephen J.; Lois, Lambros

    2003-01-01

    The calculated >1-MeV pressure vessel fluence is used to determine the fracture toughness and integrity of the reactor pressure vessel. It is therefore of the utmost importance to ensure that the fluence prediction is accurate and unbiased. In practice, this assurance is provided by comparing the predictions of the calculational methodology with an extensive set of accurate benchmarks. A benchmarking database is used to provide an estimate of the overall average measurement-to-calculation (M/C) bias in the calculations ( ). This average is used as an ad-hoc multiplicative adjustment to the calculations to correct for the observed calculational bias. However, this average only provides a well-defined and valid adjustment of the fluence if the M/C data are homogeneous; i.e., the data are statistically independent and there is no correlation between subsets of M/C data.Typically, the identification of correlations between the errors in the database M/C values is difficult because the correlation is of the same magnitude as the random errors in the M/C data and varies substantially over the database. In this paper, an evaluation of a reactor dosimetry benchmark database is performed to determine the statistical validity of the adjustment to the calculated pressure vessel fluence. Physical mechanisms that could potentially introduce a correlation between the subsets of M/C ratios are identified and included in a multiple regression analysis of the M/C data. Rigorous statistical criteria are used to evaluate the homogeneity of the M/C data and determine the validity of the adjustment.For the database evaluated, the M/C data are found to be strongly correlated with dosimeter response threshold energy and dosimeter location (e.g., cavity versus in-vessel). It is shown that because of the inhomogeneity in the M/C data, for this database, the benchmark data do not provide a valid basis for adjusting the pressure vessel fluence.The statistical criteria and methods employed in

  9. Nonparametric Inference for Periodic Sequences

    KAUST Repository

    Sun, Ying

    2012-02-01

    This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation.We also propose a nonparametric test of the null hypothesis that the data have constantmean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.

  10. Nonparametric predictive inference in reliability

    International Nuclear Information System (INIS)

    Coolen, F.P.A.; Coolen-Schrijner, P.; Yan, K.J.

    2002-01-01

    We introduce a recently developed statistical approach, called nonparametric predictive inference (NPI), to reliability. Bounds for the survival function for a future observation are presented. We illustrate how NPI can deal with right-censored data, and discuss aspects of competing risks. We present possible applications of NPI for Bernoulli data, and we briefly outline applications of NPI for replacement decisions. The emphasis is on introduction and illustration of NPI in reliability contexts, detailed mathematical justifications are presented elsewhere

  11. Multiple predictor smoothing methods for sensitivity analysis: Description of techniques

    International Nuclear Information System (INIS)

    Storlie, Curtis B.; Helton, Jon C.

    2008-01-01

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. Then, in the second and concluding part of this presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present

  12. Diagnostic tools for nearest neighbors techniques when used with satellite imagery

    Science.gov (United States)

    Ronald E. McRoberts

    2009-01-01

    Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....

  13. A comparison of multiple regression and neural network techniques for mapping in situ pCO2 data

    International Nuclear Information System (INIS)

    Lefevre, Nathalie; Watson, Andrew J.; Watson, Adam R.

    2005-01-01

    Using about 138,000 measurements of surface pCO 2 in the Atlantic subpolar gyre (50-70 deg N, 60-10 deg W) during 1995-1997, we compare two methods of interpolation in space and time: a monthly distribution of surface pCO 2 constructed using multiple linear regressions on position and temperature, and a self-organizing neural network approach. Both methods confirm characteristics of the region found in previous work, i.e. the subpolar gyre is a sink for atmospheric CO 2 throughout the year, and exhibits a strong seasonal variability with the highest undersaturations occurring in spring and summer due to biological activity. As an annual average the surface pCO 2 is higher than estimates based on available syntheses of surface pCO 2 . This supports earlier suggestions that the sink of CO 2 in the Atlantic subpolar gyre has decreased over the last decade instead of increasing as previously assumed. The neural network is able to capture a more complex distribution than can be well represented by linear regressions, but both techniques agree relatively well on the average values of pCO 2 and derived fluxes. However, when both techniques are used with a subset of the data, the neural network predicts the remaining data to a much better accuracy than the regressions, with a residual standard deviation ranging from 3 to 11 μatm. The subpolar gyre is a net sink of CO 2 of 0.13 Gt-C/yr using the multiple linear regressions and 0.15 Gt-C/yr using the neural network, on average between 1995 and 1997. Both calculations were made with the NCEP monthly wind speeds converted to 10 m height and averaged between 1995 and 1997, and using the gas exchange coefficient of Wanninkhof

  14. Nonparametric identification of copula structures

    KAUST Repository

    Li, Bo

    2013-06-01

    We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process.We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.

  15. Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

    International Nuclear Information System (INIS)

    Dyar, M.D.; Carmosino, M.L.; Breves, E.A.; Ozanne, M.V.; Clegg, S.M.; Wiens, R.C.

    2012-01-01

    A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

  16. Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

    Energy Technology Data Exchange (ETDEWEB)

    Dyar, M.D., E-mail: mdyar@mtholyoke.edu [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Carmosino, M.L.; Breves, E.A.; Ozanne, M.V. [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Clegg, S.M.; Wiens, R.C. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

    2012-04-15

    A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

  17. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.

    Science.gov (United States)

    Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen

    2011-04-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.

  18. Marginal longitudinal semiparametric regression via penalized splines

    KAUST Repository

    Al Kadiri, M.

    2010-08-01

    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relative simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

  19. Marginal longitudinal semiparametric regression via penalized splines

    KAUST Repository

    Al Kadiri, M.; Carroll, R.J.; Wand, M.P.

    2010-01-01

    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been proposed, a relative simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

  20. Nonparametric correlation models for portfolio allocation

    DEFF Research Database (Denmark)

    Aslanidis, Nektarios; Casas, Isabel

    2013-01-01

    This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulations results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural ...... currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis....

  1. Nonparametric statistics for social and behavioral sciences

    CERN Document Server

    Kraska-MIller, M

    2013-01-01

    Introduction to Research in Social and Behavioral SciencesBasic Principles of ResearchPlanning for ResearchTypes of Research Designs Sampling ProceduresValidity and Reliability of Measurement InstrumentsSteps of the Research Process Introduction to Nonparametric StatisticsData AnalysisOverview of Nonparametric Statistics and Parametric Statistics Overview of Parametric Statistics Overview of Nonparametric StatisticsImportance of Nonparametric MethodsMeasurement InstrumentsAnalysis of Data to Determine Association and Agreement Pearson Chi-Square Test of Association and IndependenceContingency

  2. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    Science.gov (United States)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth at different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and the atmospheric pressure (reduced to see level), collected from five synoptic stations Kerman, Ahvaz, Tabriz, Saghez, and Rasht located respectively in the hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite well at surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers especially 100 cm depth. Moreover, both models performed better in humid climate condition than arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance especially in the case of SVR.

  3. An Innovative Technique to Assess Spontaneous Baroreflex Sensitivity with Short Data Segments: Multiple Trigonometric Regressive Spectral Analysis.

    Science.gov (United States)

    Li, Kai; Rüdiger, Heinz; Haase, Rocco; Ziemssen, Tjalf

    2018-01-01

    Objective: As the multiple trigonometric regressive spectral (MTRS) analysis is extraordinary in its ability to analyze short local data segments down to 12 s, we wanted to evaluate the impact of the data segment settings by applying the technique of MTRS analysis for baroreflex sensitivity (BRS) estimation using a standardized data pool. Methods: Spectral and baroreflex analyses were performed on the EuroBaVar dataset (42 recordings, including lying and standing positions). For this analysis, the technique of MTRS was used. We used different global and local data segment lengths, and chose the global data segments from different positions. Three global data segments of 1 and 2 min and three local data segments of 12, 20, and 30 s were used in MTRS analysis for BRS. Results: All the BRS-values calculated on the three global data segments were highly correlated, both in the supine and standing positions; the different global data segments provided similar BRS estimations. When using different local data segments, all the BRS-values were also highly correlated. However, in the supine position, using short local data segments of 12 s overestimated BRS compared with those using 20 and 30 s. In the standing position, the BRS estimations using different local data segments were comparable. There was no proportional bias for the comparisons between different BRS estimations. Conclusion: We demonstrate that BRS estimation by the MTRS technique is stable when using different global data segments, and MTRS is extraordinary in its ability to evaluate BRS in even short local data segments (20 and 30 s). Because of the non-stationary character of most biosignals, the MTRS technique would be preferable for BRS analysis especially in conditions when only short stationary data segments are available or when dynamic changes of BRS should be monitored.

  4. Nonparametric e-Mixture Estimation.

    Science.gov (United States)

    Takano, Ken; Hino, Hideitsu; Akaho, Shotaro; Murata, Noboru

    2016-12-01

    This study considers the common situation in data analysis when there are few observations of the distribution of interest or the target distribution, while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets from these auxiliary distributions-in other words, approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the [Formula: see text]- and [Formula: see text]-mixtures. The [Formula: see text]-mixture is applied in a variety of research fields because of the presence of the well-known expectation-maximazation algorithm for parameter estimation, whereas the [Formula: see text]-mixture is rarely used because of its difficulty of estimation, particularly for nonparametric models. The [Formula: see text]-mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for a nonparametric modeling of the [Formula: see text]-mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that this framework works well for three types of synthetic data sets, as well as an EEG real-world data set.

  5. Screen Wars, Star Wars, and Sequels: Nonparametric Reanalysis of Movie Profitability

    OpenAIRE

    W. D. Walls

    2012-01-01

    In this paper we use nonparametric statistical tools to quantify motion-picture profit. We quantify the unconditional distribution of profit, the distribution of profit conditional on stars and sequels, and we also model the conditional expectation of movie profits using a non- parametric data-driven regression model. The flexibility of the non-parametric approach accommodates the full range of possible relationships among the variables without prior specification of a functional form, thereb...

  6. Developing an immigration policy for Germany on the basis of a nonparametric labor market classification

    OpenAIRE

    Froelich, Markus; Puhani, Patrick

    2004-01-01

    Based on a nonparametrically estimated model of labor market classifications, this paper makes suggestions for immigration policy using data from western Germany in the 1990s. It is demonstrated that nonparametric regression is feasible in higher dimensions with only a few thousand observations. In sum, labor markets able to absorb immigrants are characterized by above average age and by professional occupations. On the other hand, labor markets for young workers in service occupations are id...

  7. Modeling ionospheric foF 2 response during geomagnetic storms using neural network and linear regression techniques

    Science.gov (United States)

    Tshisaphungo, Mpho; Habarulema, John Bosco; McKinnell, Lee-Anne

    2018-06-01

    In this paper, the modeling of the ionospheric foF 2 changes during geomagnetic storms by means of neural network (NN) and linear regression (LR) techniques is presented. The results will lead to a valuable tool to model the complex ionospheric changes during disturbed days in an operational space weather monitoring and forecasting environment. The storm-time foF 2 data during 1996-2014 from Grahamstown (33.3°S, 26.5°E), South Africa ionosonde station was used in modeling. In this paper, six storms were reserved to validate the models and hence not used in the modeling process. We found that the performance of both NN and LR models is comparable during selected storms which fell within the data period (1996-2014) used in modeling. However, when validated on storm periods beyond 1996-2014, the NN model gives a better performance (R = 0.62) compared to LR model (R = 0.56) for a storm that reached a minimum Dst index of -155 nT during 19-23 December 2015. We also found that both NN and LR models are capable of capturing the ionospheric foF 2 responses during two great geomagnetic storms (28 October-1 November 2003 and 6-12 November 2004) which have been demonstrated to be difficult storms to model in previous studies.

  8. Non-parametric transformation for data correlation and integration: From theory to practice

    Energy Technology Data Exchange (ETDEWEB)

    Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A& M Univ., College Station, TX (United States)

    1997-08-01

    The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require a priori functional relationship between response and predictor variables which is the case with traditional multiple regression. The transformations are very general, computationally efficient and can easily handle mixed data types for example, continuous variables such as porosity, permeability and categorical variables such as rock type, lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data, (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data, (3) it can be easily applied to cases where several types of secondary or soft data are involved, and (4) it significantly reduces variance function calculations and thus, greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.

  9. Bayesian Nonparametric Longitudinal Data Analysis.

    Science.gov (United States)

    Quintana, Fernando A; Johnson, Wesley O; Waetjen, Elaine; Gold, Ellen

    2016-01-01

    Practical Bayesian nonparametric methods have been developed across a wide variety of contexts. Here, we develop a novel statistical model that generalizes standard mixed models for longitudinal data that include flexible mean functions as well as combined compound symmetry (CS) and autoregressive (AR) covariance structures. AR structure is often specified through the use of a Gaussian process (GP) with covariance functions that allow longitudinal data to be more correlated if they are observed closer in time than if they are observed farther apart. We allow for AR structure by considering a broader class of models that incorporates a Dirichlet Process Mixture (DPM) over the covariance parameters of the GP. We are able to take advantage of modern Bayesian statistical methods in making full predictive inferences and about characteristics of longitudinal profiles and their differences across covariate combinations. We also take advantage of the generality of our model, which provides for estimation of a variety of covariance structures. We observe that models that fail to incorporate CS or AR structure can result in very poor estimation of a covariance or correlation matrix. In our illustration using hormone data observed on women through the menopausal transition, biology dictates the use of a generalized family of sigmoid functions as a model for time trends across subpopulation categories.

  10. Nonparametric Bayesian Modeling of Complex Networks

    DEFF Research Database (Denmark)

    Schmidt, Mikkel Nørgaard; Mørup, Morten

    2013-01-01

    an infinite mixture model as running example, we go through the steps of deriving the model as an infinite limit of a finite parametric model, inferring the model parameters by Markov chain Monte Carlo, and checking the model?s fit and predictive performance. We explain how advanced nonparametric models......Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...

  11. Comparison of Regression Techniques to Predict Response of Oilseed Rape Yield to Variation in Climatic Conditions in Denmark

    DEFF Research Database (Denmark)

    Sharif, Behzad; Makowski, David; Plauborg, Finn

    2017-01-01

    Statistical regression models represent alternatives to process-based dynamic models for predicting the response of crop yields to variation in climatic conditions. Regression models can be used to quantify the effect of change in temperature and precipitation on yields. However, it is difficult ...

  12. Essays on nonparametric econometrics of stochastic volatility

    NARCIS (Netherlands)

    Zu, Y.

    2012-01-01

    Volatility is a concept that describes the variation of financial returns. Measuring and modelling volatility dynamics is an important aspect of financial econometrics. This thesis is concerned with nonparametric approaches to volatility measurement and volatility model validation.

  13. Nonparametric methods for volatility density estimation

    NARCIS (Netherlands)

    Es, van Bert; Spreij, P.J.C.; Zanten, van J.H.

    2009-01-01

    Stochastic volatility modelling of financial processes has become increasingly popular. The proposed models usually contain a stationary volatility process. We will motivate and review several nonparametric methods for estimation of the density of the volatility process. Both models based on

  14. Semiparametric Mixtures of Regressions with Single-index for Model Based Clustering

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2017-01-01

    In this article, we propose two classes of semiparametric mixture regression models with single-index for model based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...

  15. An Assessment of Polynomial Regression Techniques for the Relative Radiometric Normalization (RRN of High-Resolution Multi-Temporal Airborne Thermal Infrared (TIR Imagery

    Directory of Open Access Journals (Sweden)

    Mir Mustafizur Rahman

    2014-11-01

    Full Text Available Thermal Infrared (TIR remote sensing images of urban environments are increasingly available from airborne and satellite platforms. However, limited access to high-spatial resolution (H-res: ~1 m TIR satellite images requires the use of TIR airborne sensors for mapping large complex urban surfaces, especially at micro-scales. A critical limitation of such H-res mapping is the need to acquire a large scene composed of multiple flight lines and mosaic them together. This results in the same scene components (e.g., roads, buildings, green space and water exhibiting different temperatures in different flight lines. To mitigate these effects, linear relative radiometric normalization (RRN techniques are often applied. However, the Earth’s surface is composed of features whose thermal behaviour is characterized by complexity and non-linearity. Therefore, we hypothesize that non-linear RRN techniques should demonstrate increased radiometric agreement over similar linear techniques. To test this hypothesis, this paper evaluates four (linear and non-linear RRN techniques, including: (i histogram matching (HM; (ii pseudo-invariant feature-based polynomial regression (PIF_Poly; (iii no-change stratified random sample-based linear regression (NCSRS_Lin; and (iv no-change stratified random sample-based polynomial regression (NCSRS_Poly; two of which (ii and iv are newly proposed non-linear techniques. When applied over two adjacent flight lines (~70 km2 of TABI-1800 airborne data, visual and statistical results show that both new non-linear techniques improved radiometric agreement over the previously evaluated linear techniques, with the new fully-automated method, NCSRS-based polynomial regression, providing the highest improvement in radiometric agreement between the master and the slave images, at ~56%. This is ~5% higher than the best previously evaluated linear technique (NCSRS-based linear regression.

  16. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    2012-01-01

    by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371~specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function are consistent with the "true......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...

  17. Panel data nonparametric estimation of production risk and risk preferences

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    approaches for obtaining firm-specific measures of risk attitudes. We found that Polish dairy farmers are risk averse regarding production risk and price uncertainty. According to our results, Polish dairy farmers perceive the production risk as being more significant than the risk related to output price......We apply nonparametric panel data kernel regression to investigate production risk, out-put price uncertainty, and risk attitudes of Polish dairy farms based on a firm-level unbalanced panel data set that covers the period 2004–2010. We compare different model specifications and different...

  18. A Bayesian nonparametric approach to causal inference on quantiles.

    Science.gov (United States)

    Xu, Dandan; Daniels, Michael J; Winterstein, Almut G

    2018-02-25

    We propose a Bayesian nonparametric approach (BNP) for causal inference on quantiles in the presence of many confounders. In particular, we define relevant causal quantities and specify BNP models to avoid bias from restrictive parametric assumptions. We first use Bayesian additive regression trees (BART) to model the propensity score and then construct the distribution of potential outcomes given the propensity score using a Dirichlet process mixture (DPM) of normals model. We thoroughly evaluate the operating characteristics of our approach and compare it to Bayesian and frequentist competitors. We use our approach to answer an important clinical question involving acute kidney injury using electronic health records. © 2018, The International Biometric Society.

  19. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  20. Semiparametric regression during 2003–2007

    KAUST Repository

    Ruppert, David; Wand, M.P.; Carroll, Raymond J.

    2009-01-01

    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.

  1. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables.Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  2. Nonparametric Monitoring for Geotechnical Structures Subject to Long-Term Environmental Change

    Directory of Open Access Journals (Sweden)

    Hae-Bum Yun

    2011-01-01

    Full Text Available A nonparametric, data-driven methodology of monitoring for geotechnical structures subject to long-term environmental change is discussed. Avoiding physical assumptions or excessive simplification of the monitored structures, the nonparametric monitoring methodology presented in this paper provides reliable performance-related information particularly when the collection of sensor data is limited. For the validation of the nonparametric methodology, a field case study was performed using a full-scale retaining wall, which had been monitored for three years using three tilt gauges. Using the very limited sensor data, it is demonstrated that important performance-related information, such as drainage performance and sensor damage, could be disentangled from significant daily, seasonal and multiyear environmental variations. Extensive literature review on recent developments of parametric and nonparametric data processing techniques for geotechnical applications is also presented.

  3. A non-parametric framework for estimating threshold limit values

    Directory of Open Access Journals (Sweden)

    Ulm Kurt

    2005-11-01

    Full Text Available Abstract Background To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested on non-parametric flexible alternatives. Methods We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing the data from a study conducted by the German Research Foundation aiming to set a threshold limit value in the exposure to total dust at workplace, as a causal agent for developing chronic bronchitis. Results In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion The estimation of thresholds based on isotonic framework is conceptually simple and sufficiently powerful. Given that in threshold value estimation context there is not a gold standard method, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.

  4. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Full Text Available Rapidly growing GPS (Global Positioning System trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method is proposed to reduce local information loss of the trajectory and to avoid getting stuck in the local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning method of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated in fuzzy C-means (FCM clustering, which are used to maintain the stability and the robustness of the clustering process; finally, least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method is validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM, our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.

  5. PREDICTION OF MALIGNANT BREAST LESIONS FROM MRI FEATURES: A COMPARISON OF ARTIFICIAL NEURAL NETWORK AND LOGISTIC REGRESSION TECHNIQUES

    Science.gov (United States)

    McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying

    2009-01-01

    Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of

  6. The Support Reduction Algorithm for Computing Non-Parametric Function Estimates in Mixture Models

    OpenAIRE

    GROENEBOOM, PIET; JONGBLOED, GEURT; WELLNER, JON A.

    2008-01-01

    In this paper, we study an algorithm (which we call the support reduction algorithm) that can be used to compute non-parametric M-estimators in mixture models. The algorithm is compared with natural competitors in the context of convex regression and the ‘Aspect problem’ in quantum physics.

  7. Recent Advances and Trends in Nonparametric Statistics

    CERN Document Server

    Akritas, MG

    2003-01-01

    The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o

  8. Multivariate and semiparametric kernel regression

    OpenAIRE

    Härdle, Wolfgang; Müller, Marlene

    1997-01-01

    The paper gives an introduction to theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is pro...

  9. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  10. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques.

    Science.gov (United States)

    Jones, Kelly W; Lewis, David J

    2015-01-01

    Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented--from protected areas to payments for ecosystem services (PES)--to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing 'matching' to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods--an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators--due to the presence of unobservable bias--that lead to differences in conclusions about effectiveness. The Ecuador case illustrates that

  11. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques.

    Directory of Open Access Journals (Sweden)

    Kelly W Jones

    Full Text Available Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented--from protected areas to payments for ecosystem services (PES--to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing 'matching' to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods--an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1 matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2 fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators--due to the presence of unobservable bias--that lead to differences in conclusions about effectiveness. The Ecuador case

  12. Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques

    Science.gov (United States)

    Jones, Kelly W.; Lewis, David J.

    2015-01-01

    Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented—from protected areas to payments for ecosystem services (PES)—to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing ‘matching’ to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods—an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators—due to the presence of unobservable bias—that lead to differences in conclusions about effectiveness. The Ecuador case

  13. Using non-parametric methods in econometric production analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb...... results—including measures that are of interest of applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...

  14. Teaching Nonparametric Statistics Using Student Instrumental Values.

    Science.gov (United States)

    Anderson, Jonathan W.; Diddams, Margaret

    Nonparametric statistics are often difficult to teach in introduction to statistics courses because of the lack of real-world examples. This study demonstrated how teachers can use differences in the rankings and ratings of undergraduate and graduate values to discuss: (1) ipsative and normative scaling; (2) uses of the Mann-Whitney U-test; and…

  15. Nonparametric predictive inference in statistical process control

    NARCIS (Netherlands)

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2000-01-01

    New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities hased on

  16. Nonparametric predictive inference in statistical process control

    NARCIS (Netherlands)

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2004-01-01

    Statistical process control (SPC) is used to decide when to stop a process as confidence in the quality of the next item(s) is low. Information to specify a parametric model is not always available, and as SPC is of a predictive nature, we present a control chart developed using nonparametric

  17. Nonparametric estimation in models for unobservable heterogeneity

    OpenAIRE

    Hohmann, Daniel

    2014-01-01

    Nonparametric models which allow for data with unobservable heterogeneity are studied. The first publication introduces new estimators and their asymptotic properties for conditional mixture models. The second publication considers estimation of a function from noisy observations of its Radon transform in a Gaussian white noise model.

  18. Nonparametric estimation of location and scale parameters

    KAUST Repository

    Potgieter, C.J.; Lombard, F.

    2012-01-01

    Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal

  19. A Bayesian Nonparametric Approach to Factor Analysis

    DEFF Research Database (Denmark)

    Piatek, Rémi; Papaspiliopoulos, Omiros

    2018-01-01

    This paper introduces a new approach for the inference of non-Gaussian factor models based on Bayesian nonparametric methods. It relaxes the usual normality assumption on the latent factors, widely used in practice, which is too restrictive in many settings. Our approach, on the contrary, does no...

  20. Bioprocess iterative batch-to-batch optimization based on hybrid parametric/nonparametric models.

    Science.gov (United States)

    Teixeira, Ana P; Clemente, João J; Cunha, António E; Carrondo, Manuel J T; Oliveira, Rui

    2006-01-01

    This paper presents a novel method for iterative batch-to-batch dynamic optimization of bioprocesses. The relationship between process performance and control inputs is established by means of hybrid grey-box models combining parametric and nonparametric structures. The bioreactor dynamics are defined by material balance equations, whereas the cell population subsystem is represented by an adjustable mixture of nonparametric and parametric models. Thus optimizations are possible without detailed mechanistic knowledge concerning the biological system. A clustering technique is used to supervise the reliability of the nonparametric subsystem during the optimization. Whenever the nonparametric outputs are unreliable, the objective function is penalized. The technique was evaluated with three simulation case studies. The overall results suggest that the convergence to the optimal process performance may be achieved after a small number of batches. The model unreliability risk constraint along with sampling scheduling are crucial to minimize the experimental effort required to attain a given process performance. In general terms, it may be concluded that the proposed method broadens the application of the hybrid parametric/nonparametric modeling technique to "newer" processes with higher potential for optimization.

  1. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series. PMID:24895666

  2. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Directory of Open Access Journals (Sweden)

    Ani Shabri

    2014-01-01

    Full Text Available Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI, has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  3. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  4. Total-Factor Energy Efficiency (TFEE Evaluation on Thermal Power Industry with DEA, Malmquist and Multiple Regression Techniques

    Directory of Open Access Journals (Sweden)

    Jin-Peng Liu

    2017-07-01

    Full Text Available Under the background of a new round of power market reform, realizing the goals of energy saving and emission reduction, reducing the coal consumption and ensuring the sustainable development are the key issues for thermal power industry. With the biggest economy and energy consumption scales in the world, China should promote the energy efficiency of thermal power industry to solve these problems. Therefore, from multiple perspectives, the factors influential to the energy efficiency of thermal power industry were identified. Based on the economic, social and environmental factors, a combination model with Data Envelopment Analysis (DEA and Malmquist index was constructed to evaluate the total-factor energy efficiency (TFEE in thermal power industry. With the empirical studies from national and provincial levels, the TFEE index can be factorized into the technical efficiency index (TECH, the technical progress index (TPCH, the pure efficiency index (PECH and the scale efficiency index (SECH. The analysis showed that the TFEE was mainly determined by TECH and PECH. Meanwhile, by panel data regression model, unit coal consumption, talents and government supervision were selected as important indexes to have positive effects on TFEE in thermal power industry. In addition, the negative indexes, such as energy price and installed capacity, were also analyzed to control their undesired effects. Finally, considering the analysis results, measures for improving energy efficiency of thermal power industry were discussed widely, such as strengthening technology research and design (R&D, enforcing pollutant and emission reduction, distributing capital and labor rationally and improving the government supervision. Relative study results and suggestions can provide references for Chinese government and enterprises to enhance the energy efficiency level.

  5. A Nonparametric Bayesian Approach For Emission Tomography Reconstruction

    International Nuclear Information System (INIS)

    Barat, Eric; Dautremer, Thomas

    2007-01-01

    We introduce a PET reconstruction algorithm following a nonparametric Bayesian (NPB) approach. In contrast with Expectation Maximization (EM), the proposed technique does not rely on any space discretization. Namely, the activity distribution--normalized emission intensity of the spatial poisson process--is considered as a spatial probability density and observations are the projections of random emissions whose distribution has to be estimated. This approach is nonparametric in the sense that the quantity of interest belongs to the set of probability measures on R k (for reconstruction in k-dimensions) and it is Bayesian in the sense that we define a prior directly on this spatial measure. In this context, we propose to model the nonparametric probability density as an infinite mixture of multivariate normal distributions. As a prior for this mixture we consider a Dirichlet Process Mixture (DPM) with a Normal-Inverse Wishart (NIW) model as base distribution of the Dirichlet Process. As in EM-family reconstruction, we use a data augmentation scheme where the set of hidden variables are the emission locations for each observed line of response in the continuous object space. Thanks to the data augmentation, we propose a Markov Chain Monte Carlo (MCMC) algorithm (Gibbs sampler) which is able to generate draws from the posterior distribution of the spatial intensity. A difference with EM is that one step of the Gibbs sampler corresponds to the generation of emission locations while only the expected number of emissions per pixel/voxel is used in EM. Another key difference is that the estimated spatial intensity is a continuous function such that there is no need to compute a projection matrix. Finally, draws from the intensity posterior distribution allow the estimation of posterior functionnals like the variance or confidence intervals. Results are presented for simulated data based on a 2D brain phantom and compared to Bayesian MAP-EM

  6. Nonparametric Bayes Modeling of Multivariate Categorical Data.

    Science.gov (United States)

    Dunson, David B; Xing, Chuanhua

    2012-01-01

    Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.

  7. Nonparametric estimation of benchmark doses in environmental risk assessment

    Science.gov (United States)

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    Summary An important statistical objective in environmental risk analysis is estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits’ small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations. PMID:23914133

  8. Network structure exploration via Bayesian nonparametric models

    International Nuclear Information System (INIS)

    Chen, Y; Wang, X L; Xiang, X; Tang, B Z; Bu, J Z

    2015-01-01

    Complex networks provide a powerful mathematical representation of complex systems in nature and society. To understand complex networks, it is crucial to explore their internal structures, also called structural regularities. The task of network structure exploration is to determine how many groups there are in a complex network and how to group the nodes of the network. Most existing structure exploration methods need to specify either a group number or a certain type of structure when they are applied to a network. In the real world, however, the group number and also the certain type of structure that a network has are usually unknown in advance. To explore structural regularities in complex networks automatically, without any prior knowledge of the group number or the certain type of structure, we extend a probabilistic mixture model that can handle networks with any type of structure but needs to specify a group number using Bayesian nonparametric theory. We also propose a novel Bayesian nonparametric model, called the Bayesian nonparametric mixture (BNPM) model. Experiments conducted on a large number of networks with different structures show that the BNPM model is able to explore structural regularities in networks automatically with a stable, state-of-the-art performance. (paper)

  9. portfolio optimization based on nonparametric estimation methods

    Directory of Open Access Journals (Sweden)

    mahsa ghandehari

    2017-03-01

    Full Text Available One of the major issues investors are facing with in capital markets is decision making about select an appropriate stock exchange for investing and selecting an optimal portfolio. This process is done through the risk and expected return assessment. On the other hand in portfolio selection problem if the assets expected returns are normally distributed, variance and standard deviation are used as a risk measure. But, the expected returns on assets are not necessarily normal and sometimes have dramatic differences from normal distribution. This paper with the introduction of conditional value at risk ( CVaR, as a measure of risk in a nonparametric framework, for a given expected return, offers the optimal portfolio and this method is compared with the linear programming method. The data used in this study consists of monthly returns of 15 companies selected from the top 50 companies in Tehran Stock Exchange during the winter of 1392 which is considered from April of 1388 to June of 1393. The results of this study show the superiority of nonparametric method over the linear programming method and the nonparametric method is much faster than the linear programming method.

  10. Nonparametric Mixture Models for Supervised Image Parcellation.

    Science.gov (United States)

    Sabuncu, Mert R; Yeo, B T Thomas; Van Leemput, Koen; Fischl, Bruce; Golland, Polina

    2009-09-01

    We present a nonparametric, probabilistic mixture model for the supervised parcellation of images. The proposed model yields segmentation algorithms conceptually similar to the recently developed label fusion methods, which register a new image with each training image separately. Segmentation is achieved via the fusion of transferred manual labels. We show that in our framework various settings of a model parameter yield algorithms that use image intensity information differently in determining the weight of a training subject during fusion. One particular setting computes a single, global weight per training subject, whereas another setting uses locally varying weights when fusing the training data. The proposed nonparametric parcellation approach capitalizes on recently developed fast and robust pairwise image alignment tools. The use of multiple registrations allows the algorithm to be robust to occasional registration failures. We report experiments on 39 volumetric brain MRI scans with expert manual labels for the white matter, cerebral cortex, ventricles and subcortical structures. The results demonstrate that the proposed nonparametric segmentation framework yields significantly better segmentation than state-of-the-art algorithms.

  11. Robustifying Bayesian nonparametric mixtures for count data.

    Science.gov (United States)

    Canale, Antonio; Prünster, Igor

    2017-03-01

    Our motivating application stems from surveys of natural populations and is characterized by large spatial heterogeneity in the counts, which makes parametric approaches to modeling local animal abundance too restrictive. We adopt a Bayesian nonparametric approach based on mixture models and innovate with respect to popular Dirichlet process mixture of Poisson kernels by increasing the model flexibility at the level both of the kernel and the nonparametric mixing measure. This allows to derive accurate and robust estimates of the distribution of local animal abundance and of the corresponding clusters. The application and a simulation study for different scenarios yield also some general methodological implications. Adding flexibility solely at the level of the mixing measure does not improve inferences, since its impact is severely limited by the rigidity of the Poisson kernel with considerable consequences in terms of bias. However, once a kernel more flexible than the Poisson is chosen, inferences can be robustified by choosing a prior more general than the Dirichlet process. Therefore, to improve the performance of Bayesian nonparametric mixtures for count data one has to enrich the model simultaneously at both levels, the kernel and the mixing measure. © 2016, The International Biometric Society.

  12. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    Science.gov (United States)

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  13. Introduction to nonparametric statistics for the biological sciences using R

    CERN Document Server

    MacFarland, Thomas W

    2016-01-01

    This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: To introduce when nonparametric approaches to data analysis are appropriate To introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test To introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...

  14. Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes

    Science.gov (United States)

    Zhou, Wei-Xing; Sornette, Didier

    We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.

  15. A Bayesian nonparametric estimation of distributions and quantiles

    International Nuclear Information System (INIS)

    Poern, K.

    1988-11-01

    The report describes a Bayesian, nonparametric method for the estimation of a distribution function and its quantiles. The method, presupposing random sampling, is nonparametric, so the user has to specify a prior distribution on a space of distributions (and not on a parameter space). In the current application, where the method is used to estimate the uncertainty of a parametric calculational model, the Dirichlet prior distribution is to a large extent determined by the first batch of Monte Carlo-realizations. In this case the results of the estimation technique is very similar to the conventional empirical distribution function. The resulting posterior distribution is also Dirichlet, and thus facilitates the determination of probability (confidence) intervals at any given point in the space of interest. Another advantage is that also the posterior distribution of a specified quantitle can be derived and utilized to determine a probability interval for that quantile. The method was devised for use in the PROPER code package for uncertainty and sensitivity analysis. (orig.)

  16. Using Apparent Density of Paper from Hardwood Kraft Pulps to Predict Sheet Properties, based on Unsupervised Classification and Multivariable Regression Techniques

    Directory of Open Access Journals (Sweden)

    Ofélia Anjos

    2015-07-01

    Full Text Available Paper properties determine the product application potential and depend on the raw material, pulping conditions, and pulp refining. The aim of this study was to construct mathematical models that predict quantitative relations between the paper density and various mechanical and optical properties of the paper. A dataset of properties of paper handsheets produced with pulps of Acacia dealbata, Acacia melanoxylon, and Eucalyptus globulus beaten at 500, 2500, and 4500 revolutions was used. Unsupervised classification techniques were combined to assess the need to perform separated prediction models for each species, and multivariable regression techniques were used to establish such prediction models. It was possible to develop models with a high goodness of fit using paper density as the independent variable (or predictor for all variables except tear index and zero-span tensile strength, both dry and wet.

  17. Data analysis with small samples and non-normal data nonparametrics and other strategies

    CERN Document Server

    Siebert, Carl F

    2017-01-01

    Written in everyday language for non-statisticians, this book provides all the information needed to successfully conduct nonparametric analyses. This ideal reference book provides step-by-step instructions to lead the reader through each analysis, screenshots of the software and output, and case scenarios to illustrate of all the analytic techniques.

  18. DPpackage: Bayesian Semi- and Nonparametric Modeling in R

    Directory of Open Access Journals (Sweden)

    Alejandro Jara

    2011-04-01

    Full Text Available Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.

  19. Regression Phalanxes

    OpenAIRE

    Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

    2017-01-01

    Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...

  20. Multi-Response Optimization and Regression Analysis of Process Parameters for Wire-EDMed HCHCr Steel Using Taguchi’s Technique

    Directory of Open Access Journals (Sweden)

    K. Srujay Varma

    2017-04-01

    Full Text Available In this study, effect of machining process parameters viz. pulse-on time, pulse-off time, current and servo-voltage for machining High Carbon High Chromium Steel (HCHCr using copper electrode in wire EDM was investigated. High Carbon High Chromium Steel is a difficult to machine alloy, which has many applications in low temperature manufacturing, and copper is chosen as electrode as it has good electrical conductivity and most frequently used electrode all over the world. Tool making culture of copper has made many shops in Europe and Japan to used copper electrode. Experiments were conducted according to Taguchi’s technique by varying the machining process parameters at three levels. Taguchi’s method based on L9 orthogonal array was followed and number of experiments was limited to 9. Experimental cost and time consumption was reduced by following this statistical technique. Targeted output parameters are Material Removal Rate (MRR, Vickers Hardness (HV and Surface Roughness (SR. Analysis of Variance (ANOVA and Regression Analysis was performed using Minitab 17 software to optimize the parameters and draw relationship between input and output process parameters. Regression models were developed relating input and output parameters. It was observed that most influential factor for MRR, Hardness and SR are Ton, Toff and SV.

  1. Stochastic search, optimization and regression with energy applications

    Science.gov (United States)

    Hannah, Lauren A.

    models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.

  2. A spatio-temporal nonparametric Bayesian variable selection model of fMRI data for clustering correlated time courses.

    Science.gov (United States)

    Zhang, Linlin; Guindani, Michele; Versace, Francesco; Vannucci, Marina

    2014-07-15

    In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov Chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. A nonparametric approach to medical survival data: Uncertainty in the context of risk in mortality analysis

    International Nuclear Information System (INIS)

    Janurová, Kateřina; Briš, Radim

    2014-01-01

    Medical survival right-censored data of about 850 patients are evaluated to analyze the uncertainty related to the risk of mortality on one hand and compare two basic surgery techniques in the context of risk of mortality on the other hand. Colorectal data come from patients who underwent colectomy in the University Hospital of Ostrava. Two basic surgery operating techniques are used for the colectomy: either traditional (open) or minimally invasive (laparoscopic). Basic question arising at the colectomy operation is, which type of operation to choose to guarantee longer overall survival time. Two non-parametric approaches have been used to quantify probability of mortality with uncertainties. In fact, complement of the probability to one, i.e. survival function with corresponding confidence levels is calculated and evaluated. First approach considers standard nonparametric estimators resulting from both the Kaplan–Meier estimator of survival function in connection with Greenwood's formula and the Nelson–Aalen estimator of cumulative hazard function including confidence interval for survival function as well. The second innovative approach, represented by Nonparametric Predictive Inference (NPI), uses lower and upper probabilities for quantifying uncertainty and provides a model of predictive survival function instead of the population survival function. The traditional log-rank test on one hand and the nonparametric predictive comparison of two groups of lifetime data on the other hand have been compared to evaluate risk of mortality in the context of mentioned surgery techniques. The size of the difference between two groups of lifetime data has been considered and analyzed as well. Both nonparametric approaches led to the same conclusion, that the minimally invasive operating technique guarantees the patient significantly longer survival time in comparison with the traditional operating technique

  4. Decompounding random sums: A nonparametric approach

    DEFF Research Database (Denmark)

    Hansen, Martin Bøgsted; Pitts, Susan M.

    Observations from sums of random variables with a random number of summands, known as random, compound or stopped sums arise within many areas of engineering and science. Quite often it is desirable to infer properties of the distribution of the terms in the random sum. In the present paper we...... review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...

  5. A Nonparametric Test for Seasonal Unit Roots

    OpenAIRE

    Kunst, Robert M.

    2009-01-01

    Abstract: We consider a nonparametric test for the null of seasonal unit roots in quarterly time series that builds on the RUR (records unit root) test by Aparicio, Escribano, and Sipols. We find that the test concept is more promising than a formalization of visual aids such as plots by quarter. In order to cope with the sensitivity of the original RUR test to autocorrelation under its null of a unit root, we suggest an augmentation step by autoregression. We present some evidence on the siz...

  6. Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis.

    Science.gov (United States)

    Bornkamp, Björn; Ickstadt, Katja

    2009-03-01

    In this article, we consider monotone nonparametric regression in a Bayesian framework. The monotone function is modeled as a mixture of shifted and scaled parametric probability distribution functions, and a general random probability measure is assumed as the prior for the mixing distribution. We investigate the choice of the underlying parametric distribution function and find that the two-sided power distribution function is well suited both from a computational and mathematical point of view. The model is motivated by traditional nonlinear models for dose-response analysis, and provides possibilities to elicitate informative prior distributions on different aspects of the curve. The method is compared with other recent approaches to monotone nonparametric regression in a simulation study and is illustrated on a data set from dose-response analysis.

  7. Spurious Seasonality Detection: A Non-Parametric Test Proposal

    Directory of Open Access Journals (Sweden)

    Aurelio F. Bariviera

    2018-01-01

    Full Text Available This paper offers a general and comprehensive definition of the day-of-the-week effect. Using symbolic dynamics, we develop a unique test based on ordinal patterns in order to detect it. This test uncovers the fact that the so-called “day-of-the-week” effect is partly an artifact of the hidden correlation structure of the data. We present simulations based on artificial time series as well. While time series generated with long memory are prone to exhibit daily seasonality, pure white noise signals exhibit no pattern preference. Since ours is a non-parametric test, it requires no assumptions about the distribution of returns, so that it could be a practical alternative to conventional econometric tests. We also made an exhaustive application of the here-proposed technique to 83 stock indexes around the world. Finally, the paper highlights the relevance of symbolic analysis in economic time series studies.

  8. Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

    Science.gov (United States)

    Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

    2009-02-01

    One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.

  9. Multi-Directional Non-Parametric Analysis of Agricultural Efficiency

    DEFF Research Database (Denmark)

    Balezentis, Tomas

    This thesis seeks to develop methodologies for assessment of agricultural efficiency and employ them to Lithuanian family farms. In particular, we focus on three particular objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... to the Multi-Directional Efficiency Analysis approach when the proposed models were employed to analyse empirical data of Lithuanian family farm performance, we saw substantial differences in efficiencies associated with different inputs. In particular, assets appeared to be the least efficiently used input...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...

  10. Bayesian Nonparametric Clustering for Positive Definite Matrices.

    Science.gov (United States)

    Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos

    2016-05-01

    Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, rather belongs to a curved Riemannian manifold,existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.

  11. On weighted and locally polynomial directional quantile regression

    Czech Academy of Sciences Publication Activity Database

    Boček, Pavel; Šiman, Miroslav

    2017-01-01

    Roč. 32, č. 3 (2017), s. 929-946 ISSN 0943-4062 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : Quantile regression * Nonparametric regression * Nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8) Impact factor: 0.434, year: 2016 http://library.utia.cas.cz/separaty/2017/SI/bocek-0458380.pdf

  12. Machine-learning techniques for family demography: an application of random forests to the analysis of divorce determinants in Germany

    OpenAIRE

    Arpino, Bruno; Le Moglie, Marco; Mencarini, Letizia

    2018-01-01

    Demographers often analyze the determinants of life-course events with parametric regression-type approaches. Here, we present a class of nonparametric approaches, broadly defined as machine learning (ML) techniques, and discuss advantages and disadvantages of a popular type known as random forest. We argue that random forests can be useful either as a substitute, or a complement, to more standard parametric regression modeling. Our discussion of random forests is intuitive and...

  13. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  14. Design optimization of tailor-rolled blank thin-walled structures based on ɛ-support vector regression technique and genetic algorithm

    Science.gov (United States)

    Duan, Libin; Xiao, Ning-cong; Li, Guangyao; Cheng, Aiguo; Chen, Tao

    2017-07-01

    Tailor-rolled blank thin-walled (TRB-TH) structures have become important vehicle components owing to their advantages of light weight and crashworthiness. The purpose of this article is to provide an efficient lightweight design for improving the energy-absorbing capability of TRB-TH structures under dynamic loading. A finite element (FE) model for TRB-TH structures is established and validated by performing a dynamic axial crash test. Different material properties for individual parts with different thicknesses are considered in the FE model. Then, a multi-objective crashworthiness design of the TRB-TH structure is constructed based on the ɛ-support vector regression (ɛ-SVR) technique and non-dominated sorting genetic algorithm-II. The key parameters (C, ɛ and σ) are optimized to further improve the predictive accuracy of ɛ-SVR under limited sample points. Finally, the technique for order preference by similarity to the ideal solution method is used to rank the solutions in Pareto-optimal frontiers and find the best compromise optima. The results demonstrate that the light weight and crashworthiness performance of the optimized TRB-TH structures are superior to their uniform thickness counterparts. The proposed approach provides useful guidance for designing TRB-TH energy absorbers for vehicle bodies.

  15. Comparison of several measure-correlate-predict models using support vector regression techniques to estimate wind power densities. A case study

    International Nuclear Information System (INIS)

    Díaz, Santiago; Carta, José A.; Matías, José M.

    2017-01-01

    Highlights: • Eight measure-correlate-predict (MCP) models used to estimate the wind power densities (WPDs) at a target site are compared. • Support vector regressions are used as the main prediction techniques in the proposed MCPs. • The most precise MCP uses two sub-models which predict wind speed and air density in an unlinked manner. • The most precise model allows to construct a bivariable (wind speed and air density) WPD probability density function. • MCP models trained to minimise wind speed prediction error do not minimise WPD prediction error. - Abstract: The long-term annual mean wind power density (WPD) is an important indicator of wind as a power source which is usually included in regional wind resource maps as useful prior information to identify potentially attractive sites for the installation of wind projects. In this paper, a comparison is made of eight proposed Measure-Correlate-Predict (MCP) models to estimate the WPDs at a target site. Seven of these models use the Support Vector Regression (SVR) and the eighth the Multiple Linear Regression (MLR) technique, which serves as a basis to compare the performance of the other models. In addition, a wrapper technique with 10-fold cross-validation has been used to select the optimal set of input features for the SVR and MLR models. Some of the eight models were trained to directly estimate the mean hourly WPDs at a target site. Others, however, were firstly trained to estimate the parameters on which the WPD depends (i.e. wind speed and air density) and then, using these parameters, the target site mean hourly WPDs. The explanatory features considered are different combinations of the mean hourly wind speeds, wind directions and air densities recorded in 2014 at ten weather stations in the Canary Archipelago (Spain). The conclusions that can be drawn from the study undertaken include the argument that the most accurate method for the long-term estimation of WPDs requires the execution of a

  16. On Parametric (and Non-Parametric Variation

    Directory of Open Access Journals (Sweden)

    Neil Smith

    2009-11-01

    Full Text Available This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.

  17. Nonparametric predictive pairwise comparison with competing risks

    International Nuclear Information System (INIS)

    Coolen-Maturi, Tahani

    2014-01-01

    In reliability, failure data often correspond to competing risks, where several failure modes can cause a unit to fail. This paper presents nonparametric predictive inference (NPI) for pairwise comparison with competing risks data, assuming that the failure modes are independent. These failure modes could be the same or different among the two groups, and these can be both observed and unobserved failure modes. NPI is a statistical approach based on few assumptions, with inferences strongly based on data and with uncertainty quantified via lower and upper probabilities. The focus is on the lower and upper probabilities for the event that the lifetime of a future unit from one group, say Y, is greater than the lifetime of a future unit from the second group, say X. The paper also shows how the two groups can be compared based on particular failure mode(s), and the comparison of the two groups when some of the competing risks are combined is discussed

  18. Nonparametric estimation of location and scale parameters

    KAUST Repository

    Potgieter, C.J.

    2012-12-01

    Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.

  19. Nonparametric inference of network structure and dynamics

    Science.gov (United States)

    Peixoto, Tiago P.

    The network structure of complex systems determine their function and serve as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among

  20. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  1. Simple nonparametric checks for model data fit in CAT

    NARCIS (Netherlands)

    Meijer, R.R.

    2005-01-01

    In this paper, the usefulness of several nonparametric checks is discussed in a computerized adaptive testing (CAT) context. Although there is no tradition of nonparametric scalability in CAT, it can be argued that scalability checks can be useful to investigate, for example, the quality of item

  2. Nonparametric Bayesian inference for multidimensional compound Poisson processes

    NARCIS (Netherlands)

    Gugushvili, S.; van der Meulen, F.; Spreij, P.

    2015-01-01

    Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density r0 and intensity λ0. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context,

  3. Nonparametric analysis of blocked ordered categories data: some examples revisited

    Directory of Open Access Journals (Sweden)

    O. Thas

    2006-08-01

    Full Text Available Nonparametric analysis for general block designs can be given by using the Cochran-Mantel-Haenszel (CMH statistics. We demonstrate this with four examples and note that several well-known nonparametric statistics are special cases of CMH statistics.

  4. True phosphorus digestibility and the endogenous phosphorus outputs associated with brown rice for weanling pigs measured by the simple linear regression analysis technique.

    Science.gov (United States)

    Yang, H; Li, A K; Yin, Y L; Li, T J; Wang, Z R; Wu, G; Huang, R L; Kong, X F; Yang, C B; Kang, P; Deng, J; Wang, S X; Tan, B E; Hu, Q; Xing, F F; Wu, X; He, Q H; Yao, K; Liu, Z J; Tang, Z R; Yin, F G; Deng, Z Y; Xie, M Y; Fan, M Z

    2007-03-01

    The objectives of this study were to determine true phosphorus (P) digestibility, degradability of phytate-P complex and the endogenous P outputs associated with brown rice feeding in weanling pigs by using the simple linear regression analysis technique. Six barrows with an average initial body weight of 12.5 kg were fitted with a T-cannula and fed six diets according to a 6 × 6 Latin-square design. Six maize starch-based diets, containing six levels of P at 0.80, 1.36, 1.93, 2.49, 3.04, and 3.61 g/kg per kg dry-matter (DM) intake (DMI), were formulated with brown rice. Each experimental period lasted 10 days. After a 7-day adaptation, all faecal samples were collected on days 8 and 9. Ileal digesta samples were collected for a total of 24 h on day 10. The apparent ileal and faecal P digestibility values of brown rice were affected ( P Linear relationships ( P simple regression analysis technique. There were no differences ( P>0.05) in true P digestibility values (57.7 ± 5.4 v. 58.2 ± 5.9%), phytate P degradability (76.4 ± 6.7 v. 79.0 ± 4.4%) and the endogenous P outputs (0.812 ± 0..096 v. 0.725 ± 0.083 g/kg DMI) between the ileal and the faecal levels. The endogenous faecal P output represented 14 and 25% of the National Research Council (1998) recommended daily total and available P requirements in the weanling pig, respectively. About 58% of the total P in brown rice could be digested and absorbed by the weanling pig. Our results suggest that the large intestine of the weanling pigs does not play a significant role in the digestion of P in brown rice. Diet formulation on the basis of total or apparent P digestibility with brown rice may lead to P overfeeding and excessive P excretion in pigs.

  5. 2nd Conference of the International Society for Nonparametric Statistics

    CERN Document Server

    Manteiga, Wenceslao; Romo, Juan

    2016-01-01

    This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...

  6. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  7. Comparing pseudo-absences generation techniques in Boosted Regression Trees models for conservation purposes: A case study on amphibians in a protected area.

    Directory of Open Access Journals (Sweden)

    Francesco Cerasoli

    Full Text Available Boosted Regression Trees (BRT is one of the modelling techniques most recently applied to biodiversity conservation and it can be implemented with presence-only data through the generation of artificial absences (pseudo-absences. In this paper, three pseudo-absences generation techniques are compared, namely the generation of pseudo-absences within target-group background (TGB, testing both the weighted (WTGB and unweighted (UTGB scheme, and the generation at random (RDM, evaluating their performance and applicability in distribution modelling and species conservation. The choice of the target group fell on amphibians, because of their rapid decline worldwide and the frequent lack of guidelines for conservation strategies and regional-scale planning, which instead could be provided through an appropriate implementation of SDMs. Bufo bufo, Salamandrina perspicillata and Triturus carnifex were considered as target species, in order to perform our analysis with species having different ecological and distributional characteristics. The study area is the "Gran Sasso-Monti della Laga" National Park, which hosts 15 Natura 2000 sites and represents one of the most important biodiversity hotspots in Europe. Our results show that the model calibration ameliorates when using the target-group based pseudo-absences compared to the random ones, especially when applying the WTGB. Contrarily, model discrimination did not significantly vary in a consistent way among the three approaches with respect to the tree target species. Both WTGB and RDM clearly isolate the highly contributing variables, supplying many relevant indications for species conservation actions. Moreover, the assessment of pairwise variable interactions and their three-dimensional visualization further increase the amount of useful information for protected areas' managers. Finally, we suggest the use of RDM as an admissible alternative when it is not possible to individuate a suitable set of

  8. Artificial neural networks environmental forecasting in comparison with multiple linear regression technique: From heavy metals to organic micropollutants screening in agricultural soils

    Science.gov (United States)

    Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea

    2016-12-01

    The assessment of metals and organic micropollutants contamination in agricultural soils is a difficult challenge due to the extensive area used to collect and analyze a very large number of samples. With Dioxins and dioxin-like PCBs measurement methods and subsequent the treatment of data, the European Community advises the develop low-cost and fast methods allowing routing analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has shown to be an excellent forecasting method, though there is no linear correlation between the variables to be analyzed.

  9. Assessing the Credit Risk of Corporate Bonds Based on Factor Analysis and Logistic Regress Analysis Techniques: Evidence from New Energy Enterprises in China

    Directory of Open Access Journals (Sweden)

    Yuanxin Liu

    2018-05-01

    Full Text Available In recent years, new energy sources have ushered in tremendous opportunities for development. The difficulties to finance new energy enterprises (NEEs can be estimated through issuing corporate bonds. However, there are few scientific and reasonable methods to assess the credit risk of NEE bonds, which is not conducive to the healthy development of NEEs. Based on this, this paper analyzes the advantages and risks of NEEs issuing bonds and the main factors affecting the credit risk of NEE bonds, constructs a hybrid model for assessing the credit risk of NEE bonds based on factor analysis and logistic regress analysis techniques, and verifies the applicability and effectiveness of the model employing relevant data from 46 Chinese NEEs. The results show that the main factors affecting the credit risk of NEE bonds are internal factors involving the company’s profitability, solvency, operational ability, growth potential, asset structure and viability, and external factors including macroeconomic environment and energy policy support. Based on the empirical results and the exact situation of China’s NEE bonds, this article finally puts forward several targeted recommendations.

  10. A local non-parametric model for trade sign inference

    Science.gov (United States)

    Blazejewski, Adam; Coggins, Richard

    2005-03-01

    We investigate a regularity in market order submission strategies for 12 stocks with large market capitalization on the Australian Stock Exchange. The regularity is evidenced by a predictable relationship between the trade sign (trade initiator), size of the trade, and the contents of the limit order book before the trade. We demonstrate this predictability by developing an empirical inference model to classify trades into buyer-initiated and seller-initiated. The model employs a local non-parametric method, k-nearest neighbor, which in the past was used successfully for chaotic time series prediction. The k-nearest neighbor with three predictor variables achieves an average out-of-sample classification accuracy of 71.40%, compared to 63.32% for the linear logistic regression with seven predictor variables. The result suggests that a non-linear approach may produce a more parsimonious trade sign inference model with a higher out-of-sample classification accuracy. Furthermore, for most of our stocks the observed regularity in market order submissions seems to have a memory of at least 30 trading days.

  11. Probability Machines: Consistent Probability Estimation Using Nonparametric Learning Machines

    Science.gov (United States)

    Malley, J. D.; Kruppa, J.; Dasgupta, A.; Malley, K. G.; Ziegler, A.

    2011-01-01

    Summary Background Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. Objectives The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Methods Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Results Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Conclusions Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications. PMID:21915433

  12. Hadron Energy Reconstruction for ATLAS Barrel Combined Calorimeter Using Non-Parametrical Method

    CERN Document Server

    Kulchitskii, Yu A

    2000-01-01

    Hadron energy reconstruction for the ATLAS barrel prototype combined calorimeter in the framework of the non-parametrical method is discussed. The non-parametrical method utilizes only the known e/h ratios and the electron calibration constants and does not require the determination of any parameters by a minimization technique. Thus, this technique lends itself to fast energy reconstruction in a first level trigger. The reconstructed mean values of the hadron energies are within \\pm1% of the true values and the fractional energy resolution is [(58\\pm 3)%{\\sqrt{GeV}}/\\sqrt{E}+(2.5\\pm0.3)%]\\bigoplus(1.7\\pm0.2) GeV/E. The value of the e/h ratio obtained for the electromagnetic compartment of the combined calorimeter is 1.74\\pm0.04. Results of a study of the longitudinal hadronic shower development are also presented.

  13. Effective behaviour change techniques for physical activity and healthy eating in overweight and obese adults; systematic review and meta-regression analyses.

    Science.gov (United States)

    Samdal, Gro Beate; Eide, Geir Egil; Barth, Tom; Williams, Geoffrey; Meland, Eivind

    2017-03-28

    This systematic review aims to explain the heterogeneity in results of interventions to promote physical activity and healthy eating for overweight and obese adults, by exploring the differential effects of behaviour change techniques (BCTs) and other intervention characteristics. The inclusion criteria specified RCTs with ≥ 12 weeks' duration, from January 2007 to October 2014, for adults (mean age ≥ 40 years, mean BMI ≥ 30). Primary outcomes were measures of healthy diet or physical activity. Two reviewers rated study quality, coded the BCTs, and collected outcome results at short (≤6 months) and long term (≥12 months). Meta-analyses and meta-regressions were used to estimate effect sizes (ES), heterogeneity indices (I 2 ) and regression coefficients. We included 48 studies containing a total of 82 outcome reports. The 32 long term reports had an overall ES = 0.24 with 95% confidence interval (CI): 0.15 to 0.33 and I 2  = 59.4%. The 50 short term reports had an ES = 0.37 with 95% CI: 0.26 to 0.48, and I 2  = 71.3%. The number of BCTs unique to the intervention group, and the BCTs goal setting and self-monitoring of behaviour predicted the effect at short and long term. The total number of BCTs in both intervention arms and using the BCTs goal setting of outcome, feedback on outcome of behaviour, implementing graded tasks, and adding objects to the environment, e.g. using a step counter, significantly predicted the effect at long term. Setting a goal for change; and the presence of reporting bias independently explained 58.8% of inter-study variation at short term. Autonomy supportive and person-centred methods as in Motivational Interviewing, the BCTs goal setting of behaviour, and receiving feedback on the outcome of behaviour, explained all of the between study variations in effects at long term. There are similarities, but also differences in effective BCTs promoting change in healthy eating and physical activity and

  14. Nonparametric methods in actigraphy: An update

    Directory of Open Access Journals (Sweden)

    Bruno S.B. Gonçalves

    2014-09-01

    Full Text Available Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm results for each time interval. Simulated data showed that (1 synchronization analysis depends on sample size, and (2 fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson׳s when using the ISM variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization.

  15. Bayesian nonparametric adaptive control using Gaussian processes.

    Science.gov (United States)

    Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A

    2015-03-01

    Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.

  16. Nonparametric tests for equality of psychometric functions.

    Science.gov (United States)

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2017-12-07

    Many empirical studies measure psychometric functions (curves describing how observers' performance varies with stimulus magnitude) because these functions capture the effects of experimental conditions. To assess these effects, parametric curves are often fitted to the data and comparisons are carried out by testing for equality of mean parameter estimates across conditions. This approach is parametric and, thus, vulnerable to violations of the implied assumptions. Furthermore, testing for equality of means of parameters may be misleading: Psychometric functions may vary meaningfully across conditions on an observer-by-observer basis with no effect on the mean values of the estimated parameters. Alternative approaches to assess equality of psychometric functions per se are thus needed. This paper compares three nonparametric tests that are applicable in all situations of interest: The existing generalized Mantel-Haenszel test, a generalization of the Berry-Mielke test that was developed here, and a split variant of the generalized Mantel-Haenszel test also developed here. Their statistical properties (accuracy and power) are studied via simulation and the results show that all tests are indistinguishable as to accuracy but they differ non-uniformly as to power. Empirical use of the tests is illustrated via analyses of published data sets and practical recommendations are given. The computer code in MATLAB and R to conduct these tests is available as Electronic Supplemental Material.

  17. Adaptive nonparametric estimation for L\\'evy processes observed at low frequency

    OpenAIRE

    Kappus, Johanna

    2013-01-01

    This article deals with adaptive nonparametric estimation for L\\'evy processes observed at low frequency. For general linear functionals of the L\\'evy measure, we construct kernel estimators, provide upper risk bounds and derive rates of convergence under regularity assumptions. Our focus lies on the adaptive choice of the bandwidth, using model selection techniques. We face here a non-standard problem of model selection with unknown variance. A new approach towards this problem is proposed, ...

  18. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.

  19. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition ""...this is an excellent book which could easily be used as a course text...""-International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  20. An adaptive distance measure for use with nonparametric models

    International Nuclear Information System (INIS)

    Garvey, D. R.; Hines, J. W.

    2006-01-01

    Distance measures perform a critical task in nonparametric, locally weighted regression. Locally weighted regression (LWR) models are a form of 'lazy learning' which construct a local model 'on the fly' by comparing a query vector to historical, exemplar vectors according to a three step process. First, the distance of the query vector to each of the exemplar vectors is calculated. Next, these distances are passed to a kernel function, which converts the distances to similarities or weights. Finally, the model output or response is calculated by performing locally weighted polynomial regression. To date, traditional distance measures, such as the Euclidean, weighted Euclidean, and L1-norm have been used as the first step in the prediction process. Since these measures do not take into consideration sensor failures and drift, they are inherently ill-suited for application to 'real world' systems. This paper describes one such LWR model, namely auto associative kernel regression (AAKR), and describes a new, Adaptive Euclidean distance measure that can be used to dynamically compensate for faulty sensor inputs. In this new distance measure, the query observations that lie outside of the training range (i.e. outside the minimum and maximum input exemplars) are dropped from the distance calculation. This allows for the distance calculation to be robust to sensor drifts and failures, in addition to providing a method for managing inputs that exceed the training range. In this paper, AAKR models using the standard and Adaptive Euclidean distance are developed and compared for the pressure system of an operating nuclear power plant. It is shown that using the standard Euclidean distance for data with failed inputs, significant errors in the AAKR predictions can result. By using the Adaptive Euclidean distance it is shown that high fidelity predictions are possible, in spite of the input failure. In fact, it is shown that with the Adaptive Euclidean distance prediction

  1. Smooth semi-nonparametric (SNP) estimation of the cumulative incidence function.

    Science.gov (United States)

    Duc, Anh Nguyen; Wolbers, Marcel

    2017-08-15

    This paper presents a novel approach to estimation of the cumulative incidence function in the presence of competing risks. The underlying statistical model is specified via a mixture factorization of the joint distribution of the event type and the time to the event. The time to event distributions conditional on the event type are modeled using smooth semi-nonparametric densities. One strength of this approach is that it can handle arbitrary censoring and truncation while relying on mild parametric assumptions. A stepwise forward algorithm for model estimation and adaptive selection of smooth semi-nonparametric polynomial degrees is presented, implemented in the statistical software R, evaluated in a sequence of simulation studies, and applied to data from a clinical trial in cryptococcal meningitis. The simulations demonstrate that the proposed method frequently outperforms both parametric and nonparametric alternatives. They also support the use of 'ad hoc' asymptotic inference to derive confidence intervals. An extension to regression modeling is also presented, and its potential and challenges are discussed. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  2. Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models

    International Nuclear Information System (INIS)

    Storlie, Curtis B.; Swiler, Laura P.; Helton, Jon C.; Sallaberry, Cedric J.

    2009-01-01

    The analysis of many physical and engineering problems involves running complex computational models (simulation models, computer codes). With problems of this type, it is important to understand the relationships between the input variables (whose values are often imprecisely known) and the output. The goal of sensitivity analysis (SA) is to study this relationship and identify the most significant factors or variables affecting the results of the model. In this presentation, an improvement on existing methods for SA of complex computer models is described for use when the model is too computationally expensive for a standard Monte-Carlo analysis. In these situations, a meta-model or surrogate model can be used to estimate the necessary sensitivity index for each input. A sensitivity index is a measure of the variance in the response that is due to the uncertainty in an input. Most existing approaches to this problem either do not work well with a large number of input variables and/or they ignore the error involved in estimating a sensitivity index. Here, a new approach to sensitivity index estimation using meta-models and bootstrap confidence intervals is described that provides solutions to these drawbacks. Further, an efficient yet effective approach to incorporate this methodology into an actual SA is presented. Several simulated and real examples illustrate the utility of this approach. This framework can be extended to uncertainty analysis as well.

  3. A non-parametric test for partial monotonicity in multiple regression

    NARCIS (Netherlands)

    van Beek, M.; Daniëls, H.A.M.

    Partial positive (negative) monotonicity in a dataset is the property that an increase in an independent variable, ceteris paribus, generates an increase (decrease) in the dependent variable. A test for partial monotonicity in datasets could (1) increase model performance if monotonicity may be

  4. Generation of Recommendable Values for the Surface Tension of Water Using a Nonparametric Regression

    Czech Academy of Sciences Publication Activity Database

    Pátek, Jaroslav; Součková, Monika; Klomfar, Jaroslav

    2016-01-01

    Roč. 61, č. 2 (2016), s. 928-935 ISSN 0021-9568 R&D Projects: GA ČR GA13-00145S Institutional support: RVO:61388998 Keywords : water * surface tension * experimental data * recommended data Subject RIV: BJ - Thermodynamics Impact factor: 2.323, year: 2016

  5. Weak Disposability in Nonparametric Production Analysis with Undesirable Outputs

    NARCIS (Netherlands)

    Kuosmanen, T.K.

    2005-01-01

    Environmental Economics and Natural Resources Group at Wageningen University in The Netherlands Weak disposability of outputs means that firms can abate harmful emissions by decreasing the activity level. Modeling weak disposability in nonparametric production analysis has caused some confusion.

  6. Multi-sample nonparametric treatments comparison in medical ...

    African Journals Online (AJOL)

    Multi-sample nonparametric treatments comparison in medical follow-up study with unequal observation processes through simulation and bladder tumour case study. P. L. Tan, N.A. Ibrahim, M.B. Adam, J. Arasan ...

  7. Speaker Linking and Applications using Non-Parametric Hashing Methods

    Science.gov (United States)

    2016-09-08

    nonparametric estimate of a multivariate density function,” The Annals of Math- ematical Statistics , vol. 36, no. 3, pp. 1049–1051, 1965. [9] E. A. Patrick...Speaker Linking and Applications using Non-Parametric Hashing Methods† Douglas Sturim and William M. Campbell MIT Lincoln Laboratory, Lexington, MA...with many approaches [1, 2]. For this paper, we focus on using i-vectors [2], but the methods apply to any embedding. For the task of speaker QBE and

  8. Modeling Non-Gaussian Time Series with Nonparametric Bayesian Model.

    Science.gov (United States)

    Xu, Zhiguang; MacEachern, Steven; Xu, Xinyi

    2015-02-01

    We present a class of Bayesian copula models whose major components are the marginal (limiting) distribution of a stationary time series and the internal dynamics of the series. We argue that these are the two features with which an analyst is typically most familiar, and hence that these are natural components with which to work. For the marginal distribution, we use a nonparametric Bayesian prior distribution along with a cdf-inverse cdf transformation to obtain large support. For the internal dynamics, we rely on the traditionally successful techniques of normal-theory time series. Coupling the two components gives us a family of (Gaussian) copula transformed autoregressive models. The models provide coherent adjustments of time scales and are compatible with many extensions, including changes in volatility of the series. We describe basic properties of the models, show their ability to recover non-Gaussian marginal distributions, and use a GARCH modification of the basic model to analyze stock index return series. The models are found to provide better fit and improved short-range and long-range predictions than Gaussian competitors. The models are extensible to a large variety of fields, including continuous time models, spatial models, models for multiple series, models driven by external covariate streams, and non-stationary models.

  9. Bayesian Nonparametric Model for Estimating Multistate Travel Time Distribution

    Directory of Open Access Journals (Sweden)

    Emmanuel Kidando

    2017-01-01

    Full Text Available Multistate models, that is, models with more than two distributions, are preferred over single-state probability models in modeling the distribution of travel time. Literature review indicated that the finite multistate modeling of travel time using lognormal distribution is superior to other probability functions. In this study, we extend the finite multistate lognormal model of estimating the travel time distribution to unbounded lognormal distribution. In particular, a nonparametric Dirichlet Process Mixture Model (DPMM with stick-breaking process representation was used. The strength of the DPMM is that it can choose the number of components dynamically as part of the algorithm during parameter estimation. To reduce computational complexity, the modeling process was limited to a maximum of six components. Then, the Markov Chain Monte Carlo (MCMC sampling technique was employed to estimate the parameters’ posterior distribution. Speed data from nine links of a freeway corridor, aggregated on a 5-minute basis, were used to calculate the corridor travel time. The results demonstrated that this model offers significant flexibility in modeling to account for complex mixture distributions of the travel time without specifying the number of components. The DPMM modeling further revealed that freeway travel time is characterized by multistate or single-state models depending on the inclusion of onset and offset of congestion periods.

  10. A Bayesian nonparametric approach to reconstruction and prediction of random dynamical systems

    Science.gov (United States)

    Merkatas, Christos; Kaloudis, Konstantinos; Hatjispyros, Spyridon J.

    2017-06-01

    We propose a Bayesian nonparametric mixture model for the reconstruction and prediction from observed time series data, of discretized stochastic dynamical systems, based on Markov Chain Monte Carlo methods. Our results can be used by researchers in physical modeling interested in a fast and accurate estimation of low dimensional stochastic models when the size of the observed time series is small and the noise process (perhaps) is non-Gaussian. The inference procedure is demonstrated specifically in the case of polynomial maps of an arbitrary degree and when a Geometric Stick Breaking mixture process prior over the space of densities, is applied to the additive errors. Our method is parsimonious compared to Bayesian nonparametric techniques based on Dirichlet process mixtures, flexible and general. Simulations based on synthetic time series are presented.

  11. A Bayesian nonparametric approach to reconstruction and prediction of random dynamical systems.

    Science.gov (United States)

    Merkatas, Christos; Kaloudis, Konstantinos; Hatjispyros, Spyridon J

    2017-06-01

    We propose a Bayesian nonparametric mixture model for the reconstruction and prediction from observed time series data, of discretized stochastic dynamical systems, based on Markov Chain Monte Carlo methods. Our results can be used by researchers in physical modeling interested in a fast and accurate estimation of low dimensional stochastic models when the size of the observed time series is small and the noise process (perhaps) is non-Gaussian. The inference procedure is demonstrated specifically in the case of polynomial maps of an arbitrary degree and when a Geometric Stick Breaking mixture process prior over the space of densities, is applied to the additive errors. Our method is parsimonious compared to Bayesian nonparametric techniques based on Dirichlet process mixtures, flexible and general. Simulations based on synthetic time series are presented.

  12. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1. Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  13. Nonparametric estimation of the heterogeneity of a random medium using compound Poisson process modeling of wave multiple scattering.

    Science.gov (United States)

    Le Bihan, Nicolas; Margerin, Ludovic

    2009-07-01

    In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity of waves transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using compound Poisson processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.

  14. Methodology in robust and nonparametric statistics

    CERN Document Server

    Jurecková, Jana; Picek, Jan

    2012-01-01

    Introduction and SynopsisIntroductionSynopsisPreliminariesIntroductionInference in Linear ModelsRobustness ConceptsRobust and Minimax Estimation of LocationClippings from Probability and Asymptotic TheoryProblemsRobust Estimation of Location and RegressionIntroductionM-EstimatorsL-EstimatorsR-EstimatorsMinimum Distance and Pitman EstimatorsDifferentiable Statistical FunctionsProblemsAsymptotic Representations for L-Estimators

  15. Economic decision making and the application of nonparametric prediction models

    Science.gov (United States)

    Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.

    2008-01-01

    Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright ?? 2008 Society of Petroleum Engineers.

  16. Collaborative regression.

    Science.gov (United States)

    Gross, Samuel M; Tibshirani, Robert

    2015-04-01

    We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with these type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA when one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem that can be implemented with off the shelf solvers, and illustrate their use on simulated and real data. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2009-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  18. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2010-01-01

    In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By

  19. Efficient estimation of an additive quantile regression model

    NARCIS (Netherlands)

    Cheng, Y.; de Gooijer, J.G.; Zerom, D.

    2011-01-01

    In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator

  20. CATDAT - A program for parametric and nonparametric categorical data analysis user's manual, Version 1.0

    International Nuclear Information System (INIS)

    Peterson, James R.; Haas, Timothy C.; Lee, Danny C.

    2000-01-01

    Natural resource professionals are increasingly required to develop rigorous statistical models that relate environmental data to categorical responses data. Recent advances in the statistical and computing sciences have led to the development of sophisticated methods for parametric and nonparametric analysis of data with categorical responses. The statistical software package CATDAT was designed to make some of these relatively new and powerful techniques available to scientists. The CATDAT statistical package includes 4 analytical techniques: generalized logit modeling; binary classification tree; extended K-nearest neighbor classification; and modular neural network

  1. Efficient nonparametric estimation of causal mediation effects

    OpenAIRE

    Chan, K. C. G.; Imai, K.; Yam, S. C. P.; Zhang, Z.

    2016-01-01

    An essential goal of program evaluation and scientific research is the investigation of causal mechanisms. Over the past several decades, causal mediation analysis has been used in medical and social sciences to decompose the treatment effect into the natural direct and indirect effects. However, all of the existing mediation analysis methods rely on parametric modeling assumptions in one way or another, typically requiring researchers to specify multiple regression models involving the treat...

  2. Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland

    Science.gov (United States)

    Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.

    2011-05-01

    Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. Both variants are evaluated for the three test cases as well as an extended evaluation

  3. Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression.

    Science.gov (United States)

    Bry, X; Verron, T; Cazes, P

    2009-05-29

    In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of m inSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.

  4. Ridge Regression Signal Processing

    Science.gov (United States)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.

  5. Application of nonparametric statistic method for DNBR limit calculation

    International Nuclear Information System (INIS)

    Dong Bo; Kuang Bo; Zhu Xuenong

    2013-01-01

    Background: Nonparametric statistical method is a kind of statistical inference method not depending on a certain distribution; it calculates the tolerance limits under certain probability level and confidence through sampling methods. The DNBR margin is one important parameter of NPP design, which presents the safety level of NPP. Purpose and Methods: This paper uses nonparametric statistical method basing on Wilks formula and VIPER-01 subchannel analysis code to calculate the DNBR design limits (DL) of 300 MW NPP (Nuclear Power Plant) during the complete loss of flow accident, simultaneously compared with the DL of DNBR through means of ITDP to get certain DNBR margin. Results: The results indicate that this method can gain 2.96% DNBR margin more than that obtained by ITDP methodology. Conclusions: Because of the reduction of the conservation during analysis process, the nonparametric statistical method can provide greater DNBR margin and the increase of DNBR margin is benefited for the upgrading of core refuel scheme. (authors)

  6. A nonparametric spatial scan statistic for continuous data.

    Science.gov (United States)

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.

  7. Bayesian Analysis for Penalized Spline Regression Using WinBUGS

    Directory of Open Access Journals (Sweden)

    Ciprian M. Crainiceanu

    2005-09-01

    Full Text Available Penalized splines can be viewed as BLUPs in a mixed model framework, which allows the use of mixed model software for smoothing. Thus, software originally developed for Bayesian analysis of mixed models can be used for penalized spline regression. Bayesian inference for nonparametric models enjoys the flexibility of nonparametric models and the exact inference provided by the Bayesian inferential machinery. This paper provides a simple, yet comprehensive, set of programs for the implementation of nonparametric Bayesian analysis in WinBUGS. Good mixing properties of the MCMC chains are obtained by using low-rank thin-plate splines, while simulation times per iteration are reduced employing WinBUGS specific computational tricks.

  8. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  9. A BAYESIAN NONPARAMETRIC MIXTURE MODEL FOR SELECTING GENES AND GENE SUBNETWORKS.

    Science.gov (United States)

    Zhao, Yize; Kang, Jian; Yu, Tianwei

    2014-06-01

    It is very challenging to select informative features from tens of thousands of measured features in high-throughput data analysis. Recently, several parametric/regression models have been developed utilizing the gene network information to select genes or pathways strongly associated with a clinical/biological outcome. Alternatively, in this paper, we propose a nonparametric Bayesian model for gene selection incorporating network information. In addition to identifying genes that have a strong association with a clinical outcome, our model can select genes with particular expressional behavior, in which case the regression models are not directly applicable. We show that our proposed model is equivalent to an infinity mixture model for which we develop a posterior computation algorithm based on Markov chain Monte Carlo (MCMC) methods. We also propose two fast computing algorithms that approximate the posterior simulation with good accuracy but relatively low computational cost. We illustrate our methods on simulation studies and the analysis of Spellman yeast cell cycle microarray data.

  10. Health Care Facility Choice and User Fee Abolition: Regression Discontinuity in a Multinomial Choice Setting

    OpenAIRE

    Steven F. Koch; Jeffrey S. Racine

    2013-01-01

    We apply parametric and nonparametric regression discontinuity methodology within a multinomial choice setting to examine the impact of public health care user fee abolition on health facility choice using data from South Africa. The nonparametric model is found to outperform the parametric model both in- and out-of-sample, while also delivering more plausible estimates of the impact of user fee abolition (i.e. the 'treatment effect'). In the parametric framework, treatment effects were relat...

  11. Comparative Analysis of A, B Type and Exchange Traded Funds Performances with Mutual Fund Performance Measures, Regression Analysis and Manova Technique.

    Directory of Open Access Journals (Sweden)

    Mehmet Arslan

    2010-06-01

    Full Text Available The objective of the study is to evaluate risk- reward relationship and relative performances of the 4 different groups of mutual funds. To this end, daily return data of these 12 mutual funds (3 type variable fund; 3 B type variable fund; 3 A type stock fund and 3 A type Exchange traded fund together with daily market index (imkb100 return and daily return of riskless rate for the period from January 2006 to Feb 2010. The 180-day maturity T-Bill has been selected to represent riskless rate. To determine performances of mutual funds; Sharpe ratio, M2 measure, Treynor index, Jensen index, Sortino ratio, T2 ratio, Valuation ratio has been applied and these indicators produced conflicting results in ranking mutual funds. Then timingand selection capability of the fund manager has been determined by applying simple regression and Quadratic regression. Interestingly all funds found to have positive coefficient, indicating positive election capability of managers; but in terms of timing capability only one fund managers showed success. Finally, to determine extent to which mean returns are differs between mutual funds, market index (imkb100 and riskless rate (180 day TBill results of the analysis revealed that mean returns of individual security returns differs at P≤0,01 level. That shows instability in returns and poor ex-ante forecast modeling capability.

  12. Comparative Study of Parametric and Non-parametric Approaches in Fault Detection and Isolation

    DEFF Research Database (Denmark)

    Katebi, S.D.; Blanke, M.; Katebi, M.R.

    This report describes a comparative study between two approaches to fault detection and isolation in dynamic systems. The first approach uses a parametric model of the system. The main components of such techniques are residual and signature generation for processing and analyzing. The second...... approach is non-parametric in the sense that the signature analysis is only dependent on the frequency or time domain information extracted directly from the input-output signals. Based on these approaches, two different fault monitoring schemes are developed where the feature extraction and fault decision...

  13. Short-term forecasting of meteorological time series using Nonparametric Functional Data Analysis (NPFDA)

    Science.gov (United States)

    Curceac, S.; Ternynck, C.; Ouarda, T.

    2015-12-01

    Over the past decades, a substantial amount of research has been conducted to model and forecast climatic variables. In this study, Nonparametric Functional Data Analysis (NPFDA) methods are applied to forecast air temperature and wind speed time series in Abu Dhabi, UAE. The dataset consists of hourly measurements recorded for a period of 29 years, 1982-2010. The novelty of the Functional Data Analysis approach is in expressing the data as curves. In the present work, the focus is on daily forecasting and the functional observations (curves) express the daily measurements of the above mentioned variables. We apply a non-linear regression model with a functional non-parametric kernel estimator. The computation of the estimator is performed using an asymmetrical quadratic kernel function for local weighting based on the bandwidth obtained by a cross validation procedure. The proximities between functional objects are calculated by families of semi-metrics based on derivatives and Functional Principal Component Analysis (FPCA). Additionally, functional conditional mode and functional conditional median estimators are applied and the advantages of combining their results are analysed. A different approach employs a SARIMA model selected according to the minimum Akaike (AIC) and Bayessian (BIC) Information Criteria and based on the residuals of the model. The performance of the models is assessed by calculating error indices such as the root mean square error (RMSE), relative RMSE, BIAS and relative BIAS. The results indicate that the NPFDA models provide more accurate forecasts than the SARIMA models. Key words: Nonparametric functional data analysis, SARIMA, time series forecast, air temperature, wind speed

  14. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    Science.gov (United States)

    Whitlock, C. H., III

    1977-01-01

    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  15. The nonparametric bootstrap for the current status model

    NARCIS (Netherlands)

    Groeneboom, P.; Hendrickx, K.

    2017-01-01

    It has been proved that direct bootstrapping of the nonparametric maximum likelihood estimator (MLE) of the distribution function in the current status model leads to inconsistent confidence intervals. We show that bootstrapping of functionals of the MLE can however be used to produce valid

  16. Non-Parametric Analysis of Rating Transition and Default Data

    DEFF Research Database (Denmark)

    Fledelius, Peter; Lando, David; Perch Nielsen, Jens

    2004-01-01

    We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move b...

  17. Bayesian nonparametric system reliability using sets of priors

    NARCIS (Netherlands)

    Walter, G.M.; Aslett, L.J.M.; Coolen, F.P.A.

    2016-01-01

    An imprecise Bayesian nonparametric approach to system reliability with multiple types of components is developed. This allows modelling partial or imperfect prior knowledge on component failure distributions in a flexible way through bounds on the functioning probability. Given component level test

  18. Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.

    Science.gov (United States)

    Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F

    2013-04-01

    In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.

  19. Nonparametric modeling of dynamic functional connectivity in fmri data

    DEFF Research Database (Denmark)

    Nielsen, Søren Føns Vind; Madsen, Kristoffer H.; Røge, Rasmus

    2015-01-01

    dynamic changes. The existing approaches modeling dynamic connectivity have primarily been based on time-windowing the data and k-means clustering. We propose a nonparametric generative model for dynamic FC in fMRI that does not rely on specifying window lengths and number of dynamic states. Rooted...

  20. A general approach to posterior contraction in nonparametric inverse problems

    NARCIS (Netherlands)

    Knapik, Bartek; Salomond, Jean Bernard

    In this paper, we propose a general method to derive an upper bound for the contraction rate of the posterior distribution for nonparametric inverse problems. We present a general theorem that allows us to derive contraction rates for the parameter of interest from contraction rates of the related

  1. Non-parametric analysis of production efficiency of poultry egg ...

    African Journals Online (AJOL)

    Non-parametric analysis of production efficiency of poultry egg farmers in Delta ... analysis of factors affecting the output of poultry farmers showed that stock ... should be put in place for farmers to learn the best farm practices carried out on the ...

  2. Improved intact soil-core carbon determination applying regression shrinkage and variable selection techniques to complete spectrum laser-induced breakdown spectroscopy (LIBS).

    Science.gov (United States)

    Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M

    2013-10-01

    Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.

  3. New analysis methods to push the boundaries of diagnostic techniques in the environmental sciences

    International Nuclear Information System (INIS)

    Lungaroni, M.; Peluso, E.; Gelfusa, M.; Malizia, A.; Talebzadeh, S.; Gaudio, P.; Murari, A.; Vega, J.

    2016-01-01

    In the last years, new and more sophisticated measurements have been at the basis of the major progress in various disciplines related to the environment, such as remote sensing and thermonuclear fusion. To maximize the effectiveness of the measurements, new data analysis techniques are required. First data processing tasks, such as filtering and fitting, are of primary importance, since they can have a strong influence on the rest of the analysis. Even if Support Vector Regression is a method devised and refined at the end of the 90s, a systematic comparison with more traditional non parametric regression methods has never been reported. In this paper, a series of systematic tests is described, which indicates how SVR is a very competitive method of non-parametric regression that can usefully complement and often outperform more consolidated approaches. The performance of Support Vector Regression as a method of filtering is investigated first, comparing it with the most popular alternative techniques. Then Support Vector Regression is applied to the problem of non-parametric regression to analyse Lidar surveys for the environments measurement of particulate matter due to wildfires. The proposed approach has given very positive results and provides new perspectives to the interpretation of the data.

  4. Subset selection in regression

    CERN Document Server

    Miller, Alan

    2002-01-01

    Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition:A separate chapter on Bayesian methodsComplete revision of the chapter on estimationA major example from the field of near infrared spectroscopyMore emphasis on cross-validationGreater focus on bootstrappingStochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible Software available on the Internet for implementing many of the algorithms presentedMore examplesSubset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...

  5. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  6. Improving salt marsh digital elevation model accuracy with full-waveform lidar and nonparametric predictive modeling

    Science.gov (United States)

    Rogers, Jeffrey N.; Parrish, Christopher E.; Ward, Larry G.; Burdick, David M.

    2018-03-01

    Salt marsh vegetation tends to increase vertical uncertainty in light detection and ranging (lidar) derived elevation data, often causing the data to become ineffective for analysis of topographic features governing tidal inundation or vegetation zonation. Previous attempts at improving lidar data collected in salt marsh environments range from simply computing and subtracting the global elevation bias to more complex methods such as computing vegetation-specific, constant correction factors. The vegetation specific corrections can be used along with an existing habitat map to apply separate corrections to different areas within a study site. It is hypothesized here that correcting salt marsh lidar data by applying location-specific, point-by-point corrections, which are computed from lidar waveform-derived features, tidal-datum based elevation, distance from shoreline and other lidar digital elevation model based variables, using nonparametric regression will produce better results. The methods were developed and tested using full-waveform lidar and ground truth for three marshes in Cape Cod, Massachusetts, U.S.A. Five different model algorithms for nonparametric regression were evaluated, with TreeNet's stochastic gradient boosting algorithm consistently producing better regression and classification results. Additionally, models were constructed to predict the vegetative zone (high marsh and low marsh). The predictive modeling methods used in this study estimated ground elevation with a mean bias of 0.00 m and a standard deviation of 0.07 m (0.07 m root mean square error). These methods appear very promising for correction of salt marsh lidar data and, importantly, do not require an existing habitat map, biomass measurements, or image based remote sensing data such as multi/hyperspectral imagery.

  7. Two-Stage Method Based on Local Polynomial Fitting for a Linear Heteroscedastic Regression Model and Its Application in Economics

    Directory of Open Access Journals (Sweden)

    Liyun Su

    2012-01-01

    Full Text Available We introduce the extension of local polynomial fitting to the linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to nonparametric technique of local polynomial estimation, we do not need to know the heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we focus on comparison of parameters and reach an optimal fitting. Besides, we verify the asymptotic normality of parameters based on numerical simulations. Finally, this approach is applied to a case of economics, and it indicates that our method is surely effective in finite-sample situations.

  8. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    Science.gov (United States)

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

  9. Identification of System Parameters by the Random Decrement Technique

    DEFF Research Database (Denmark)

    Brincker, Rune; Kirkegaard, Poul Henning; Rytter, Anders

    1991-01-01

    -Walker equations and finally, least-square fitting of the theoretical correlation function. The results are compared to the results of fitting an Auto Regressive Moving Average (ARMA) model directly to the system output from a single-degree-of-freedom system loaded by white noise.......The aim of this paper is to investigate and illustrate the possibilities of using correlation functions estimated by the Random Decrement Technique as a basis for parameter identification. A two-stage system identification system is used: first, the correlation functions are estimated by the Random...... Decrement Technique, and then the system parameters are identified from the correlation function estimates. Three different techniques are used in the parameter identification process: a simple non-parametric method, estimation of an Auto Regressive (AR) model by solving an overdetermined set of Yule...

  10. Using Spline Regression in Semi-Parametric Stochastic Frontier Analysis: An Application to Polish Dairy Farms

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    of specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply...... efficiency of Polish dairy farms contributes to the insight into this dynamic process. Furthermore, we compare and evaluate the results of this spline-based semi-parametric stochastic frontier model with results of other semi-parametric stochastic frontier models and of traditional parametric stochastic...

  11. Abstract Expression Grammar Symbolic Regression

    Science.gov (United States)

    Korns, Michael F.

    This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, which is faster, and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression sytem. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial strength symbolic regression systems.

  12. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  13. Comparação de duas metodologias de amostragem atmosférica com ferramenta estatística não paramétrica Comparison of two atmospheric sampling methodologies with non-parametric statistical tools

    Directory of Open Access Journals (Sweden)

    Maria João Nunes

    2005-03-01

    Full Text Available In atmospheric aerosol sampling, it is inevitable that the air that carries particles is in motion, as a result of both externally driven wind and the sucking action of the sampler itself. High or low air flow sampling speeds may lead to significant particle size bias. The objective of this work is the validation of measurements enabling the comparison of species concentration from both air flow sampling techniques. The presence of several outliers and increase of residuals with concentration becomes obvious, requiring non-parametric methods, recommended for the handling of data which may not be normally distributed. This way, conversion factors are obtained for each of the various species under study using Kendall regression.

  14. A Powerful Test for Comparing Multiple Regression Functions.

    Science.gov (United States)

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  15. Hadron energy reconstruction for the ATLAS calorimetry in the framework of the nonparametrical method

    CERN Document Server

    Akhmadaliev, S Z; Ambrosini, G; Amorim, A; Anderson, K; Andrieux, M L; Aubert, Bernard; Augé, E; Badaud, F; Baisin, L; Barreiro, F; Battistoni, G; Bazan, A; Bazizi, K; Belymam, A; Benchekroun, D; Berglund, S R; Berset, J C; Blanchot, G; Bogush, A A; Bohm, C; Boldea, V; Bonivento, W; Bosman, M; Bouhemaid, N; Breton, D; Brette, P; Bromberg, C; Budagov, Yu A; Burdin, S V; Calôba, L P; Camarena, F; Camin, D V; Canton, B; Caprini, M; Carvalho, J; Casado, M P; Castillo, M V; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Chadelas, R; Chalifour, M; Chekhtman, A; Chevalley, J L; Chirikov-Zorin, I E; Chlachidze, G; Citterio, M; Cleland, W E; Clément, C; Cobal, M; Cogswell, F; Colas, Jacques; Collot, J; Cologna, S; Constantinescu, S; Costa, G; Costanzo, D; Crouau, M; Daudon, F; David, J; David, M; Davidek, T; Dawson, J; De, K; de La Taille, C; Del Peso, J; Del Prete, T; de Saintignon, P; Di Girolamo, B; Dinkespiler, B; Dita, S; Dodd, J; Dolejsi, J; Dolezal, Z; Downing, R; Dugne, J J; Dzahini, D; Efthymiopoulos, I; Errede, D; Errede, S; Evans, H; Eynard, G; Fassi, F; Fassnacht, P; Ferrari, A; Ferrer, A; Flaminio, Vincenzo; Fournier, D; Fumagalli, G; Gallas, E; Gaspar, M; Giakoumopoulou, V; Gianotti, F; Gildemeister, O; Giokaris, N; Glagolev, V; Glebov, V Yu; Gomes, A; González, V; González de la Hoz, S; Grabskii, V; Graugès-Pous, E; Grenier, P; Hakopian, H H; Haney, M; Hébrard, C; Henriques, A; Hervás, L; Higón, E; Holmgren, Sven Olof; Hostachy, J Y; Hoummada, A; Huston, J; Imbault, D; Ivanyushenkov, Yu M; Jézéquel, S; Johansson, E K; Jon-And, K; Jones, R; Juste, A; Kakurin, S; Karyukhin, A N; Khokhlov, Yu A; Khubua, J I; Klioukhine, V I; Kolachev, G M; Kopikov, S V; Kostrikov, M E; Kozlov, V; Krivkova, P; Kukhtin, V V; Kulagin, M; Kulchitskii, Yu A; Kuzmin, M V; Labarga, L; Laborie, G; Lacour, D; Laforge, B; Lami, S; Lapin, V; Le Dortz, O; Lefebvre, M; Le Flour, T; Leitner, R; Leltchouk, M; Li, J; Liablin, M V; Linossier, O; Lissauer, D; Lobkowicz, F; Lokajícek, M; Lomakin, Yu F; López-Amengual, J M; Lund-Jensen, B; Maio, A; Makowiecki, D S; Malyukov, S N; Mandelli, L; Mansoulié, B; Mapelli, Livio P; Marin, C P; Marrocchesi, P S; Marroquim, F; Martin, P; Maslennikov, A L; Massol, N; Mataix, L; Mazzanti, M; Mazzoni, E; Merritt, F S; Michel, B; Miller, R; Minashvili, I A; Miralles, L; Mnatzakanian, E A; Monnier, E; Montarou, G; Mornacchi, Giuseppe; Moynot, M; Muanza, G S; Nayman, P; Némécek, S; Nessi, Marzio; Nicoleau, S; Niculescu, M; Noppe, J M; Onofre, A; Pallin, D; Pantea, D; Paoletti, R; Park, I C; Parrour, G; Parsons, J; Pereira, A; Perini, L; Perlas, J A; Perrodo, P; Pilcher, J E; Pinhão, J; Plothow-Besch, Hartmute; Poggioli, Luc; Poirot, S; Price, L; Protopopov, Yu; Proudfoot, J; Puzo, P; Radeka, V; Rahm, David Charles; Reinmuth, G; Renzoni, G; Rescia, S; Resconi, S; Richards, R; Richer, J P; Roda, C; Rodier, S; Roldán, J; Romance, J B; Romanov, V; Romero, P; Rossel, F; Rusakovitch, N A; Sala, P; Sanchis, E; Sanders, H; Santoni, C; Santos, J; Sauvage, D; Sauvage, G; Sawyer, L; Says, L P; Schaffer, A C; Schwemling, P; Schwindling, J; Seguin-Moreau, N; Seidl, W; Seixas, J M; Selldén, B; Seman, M; Semenov, A; Serin, L; Shaldaev, E; Shochet, M J; Sidorov, V; Silva, J; Simaitis, V J; Simion, S; Sissakian, A N; Snopkov, R; Söderqvist, J; Solodkov, A A; Soloviev, A; Soloviev, I V; Sonderegger, P; Soustruznik, K; Spanó, F; Spiwoks, R; Stanek, R; Starchenko, E A; Stavina, P; Stephens, R; Suk, M; Surkov, A; Sykora, I; Takai, H; Tang, F; Tardell, S; Tartarelli, F; Tas, P; Teiger, J; Thaler, J; Thion, J; Tikhonov, Yu A; Tisserant, S; Tokar, S; Topilin, N D; Trka, Z; Turcotte, M; Valkár, S; Varanda, M J; Vartapetian, A H; Vazeille, F; Vichou, I; Vinogradov, V; Vorozhtsov, S B; Vuillemin, V; White, A; Wielers, M; Wingerter-Seez, I; Wolters, H; Yamdagni, N; Yosef, C; Zaitsev, A; Zitoun, R; Zolnierowski, Y

    2002-01-01

    This paper discusses hadron energy reconstruction for the ATLAS barrel prototype combined calorimeter (consisting of a lead-liquid argon electromagnetic part and an iron-scintillator hadronic part) in the framework of the nonparametrical method. The nonparametrical method utilizes only the known e/h ratios and the electron calibration constants and does not require the determination of any parameters by a minimization technique. Thus, this technique lends itself to an easy use in a first level trigger. The reconstructed mean values of the hadron energies are within +or-1% of the true values and the fractional energy resolution is [(58+or-3)%/ square root E+(2.5+or-0.3)%](+)(1.7+or-0.2)/E. The value of the e/h ratio obtained for the electromagnetic compartment of the combined calorimeter is 1.74+or-0.04 and agrees with the prediction that e/h >1.66 for this electromagnetic calorimeter. Results of a study of the longitudinal hadronic shower development are also presented. The data have been taken in the H8 beam...

  16. Impulse response identification with deterministic inputs using non-parametric methods

    International Nuclear Information System (INIS)

    Bhargava, U.K.; Kashyap, R.L.; Goodman, D.M.

    1985-01-01

    This paper addresses the problem of impulse response identification using non-parametric methods. Although the techniques developed herein apply to the truncated, untruncated, and the circulant models, we focus on the truncated model which is useful in certain applications. Two methods of impulse response identification will be presented. The first is based on the minimization of the C/sub L/ Statistic, which is an estimate of the mean-square prediction error; the second is a Bayesian approach. For both of these methods, we consider the effects of using both the identity matrix and the Laplacian matrix as weights on the energy in the impulse response. In addition, we present a method for estimating the effective length of the impulse response. Estimating the length is particularly important in the truncated case. Finally, we develop a method for estimating the noise variance at the output. Often, prior information on the noise variance is not available, and a good estimate is crucial to the success of estimating the impulse response with a nonparametric technique

  17. Tremor Detection Using Parametric and Non-Parametric Spectral Estimation Methods: A Comparison with Clinical Assessment

    Science.gov (United States)

    Martinez Manzanera, Octavio; Elting, Jan Willem; van der Hoeven, Johannes H.; Maurits, Natasha M.

    2016-01-01

    In the clinic, tremor is diagnosed during a time-limited process in which patients are observed and the characteristics of tremor are visually assessed. For some tremor disorders, a more detailed analysis of these characteristics is needed. Accelerometry and electromyography can be used to obtain a better insight into tremor. Typically, routine clinical assessment of accelerometry and electromyography data involves visual inspection by clinicians and occasionally computational analysis to obtain objective characteristics of tremor. However, for some tremor disorders these characteristics may be different during daily activity. This variability in presentation between the clinic and daily life makes a differential diagnosis more difficult. A long-term recording of tremor by accelerometry and/or electromyography in the home environment could help to give a better insight into the tremor disorder. However, an evaluation of such recordings using routine clinical standards would take too much time. We evaluated a range of techniques that automatically detect tremor segments in accelerometer data, as accelerometer data is more easily obtained in the home environment than electromyography data. Time can be saved if clinicians only have to evaluate the tremor characteristics of segments that have been automatically detected in longer daily activity recordings. We tested four non-parametric methods and five parametric methods on clinical accelerometer data from 14 patients with different tremor disorders. The consensus between two clinicians regarding the presence or absence of tremor on 3943 segments of accelerometer data was employed as reference. The nine methods were tested against this reference to identify their optimal parameters. Non-parametric methods generally performed better than parametric methods on our dataset when optimal parameters were used. However, one parametric method, employing the high frequency content of the tremor bandwidth under consideration

  18. Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors.

    Science.gov (United States)

    Deshwar, Amit G; Vembu, Shankar; Morris, Quaid

    2015-01-01

    Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…

  19. Single versus mixture Weibull distributions for nonparametric satellite reliability

    International Nuclear Information System (INIS)

    Castet, Jean-Francois; Saleh, Joseph H.

    2010-01-01

    Long recognized as a critical design attribute for space systems, satellite reliability has not yet received the proper attention as limited on-orbit failure data and statistical analyses can be found in the technical literature. To fill this gap, we recently conducted a nonparametric analysis of satellite reliability for 1584 Earth-orbiting satellites launched between January 1990 and October 2008. In this paper, we provide an advanced parametric fit, based on mixture of Weibull distributions, and compare it with the single Weibull distribution model obtained with the Maximum Likelihood Estimation (MLE) method. We demonstrate that both parametric fits are good approximations of the nonparametric satellite reliability, but that the mixture Weibull distribution provides significant accuracy in capturing all the failure trends in the failure data, as evidenced by the analysis of the residuals and their quasi-normal dispersion.

  20. International Conference on Robust Rank-Based and Nonparametric Methods

    CERN Document Server

    McKean, Joseph

    2016-01-01

    The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...

  1. Seismic Signal Compression Using Nonparametric Bayesian Dictionary Learning via Clustering

    Directory of Open Access Journals (Sweden)

    Xin Tian

    2017-06-01

    Full Text Available We introduce a seismic signal compression method based on nonparametric Bayesian dictionary learning method via clustering. The seismic data is compressed patch by patch, and the dictionary is learned online. Clustering is introduced for dictionary learning. A set of dictionaries could be generated, and each dictionary is used for one cluster’s sparse coding. In this way, the signals in one cluster could be well represented by their corresponding dictionaries. A nonparametric Bayesian dictionary learning method is used to learn the dictionaries, which naturally infers an appropriate dictionary size for each cluster. A uniform quantizer and an adaptive arithmetic coding algorithm are adopted to code the sparse coefficients. With comparisons to other state-of-the art approaches, the effectiveness of the proposed method could be validated in the experiments.

  2. Nonparametric Bayesian models through probit stick-breaking processes.

    Science.gov (United States)

    Rodríguez, Abel; Dunson, David B

    2011-03-01

    We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology.

  3. Exact nonparametric inference for detection of nonlinear determinism

    OpenAIRE

    Luo, Xiaodong; Zhang, Jie; Small, Michael; Moroz, Irene

    2005-01-01

    We propose an exact nonparametric inference scheme for the detection of nonlinear determinism. The essential fact utilized in our scheme is that, for a linear stochastic process with jointly symmetric innovations, its ordinary least square (OLS) linear prediction error is symmetric about zero. Based on this viewpoint, a class of linear signed rank statistics, e.g. the Wilcoxon signed rank statistic, can be derived with the known null distributions from the prediction error. Thus one of the ad...

  4. Non-parametric estimation of the individual's utility map

    OpenAIRE

    Noguchi, Takao; Sanborn, Adam N.; Stewart, Neil

    2013-01-01

    Models of risky choice have attracted much attention in behavioural economics. Previous research has repeatedly demonstrated that individuals' choices are not well explained by expected utility theory, and a number of alternative models have been examined using carefully selected sets of choice alternatives. The model performance however, can depend on which choice alternatives are being tested. Here we develop a non-parametric method for estimating the utility map over the wide range of choi...

  5. Nonparametric Efficiency Testing of Asian Stock Markets Using Weekly Data

    OpenAIRE

    CORNELIS A. LOS

    2004-01-01

    The efficiency of speculative markets, as represented by Fama's 1970 fair game model, is tested on weekly price index data of six Asian stock markets - Hong Kong, Indonesia, Malaysia, Singapore, Taiwan and Thailand - using Sherry's (1992) non-parametric methods. These scientific testing methods were originally developed to analyze the information processing efficiency of nervous systems. In particular, the stationarity and independence of the price innovations are tested over ten years, from ...

  6. Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.

    Science.gov (United States)

    Zhang, Tingting; Kou, S C

    2010-01-01

    Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.

  7. A non-parametric consistency test of the ΛCDM model with Planck CMB data

    Energy Technology Data Exchange (ETDEWEB)

    Aghamousa, Amir; Shafieloo, Arman [Korea Astronomy and Space Science Institute, Daejeon 305-348 (Korea, Republic of); Hamann, Jan, E-mail: amir@aghamousa.com, E-mail: jan.hamann@unsw.edu.au, E-mail: shafieloo@kasi.re.kr [School of Physics, The University of New South Wales, Sydney NSW 2052 (Australia)

    2017-09-01

    Non-parametric reconstruction methods, such as Gaussian process (GP) regression, provide a model-independent way of estimating an underlying function and its uncertainty from noisy data. We demonstrate how GP-reconstruction can be used as a consistency test between a given data set and a specific model by looking for structures in the residuals of the data with respect to the model's best-fit. Applying this formalism to the Planck temperature and polarisation power spectrum measurements, we test their global consistency with the predictions of the base ΛCDM model. Our results do not show any serious inconsistencies, lending further support to the interpretation of the base ΛCDM model as cosmology's gold standard.

  8. Investigation of MLE in nonparametric estimation methods of reliability function

    International Nuclear Information System (INIS)

    Ahn, Kwang Won; Kim, Yoon Ik; Chung, Chang Hyun; Kim, Kil Yoo

    2001-01-01

    There have been lots of trials to estimate a reliability function. In the ESReDA 20 th seminar, a new method in nonparametric way was proposed. The major point of that paper is how to use censored data efficiently. Generally there are three kinds of approach to estimate a reliability function in nonparametric way, i.e., Reduced Sample Method, Actuarial Method and Product-Limit (PL) Method. The above three methods have some limits. So we suggest an advanced method that reflects censored information more efficiently. In many instances there will be a unique maximum likelihood estimator (MLE) of an unknown parameter, and often it may be obtained by the process of differentiation. It is well known that the three methods generally used to estimate a reliability function in nonparametric way have maximum likelihood estimators that are uniquely exist. So, MLE of the new method is derived in this study. The procedure to calculate a MLE is similar just like that of PL-estimator. The difference of the two is that in the new method, the mass (or weight) of each has an influence of the others but the mass in PL-estimator not

  9. Approximating prediction uncertainty for random forest regression models

    Science.gov (United States)

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  10. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  11. Non-parametric adaptive importance sampling for the probability estimation of a launcher impact position

    International Nuclear Information System (INIS)

    Morio, Jerome

    2011-01-01

    Importance sampling (IS) is a useful simulation technique to estimate critical probability with a better accuracy than Monte Carlo methods. It consists in generating random weighted samples from an auxiliary distribution rather than the distribution of interest. The crucial part of this algorithm is the choice of an efficient auxiliary PDF that has to be able to simulate more rare random events. The optimisation of this auxiliary distribution is often in practice very difficult. In this article, we propose to approach the IS optimal auxiliary density with non-parametric adaptive importance sampling (NAIS). We apply this technique for the probability estimation of spatial launcher impact position since it has currently become a more and more important issue in the field of aeronautics.

  12. Measuring energy performance with sectoral heterogeneity: A non-parametric frontier approach

    International Nuclear Information System (INIS)

    Wang, H.; Ang, B.W.; Wang, Q.W.; Zhou, P.

    2017-01-01

    Evaluating economy-wide energy performance is an integral part of assessing the effectiveness of a country's energy efficiency policy. Non-parametric frontier approach has been widely used by researchers for such a purpose. This paper proposes an extended non-parametric frontier approach to studying economy-wide energy efficiency and productivity performances by accounting for sectoral heterogeneity. Relevant techniques in index number theory are incorporated to quantify the driving forces behind changes in the economy-wide energy productivity index. The proposed approach facilitates flexible modelling of different sectors' production processes, and helps to examine sectors' impact on the aggregate energy performance. A case study of China's economy-wide energy efficiency and productivity performances in its 11th five-year plan period (2006–2010) is presented. It is found that sectoral heterogeneities in terms of energy performance are significant in China. Meanwhile, China's economy-wide energy productivity increased slightly during the study period, mainly driven by the technical efficiency improvement. A number of other findings have also been reported. - Highlights: • We model economy-wide energy performance by considering sectoral heterogeneity. • The proposed approach can identify sectors' impact on the aggregate energy performance. • Obvious sectoral heterogeneities are identified in evaluating China's energy performance.

  13. MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO

    International Nuclear Information System (INIS)

    Jardel, John R.; Gebhardt, Karl; Fabricius, Maximilian H.; Williams, Michael J.; Drory, Niv

    2013-01-01

    We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Our non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 ≤ r ≤ 700 pc. The profile for r ≥ 20 pc is well fit by a power law with slope α = –1.0 ± 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.

  14. Nonparametric Forecasting for Biochar Utilization in Poyang Lake Eco-Economic Zone in China

    Directory of Open Access Journals (Sweden)

    Meng-Shiuh Chang

    2014-01-01

    Full Text Available Agriculture is the least profitable industry in China. However, even with large financial subsidies from the government, farmers’ living standards have had no significant impact so far due to the historical, geographical, climatic factors. The study examines and quantifies the net economic and environmental benefits by utilizing biochar as a soil amendment in eleven counties in the Poyang Lake Eco-Economic Zone. A nonparametric kernel regression model is employed to estimate the relation between the scaled environmental and economic factors, which are determined as regression variables. In addition, the partial linear and single index regression models are used for comparison. In terms of evaluations of mean squared errors, the kernel estimator, exceeding the other estimators, is employed to forecast benefits of using biochar under various scenarios. The results indicate that biochar utilization can potentially increase farmers’ income if rice is planted and the net economic benefits can be achieved up to ¥114,900. The net economic benefits are higher when the pyrolysis plant is built in the south of Poyang Lake Eco-Economic Zone than when it is built in the north as the southern land is relatively barren, and biochar can save more costs on irrigation and fertilizer use.

  15. Integration of association statistics over genomic regions using Bayesian adaptive regression splines

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohua

    2003-11-01

    Full Text Available Abstract In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed, testing single loci or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS. For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (ie it achieves the specified size of the test and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.

  16. Nonparametric identification of nonlinear dynamic systems using a synchronisation-based method

    Science.gov (United States)

    Kenderi, Gábor; Fidlin, Alexander

    2014-12-01

    The present study proposes an identification method for highly nonlinear mechanical systems that does not require a priori knowledge of the underlying nonlinearities to reconstruct arbitrary restoring force surfaces between degrees of freedom. This approach is based on the master-slave synchronisation between a dynamic model of the system as the slave and the real system as the master using measurements of the latter. As the model synchronises to the measurements, it becomes an observer of the real system. The optimal observer algorithm in a least-squares sense is given by the Kalman filter. Using the well-known state augmentation technique, the Kalman filter can be turned into a dual state and parameter estimator to identify parameters of a priori characterised nonlinearities. The paper proposes an extension of this technique towards nonparametric identification. A general system model is introduced by describing the restoring forces as bilateral spring-dampers with time-variant coefficients, which are estimated as augmented states. The estimation procedure is followed by an a posteriori statistical analysis to reconstruct noise-free restoring force characteristics using the estimated states and their estimated variances. Observability is provided using only one measured mechanical quantity per degree of freedom, which makes this approach less demanding in the number of necessary measurement signals compared with truly nonparametric solutions, which typically require displacement, velocity and acceleration signals. Additionally, due to the statistical rigour of the procedure, it successfully addresses signals corrupted by significant measurement noise. In the present paper, the method is described in detail, which is followed by numerical examples of one degree of freedom (1DoF) and 2DoF mechanical systems with strong nonlinearities of vibro-impact type to demonstrate the effectiveness of the proposed technique.

  17. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  18. STATCAT, Statistical Analysis of Parametric and Non-Parametric Data

    International Nuclear Information System (INIS)

    David, Hugh

    1990-01-01

    1 - Description of program or function: A suite of 26 programs designed to facilitate the appropriate statistical analysis and data handling of parametric and non-parametric data, using classical and modern univariate and multivariate methods. 2 - Method of solution: Data is read entry by entry, using a choice of input formats, and the resultant data bank is checked for out-of- range, rare, extreme or missing data. The completed STATCAT data bank can be treated by a variety of descriptive and inferential statistical methods, and modified, using other standard programs as required

  19. Digital spectral analysis parametric, non-parametric and advanced methods

    CERN Document Server

    Castanié, Francis

    2013-01-01

    Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature.The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models.An entire chapter is devoted to the non-parametric methods most widely used in industry.High resolution methods a

  20. Nonparametric statistics a step-by-step approach

    CERN Document Server

    Corder, Gregory W

    2014-01-01

    "…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory.  It also deserves a place in libraries of all institutions where introductory statistics courses are taught."" -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: New coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical powerSPSS® (Version 21) software and updated screen ca

  1. Evaluation of Nonparametric Probabilistic Forecasts of Wind Power

    DEFF Research Database (Denmark)

    Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg, orlov 31.07.2008

    Predictions of wind power production for horizons up to 48-72 hour ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most...... likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set...

  2. Estimation of Stochastic Volatility Models by Nonparametric Filtering

    DEFF Research Database (Denmark)

    Kanaya, Shin; Kristensen, Dennis

    2016-01-01

    /estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases...... and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties...

  3. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    Full Text Available The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during the bionalytical method validation. Given the wide concentration range, frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: One sample rank test and Wilcoxon signed rank test for two independent groups of samples. The results obtained with One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because slight differences between the error (presented through the relative residuals were obtained. Estimation of the significance of the differences in the RR was achieved using Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  4. Nonparametric estimation of age-specific reference percentile curves with radial smoothing.

    Science.gov (United States)

    Wan, Xiaohai; Qu, Yongming; Huang, Yao; Zhang, Xiao; Song, Hanping; Jiang, Honghua

    2012-01-01

    Reference percentile curves represent the covariate-dependent distribution of a quantitative measurement and are often used to summarize and monitor dynamic processes such as human growth. We propose a new nonparametric method based on a radial smoothing (RS) technique to estimate age-specific reference percentile curves assuming the underlying distribution is relatively close to normal. We compared the RS method with both the LMS and the generalized additive models for location, scale and shape (GAMLSS) methods using simulated data and found that our method has smaller estimation error than the two existing methods. We also applied the new method to analyze height growth data from children being followed in a clinical observational study of growth hormone treatment, and compared the growth curves between those with growth disorders and the general population. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Bayesian nonparametric dictionary learning for compressed sensing MRI.

    Science.gov (United States)

    Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping

    2014-12-01

    We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k -space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.

  6. 1st Conference of the International Society for Nonparametric Statistics

    CERN Document Server

    Lahiri, S; Politis, Dimitris

    2014-01-01

    This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R.Beran, P.Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. SubbaRao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world.   The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...

  7. Application of nonparametric statistics to material strength/reliability assessment

    International Nuclear Information System (INIS)

    Arai, Taketoshi

    1992-01-01

    An advanced material technology requires data base on a wide variety of material behavior which need to be established experimentally. It may often happen that experiments are practically limited in terms of reproducibility or a range of test parameters. Statistical methods can be applied to understanding uncertainties in such a quantitative manner as required from the reliability point of view. Statistical assessment involves determinations of a most probable value and the maximum and/or minimum value as one-sided or two-sided confidence limit. A scatter of test data can be approximated by a theoretical distribution only if the goodness of fit satisfies a test criterion. Alternatively, nonparametric statistics (NPS) or distribution-free statistics can be applied. Mathematical procedures by NPS are well established for dealing with most reliability problems. They handle only order statistics of a sample. Mathematical formulas and some applications to engineering assessments are described. They include confidence limits of median, population coverage of sample, required minimum number of a sample, and confidence limits of fracture probability. These applications demonstrate that a nonparametric statistical estimation is useful in logical decision making in the case a large uncertainty exists. (author)

  8. Experimentally testing the dependence of momentum transport on second derivatives using Gaussian process regression

    Science.gov (United States)

    Chilenski, M. A.; Greenwald, M. J.; Hubbard, A. E.; Hughes, J. W.; Lee, J. P.; Marzouk, Y. M.; Rice, J. E.; White, A. E.

    2017-12-01

    It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al 2013 Phys. Plasmas 20 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al 2015 Plasma Phys. Control. Fusion 57 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression. This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of n{e} and T{e} between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results.

  9. Verification of helical tomotherapy delivery using autoassociative kernel regression

    International Nuclear Information System (INIS)

    Seibert, Rebecca M.; Ramsey, Chester R.; Garvey, Dustin R.; Wesley Hines, J.; Robison, Ben H.; Outten, Samuel S.

    2007-01-01

    Quality assurance (QA) is a topic of major concern in the field of intensity modulated radiation therapy (IMRT). The standard of practice for IMRT is to perform QA testing for individual patients to verify that the dose distribution will be delivered to the patient. The purpose of this study was to develop a new technique that could eventually be used to automatically evaluate helical tomotherapy treatments during delivery using exit detector data. This technique uses an autoassociative kernel regression (AAKR) model to detect errors in tomotherapy delivery. AAKR is a novel nonparametric model that is known to predict a group of correct sensor values when supplied a group of sensor values that is usually corrupted or contains faults such as machine failure. This modeling scheme is especially suited for the problem of monitoring the fluence values found in the exit detector data because it is able to learn the complex detector data relationships. This scheme still applies when detector data are summed over many frames with a low temporal resolution and a variable beam attenuation resulting from patient movement. Delivery sequences from three archived patients (prostate, lung, and head and neck) were used in this study. Each delivery sequence was modified by reducing the opening time for random individual multileaf collimator (MLC) leaves by random amounts. The error and error-free treatments were delivered with different phantoms in the path of the beam. Multiple autoassociative kernel regression (AAKR) models were developed and tested by the investigators using combinations of the stored exit detector data sets from each delivery. The models proved robust and were able to predict the correct or error-free values for a projection, which had a single MLC leaf decrease its opening time by less than 10 msec. The model also was able to determine machine output errors. The average uncertainty value for the unfaulted projections ranged from 0.4% to 1.8% of the detector

  10. Statistical learning from a regression perspective

    CERN Document Server

    Berk, Richard A

    2016-01-01

    This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a first approximation, this can be seen as an extension of nonparametric regression. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. A continued emphasis on the implications for practice runs through the text. Among the statistical learning procedures examined are bagging, random forests, boosting, support vector machines and neural networks. Response variables may be quantitative or categorical. As in the first edition, a unifying theme is supervised learning that can be trea...

  11. Nonparametric Tree-Based Predictive Modeling of Storm Outages on an Electric Distribution Network.

    Science.gov (United States)

    He, Jichao; Wanik, David W; Hartman, Brian M; Anagnostou, Emmanouil N; Astitha, Marina; Frediani, Maria E B

    2017-03-01

    This article compares two nonparametric tree-based models, quantile regression forests (QRF) and Bayesian additive regression trees (BART), for predicting storm outages on an electric distribution network in Connecticut, USA. We evaluated point estimates and prediction intervals of outage predictions for both models using high-resolution weather, infrastructure, and land use data for 89 storm events (including hurricanes, blizzards, and thunderstorms). We found that spatially BART predicted more accurate point estimates than QRF. However, QRF produced better prediction intervals for high spatial resolutions (2-km grid cells and towns), while BART predictions aggregated to coarser resolutions (divisions and service territory) more effectively. We also found that the predictive accuracy was dependent on the season (e.g., tree-leaf condition, storm characteristics), and that the predictions were most accurate for winter storms. Given the merits of each individual model, we suggest that BART and QRF be implemented together to show the complete picture of a storm's potential impact on the electric distribution network, which would allow for a utility to make better decisions about allocating prestorm resources. © 2016 Society for Risk Analysis.

  12. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  13. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection of n...

  14. Oscillometric blood pressure estimation by combining nonparametric bootstrap with Gaussian mixture model.

    Science.gov (United States)

    Lee, Soojeong; Rajan, Sreeraman; Jeon, Gwanggil; Chang, Joon-Hyuk; Dajani, Hilmi R; Groza, Voicu Z

    2017-06-01

    Blood pressure (BP) is one of the most important vital indicators and plays a key role in determining the cardiovascular activity of patients. This paper proposes a hybrid approach consisting of nonparametric bootstrap (NPB) and machine learning techniques to obtain the characteristic ratios (CR) used in the blood pressure estimation algorithm to improve the accuracy of systolic blood pressure (SBP) and diastolic blood pressure (DBP) estimates and obtain confidence intervals (CI). The NPB technique is used to circumvent the requirement for large sample set for obtaining the CI. A mixture of Gaussian densities is assumed for the CRs and Gaussian mixture model (GMM) is chosen to estimate the SBP and DBP ratios. The K-means clustering technique is used to obtain the mixture order of the Gaussian densities. The proposed approach achieves grade "A" under British Society of Hypertension testing protocol and is superior to the conventional approach based on maximum amplitude algorithm (MAA) that uses fixed CR ratios. The proposed approach also yields a lower mean error (ME) and the standard deviation of the error (SDE) in the estimates when compared to the conventional MAA method. In addition, CIs obtained through the proposed hybrid approach are also narrower with a lower SDE. The proposed approach combining the NPB technique with the GMM provides a methodology to derive individualized characteristic ratio. The results exhibit that the proposed approach enhances the accuracy of SBP and DBP estimation and provides narrower confidence intervals for the estimates. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...

  16. Correlation and simple linear regression.

    Science.gov (United States)

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.

  17. Generative Temporal Modelling of Neuroimaging - Decomposition and Nonparametric Testing

    DEFF Research Database (Denmark)

    Hald, Ditte Høvenhoff

    The goal of this thesis is to explore two improvements for functional magnetic resonance imaging (fMRI) analysis; namely our proposed decomposition method and an extension to the non-parametric testing framework. Analysis of fMRI allows researchers to investigate the functional processes...... of the brain, and provides insight into neuronal coupling during mental processes or tasks. The decomposition method is a Gaussian process-based independent components analysis (GPICA), which incorporates a temporal dependency in the sources. A hierarchical model specification is used, featuring both...... instantaneous and convolutive mixing, and the inferred temporal patterns. Spatial maps are seen to capture smooth and localized stimuli-related components, and often identifiable noise components. The implementation is freely available as a GUI/SPM plugin, and we recommend using GPICA as an additional tool when...

  18. Nonparametric Estimation of Distributions in Random Effects Models

    KAUST Repository

    Hart, Jeffrey D.

    2011-01-01

    We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online. © 2011 American Statistical Association.

  19. Prior processes and their applications nonparametric Bayesian estimation

    CERN Document Server

    Phadia, Eswar G

    2016-01-01

    This book presents a systematic and comprehensive treatment of various prior processes that have been developed over the past four decades for dealing with Bayesian approach to solving selected nonparametric inference problems. This revised edition has been substantially expanded to reflect the current interest in this area. After an overview of different prior processes, it examines the now pre-eminent Dirichlet process and its variants including hierarchical processes, then addresses new processes such as dependent Dirichlet, local Dirichlet, time-varying and spatial processes, all of which exploit the countable mixture representation of the Dirichlet process. It subsequently discusses various neutral to right type processes, including gamma and extended gamma, beta and beta-Stacy processes, and then describes the Chinese Restaurant, Indian Buffet and infinite gamma-Poisson processes, which prove to be very useful in areas such as machine learning, information retrieval and featural modeling. Tailfree and P...

  20. Nonparametric estimation of stochastic differential equations with sparse Gaussian processes.

    Science.gov (United States)

    García, Constantino A; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G

    2017-08-01

    The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.

  1. Debt and growth: A non-parametric approach

    Science.gov (United States)

    Brida, Juan Gabriel; Gómez, David Matesanz; Seijas, Maria Nela

    2017-11-01

    In this study, we explore the dynamic relationship between public debt and economic growth by using a non-parametric approach based on data symbolization and clustering methods. The study uses annual data of general government consolidated gross debt-to-GDP ratio and gross domestic product for sixteen countries between 1977 and 2015. Using symbolic sequences, we introduce a notion of distance between the dynamical paths of different countries. Then, a Minimal Spanning Tree and a Hierarchical Tree are constructed from time series to help detecting the existence of groups of countries sharing similar economic performance. The main finding of the study appears for the period 2008-2016 when several countries surpassed the 90% debt-to-GDP threshold. During this period, three groups (clubs) of countries are obtained: high, mid and low indebted countries, suggesting that the employed debt-to-GDP threshold drives economic dynamics for the selected countries.

  2. Indoor Positioning Using Nonparametric Belief Propagation Based on Spanning Trees

    Directory of Open Access Journals (Sweden)

    Savic Vladimir

    2010-01-01

    Full Text Available Nonparametric belief propagation (NBP is one of the best-known methods for cooperative localization in sensor networks. It is capable of providing information about location estimation with appropriate uncertainty and to accommodate non-Gaussian distance measurement errors. However, the accuracy of NBP is questionable in loopy networks. Therefore, in this paper, we propose a novel approach, NBP based on spanning trees (NBP-ST created by breadth first search (BFS method. In addition, we propose a reliable indoor model based on obtained measurements in our lab. According to our simulation results, NBP-ST performs better than NBP in terms of accuracy and communication cost in the networks with high connectivity (i.e., highly loopy networks. Furthermore, the computational and communication costs are nearly constant with respect to the transmission radius. However, the drawbacks of proposed method are a little bit higher computational cost and poor performance in low-connected networks.

  3. Exact nonparametric confidence bands for the survivor function.

    Science.gov (United States)

    Matthews, David

    2013-10-12

    A method to produce exact simultaneous confidence bands for the empirical cumulative distribution function that was first described by Owen, and subsequently corrected by Jager and Wellner, is the starting point for deriving exact nonparametric confidence bands for the survivor function of any positive random variable. We invert a nonparametric likelihood test of uniformity, constructed from the Kaplan-Meier estimator of the survivor function, to obtain simultaneous lower and upper bands for the function of interest with specified global confidence level. The method involves calculating a null distribution and associated critical value for each observed sample configuration. However, Noe recursions and the Van Wijngaarden-Decker-Brent root-finding algorithm provide the necessary tools for efficient computation of these exact bounds. Various aspects of the effect of right censoring on these exact bands are investigated, using as illustrations two observational studies of survival experience among non-Hodgkin's lymphoma patients and a much larger group of subjects with advanced lung cancer enrolled in trials within the North Central Cancer Treatment Group. Monte Carlo simulations confirm the merits of the proposed method of deriving simultaneous interval estimates of the survivor function across the entire range of the observed sample. This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada. It was begun while the author was visiting the Department of Statistics, University of Auckland, and completed during a subsequent sojourn at the Medical Research Council Biostatistics Unit in Cambridge. The support of both institutions, in addition to that of NSERC and the University of Waterloo, is greatly appreciated.

  4. Hyperspectral image segmentation using a cooperative nonparametric approach

    Science.gov (United States)

    Taher, Akar; Chehdi, Kacem; Cariou, Claude

    2013-10-01

    In this paper a new unsupervised nonparametric cooperative and adaptive hyperspectral image segmentation approach is presented. The hyperspectral images are partitioned band by band in parallel and intermediate classification results are evaluated and fused, to get the final segmentation result. Two unsupervised nonparametric segmentation methods are used in parallel cooperation, namely the Fuzzy C-means (FCM) method, and the Linde-Buzo-Gray (LBG) algorithm, to segment each band of the image. The originality of the approach relies firstly on its local adaptation to the type of regions in an image (textured, non-textured), and secondly on the introduction of several levels of evaluation and validation of intermediate segmentation results before obtaining the final partitioning of the image. For the management of similar or conflicting results issued from the two classification methods, we gradually introduced various assessment steps that exploit the information of each spectral band and its adjacent bands, and finally the information of all the spectral bands. In our approach, the detected textured and non-textured regions are treated separately from feature extraction step, up to the final classification results. This approach was first evaluated on a large number of monocomponent images constructed from the Brodatz album. Then it was evaluated on two real applications using a respectively multispectral image for Cedar trees detection in the region of Baabdat (Lebanon) and a hyperspectral image for identification of invasive and non invasive vegetation in the region of Cieza (Spain). A correct classification rate (CCR) for the first application is over 97% and for the second application the average correct classification rate (ACCR) is over 99%.

  5. Examination of influential observations in penalized spline regression

    Science.gov (United States)

    Türkan, Semra

    2013-10-01

    In parametric or nonparametric regression models, the results of regression analysis are affected by some anomalous observations in the data set. Thus, detection of these observations is one of the major steps in regression analysis. These observations are precisely detected by well-known influence measures. Pena's statistic is one of them. In this study, Pena's approach is formulated for penalized spline regression in terms of ordinary residuals and leverages. The real data and artificial data are used to see illustrate the effectiveness of Pena's statistic as to Cook's distance on detecting influential observations. The results of the study clearly reveal that the proposed measure is superior to Cook's Distance to detect these observations in large data set.

  6. Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis.

    Science.gov (United States)

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2015-05-01

    Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one׳s hypothesis explicitly or implicitly pertains to whole 1D trajectories. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Performance of non-parametric algorithms for spatial mapping of tropical forest structure

    Directory of Open Access Journals (Sweden)

    Liang Xu

    2016-08-01

    Full Text Available Abstract Background Mapping tropical forest structure is a critical requirement for accurate estimation of emissions and removals from land use activities. With the availability of a wide range of remote sensing imagery of vegetation characteristics from space, development of finer resolution and more accurate maps has advanced in recent years. However, the mapping accuracy relies heavily on the quality of input layers, the algorithm chosen, and the size and quality of inventory samples for calibration and validation. Results By using airborne lidar data as the “truth” and focusing on the mean canopy height (MCH as a key structural parameter, we test two commonly-used non-parametric techniques of maximum entropy (ME and random forest (RF for developing maps over a study site in Central Gabon. Results of mapping show that both approaches have improved accuracy with more input layers in mapping canopy height at 100 m (1-ha pixels. The bias-corrected spatial models further improve estimates for small and large trees across the tails of height distributions with a trade-off in increasing overall mean squared error that can be readily compensated by increasing the sample size. Conclusions A significant improvement in tropical forest mapping can be achieved by weighting the number of inventory samples against the choice of image layers and the non-parametric algorithms. Without future satellite observations with better sensitivity to forest biomass, the maps based on existing data will remain slightly biased towards the mean of the distribution and under and over estimating the upper and lower tails of the distribution.

  8. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    Science.gov (United States)

    Cannon, Alex

    2017-04-01

    Estimating historical trends in short-duration rainfall extremes at regional and local scales is challenging due to low signal-to-noise ratios and the limited availability of homogenized observational data. In addition to being of scientific interest, trends in rainfall extremes are of practical importance, as their presence calls into question the stationarity assumptions that underpin traditional engineering and infrastructure design practice. Even with these fundamental challenges, increasingly complex questions are being asked about time series of extremes. For instance, users may not only want to know whether or not rainfall extremes have changed over time, they may also want information on the modulation of trends by large-scale climate modes or on the nonstationarity of trends (e.g., identifying hiatus periods or periods of accelerating positive trends). Efforts have thus been devoted to the development and application of more robust and powerful statistical estimators for regional and local scale trends. While a standard nonparametric method like the regional Mann-Kendall test, which tests for the presence of monotonic trends (i.e., strictly non-decreasing or non-increasing changes), makes fewer assumptions than parametric methods and pools information from stations within a region, it is not designed to visualize detected trends, include information from covariates, or answer questions about the rate of change in trends. As a remedy, monotone quantile regression (MQR) has been developed as a nonparametric alternative that can be used to estimate a common monotonic trend in extremes at multiple stations. Quantile regression makes efficient use of data by directly estimating conditional quantiles based on information from all rainfall data in a region, i.e., without having to precompute the sample quantiles. The MQR method is also flexible and can be used to visualize and analyze the nonlinearity of the detected trend. However, it is fundamentally a

  9. Considering a non-polynomial basis for local kernel regression problem

    Science.gov (United States)

    Silalahi, Divo Dharma; Midi, Habshah

    2017-01-01

    A common used as solution for local kernel nonparametric regression problem is given using polynomial regression. In this study, we demonstrated the estimator and properties using maximum likelihood estimator for a non-polynomial basis such B-spline to replacing the polynomial basis. This estimator allows for flexibility in the selection of a bandwidth and a knot. The best estimator was selected by finding an optimal bandwidth and knot through minimizing the famous generalized validation function.

  10. Nonparametric Estimation of Cumulative Incidence Functions for Competing Risks Data with Missing Cause of Failure

    DEFF Research Database (Denmark)

    Effraimidis, Georgios; Dahl, Christian Møller

    In this paper, we develop a fully nonparametric approach for the estimation of the cumulative incidence function with Missing At Random right-censored competing risks data. We obtain results on the pointwise asymptotic normality as well as the uniform convergence rate of the proposed nonparametric...

  11. Non-parametric tests of productive efficiency with errors-in-variables

    NARCIS (Netherlands)

    Kuosmanen, T.K.; Post, T.; Scholtes, S.

    2007-01-01

    We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian. [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans

  12. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...

  13. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts ".an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models.highly recommend[ed].for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s

  14. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

    Science.gov (United States)

    Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A

    2017-06-30

    Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has small sample size limitation. We used a pooled method in nonparametric bootstrap test that may overcome the problem related with small samples in hypothesis testing. The present study compared nonparametric bootstrap test with pooled resampling method corresponding to parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. Nonparametric bootstrap paired t-test also provided better performance than other alternatives. Nonparametric bootstrap test provided benefit over exact Kruskal-Wallis test. We suggest using nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Development of a Modified Kernel Regression Model for a Robust Signal Reconstruction

    Energy Technology Data Exchange (ETDEWEB)

    Ahmed, Ibrahim; Heo, Gyunyoung [Kyung Hee University, Yongin (Korea, Republic of)

    2016-10-15

    The demand for robust and resilient performance has led to the use of online-monitoring techniques to monitor the process parameters and signal validation. On-line monitoring and signal validation techniques are the two important terminologies in process and equipment monitoring. These techniques are automated methods of monitoring instrument performance while the plant is operating. To implementing these techniques, several empirical models are used. One of these models is nonparametric regression model, otherwise known as kernel regression (KR). Unlike parametric models, KR is an algorithmic estimation procedure which assumes no significant parameters, and it needs no training process after its development when new observations are prepared; which is good for a system characteristic of changing due to ageing phenomenon. Although KR is used and performed excellently when applied to steady state or normal operating data, it has limitation in time-varying data that has several repetition of the same signal, especially if those signals are used to infer the other signals. The convectional KR has limitation in correctly estimating the dependent variable when time-varying data with repeated values are used to estimate the dependent variable especially in signal validation and monitoring. Therefore, we presented here in this work a modified KR that can resolve this issue which can also be feasible in time domain. Data are first transformed prior to the Euclidian distance evaluation considering their slopes/changes with respect to time. The performance of the developed model is evaluated and compared with that of conventional KR using both the lab experimental data and the real time data from CNS provided by KAERI. The result shows that the proposed developed model, having demonstrated high performance accuracy than that of conventional KR, is capable of resolving the identified limitation with convectional KR. We also discovered that there is still need to further

  16. Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci.

    Science.gov (United States)

    Yap, John Stephen; Fan, Jianqing; Wu, Rongling

    2009-12-01

    Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L(2) penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.

  17. Bayesian Nonparametric Estimation of Targeted Agent Effects on Biomarker Change to Predict Clinical Outcome

    Science.gov (United States)

    Graziani, Rebecca; Guindani, Michele; Thall, Peter F.

    2015-01-01

    Summary The effect of a targeted agent on a cancer patient's clinical outcome putatively is mediated through the agent's effect on one or more early biological events. This is motivated by pre-clinical experiments with cells or animals that identify such events, represented by binary or quantitative biomarkers. When evaluating targeted agents in humans, central questions are whether the distribution of a targeted biomarker changes following treatment, the nature and magnitude of this change, and whether it is associated with clinical outcome. Major difficulties in estimating these effects are that a biomarker's distribution may be complex, vary substantially between patients, and have complicated relationships with clinical outcomes. We present a probabilistically coherent framework for modeling and estimation in this setting, including a hierarchical Bayesian nonparametric mixture model for biomarkers that we use to define a functional profile of pre-versus-post treatment biomarker distribution change. The functional is similar to the receiver operating characteristic used in diagnostic testing. The hierarchical model yields clusters of individual patient biomarker profile functionals, and we use the profile as a covariate in a regression model for clinical outcome. The methodology is illustrated by analysis of a dataset from a clinical trial in prostate cancer using imatinib to target platelet-derived growth factor, with the clinical aim to improve progression-free survival time. PMID:25319212

  18. Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies.

    Science.gov (United States)

    Savitsky, Terrance; Vannucci, Marina; Sha, Naijun

    2011-02-01

    This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors that have an a priori unknown form of possibly nonlinear associations to the response. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses. We also look into models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and achieve a computationally efficient and practical approach. We demonstrate performances on simulated and benchmark data sets.

  19. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    Directory of Open Access Journals (Sweden)

    Xiaochun Sun

    Full Text Available Genomic selection (GS procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA and reproducing kernel Hilbert spaces (RKHS regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.

  20. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    Science.gov (United States)

    Sun, Xiaochun; Ma, Ping; Mumm, Rita H

    2012-01-01

    Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.

  1. A nonparametric mean-variance smoothing method to assess Arabidopsis cold stress transcriptional regulator CBF2 overexpression microarray data.

    Science.gov (United States)

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request.

  2. Glaucoma Monitoring in a Clinical Setting Glaucoma Progression Analysis vs Nonparametric Progression Analysis in the Groningen Longitudinal Glaucoma Study

    NARCIS (Netherlands)

    Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.

    Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:

  3. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    Science.gov (United States)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of

  4. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    Science.gov (United States)

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though the ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…

  5. Efficient nonparametric n -body force fields from machine learning

    Science.gov (United States)

    Glielmo, Aldo; Zeni, Claudio; De Vita, Alessandro

    2018-05-01

    We provide a definition and explicit expressions for n -body Gaussian process (GP) kernels, which can learn any interatomic interaction occurring in a physical system, up to n -body contributions, for any value of n . The series is complete, as it can be shown that the "universal approximator" squared exponential kernel can be written as a sum of n -body kernels. These recipes enable the choice of optimally efficient force models for each target system, as confirmed by extensive testing on various materials. We furthermore describe how the n -body kernels can be "mapped" on equivalent representations that provide database-size-independent predictions and are thus crucially more efficient. We explicitly carry out this mapping procedure for the first nontrivial (three-body) kernel of the series, and we show that this reproduces the GP-predicted forces with meV /Å accuracy while being orders of magnitude faster. These results pave the way to using novel force models (here named "M-FFs") that are computationally as fast as their corresponding standard parametrized n -body force fields, while retaining the nonparametric character, the ease of training and validation, and the accuracy of the best recently proposed machine-learning potentials.

  6. Non-parametric Bayesian networks: Improving theory and reviewing applications

    International Nuclear Information System (INIS)

    Hanea, Anca; Morales Napoles, Oswaldo; Ababei, Dan

    2015-01-01

    Applications in various domains often lead to high dimensional dependence modelling. A Bayesian network (BN) is a probabilistic graphical model that provides an elegant way of expressing the joint distribution of a large number of interrelated variables. BNs have been successfully used to represent uncertain knowledge in a variety of fields. The majority of applications use discrete BNs, i.e. BNs whose nodes represent discrete variables. Integrating continuous variables in BNs is an area fraught with difficulty. Several methods that handle discrete-continuous BNs have been proposed in the literature. This paper concentrates only on one method called non-parametric BNs (NPBNs). NPBNs were introduced in 2004 and they have been or are currently being used in at least twelve professional applications. This paper provides a short introduction to NPBNs, a couple of theoretical advances, and an overview of applications. The aim of the paper is twofold: one is to present the latest improvements of the theory underlying NPBNs, and the other is to complement the existing overviews of BNs applications with the NPNBs applications. The latter opens the opportunity to discuss some difficulties that applications pose to the theoretical framework and in this way offers some NPBN modelling guidance to practitioners. - Highlights: • The paper gives an overview of the current NPBNs methodology. • We extend the NPBN methodology by relaxing the conditions of one of its fundamental theorems. • We propose improvements of the data mining algorithm for the NPBNs. • We review the professional applications of the NPBNs.

  7. Nonparametric predictive inference for combined competing risks data

    International Nuclear Information System (INIS)

    Coolen-Maturi, Tahani; Coolen, Frank P.A.

    2014-01-01

    The nonparametric predictive inference (NPI) approach for competing risks data has recently been presented, in particular addressing the question due to which of the competing risks the next unit will fail, and also considering the effects of unobserved, re-defined, unknown or removed competing risks. In this paper, we introduce how the NPI approach can be used to deal with situations where units are not all at risk from all competing risks. This may typically occur if one combines information from multiple samples, which can, e.g. be related to further aspects of units that define the samples or groups to which the units belong or to different applications where the circumstances under which the units operate can vary. We study the effect of combining the additional information from these multiple samples, so effectively borrowing information on specific competing risks from other units, on the inferences. Such combination of information can be relevant to competing risks scenarios in a variety of application areas, including engineering and medical studies

  8. Transition redshift: new constraints from parametric and nonparametric methods

    Energy Technology Data Exchange (ETDEWEB)

    Rani, Nisha; Mahajan, Shobhit; Mukherjee, Amitabha [Department of Physics and Astrophysics, University of Delhi, New Delhi 110007 (India); Jain, Deepak [Deen Dayal Upadhyaya College, University of Delhi, New Delhi 110015 (India); Pires, Nilza, E-mail: nrani@physics.du.ac.in, E-mail: djain@ddu.du.ac.in, E-mail: shobhit.mahajan@gmail.com, E-mail: amimukh@gmail.com, E-mail: npires@dfte.ufrn.br [Departamento de Física Teórica e Experimental, UFRN, Campus Universitário, Natal, RN 59072-970 (Brazil)

    2015-12-01

    In this paper, we use the cosmokinematics approach to study the accelerated expansion of the Universe. This is a model independent approach and depends only on the assumption that the Universe is homogeneous and isotropic and is described by the FRW metric. We parametrize the deceleration parameter, q(z), to constrain the transition redshift (z{sub t}) at which the expansion of the Universe goes from a decelerating to an accelerating phase. We use three different parametrizations of q(z) namely, q{sub I}(z)=q{sub 1}+q{sub 2}z, q{sub II} (z) = q{sub 3} + q{sub 4} ln (1 + z) and q{sub III} (z)=½+q{sub 5}/(1+z){sup 2}. A joint analysis of the age of galaxies, strong lensing and supernovae Ia data indicates that the transition redshift is less than unity i.e. z{sub t} < 1. We also use a nonparametric approach (LOESS+SIMEX) to constrain z{sub t}. This too gives z{sub t} < 1 which is consistent with the value obtained by the parametric approach.

  9. Discrete non-parametric kernel estimation for global sensitivity analysis

    International Nuclear Information System (INIS)

    Senga Kiessé, Tristan; Ventura, Anne

    2016-01-01

    This work investigates the discrete kernel approach for evaluating the contribution of the variance of discrete input variables to the variance of model output, via analysis of variance (ANOVA) decomposition. Until recently only the continuous kernel approach has been applied as a metamodeling approach within sensitivity analysis framework, for both discrete and continuous input variables. Now the discrete kernel estimation is known to be suitable for smoothing discrete functions. We present a discrete non-parametric kernel estimator of ANOVA decomposition of a given model. An estimator of sensitivity indices is also presented with its asymtotic convergence rate. Some simulations on a test function analysis and a real case study from agricultural have shown that the discrete kernel approach outperforms the continuous kernel one for evaluating the contribution of moderate or most influential discrete parameters to the model output. - Highlights: • We study a discrete kernel estimation for sensitivity analysis of a model. • A discrete kernel estimator of ANOVA decomposition of the model is presented. • Sensitivity indices are calculated for discrete input parameters. • An estimator of sensitivity indices is also presented with its convergence rate. • An application is realized for improving the reliability of environmental models.

  10. Nonparametric predictive inference for combining diagnostic tests with parametric copula

    Science.gov (United States)

    Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.

    2017-09-01

    Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we interest in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results with considering dependence structure using parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. While copula is a well-known statistical concept for modelling dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method which is maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different family of copulas. Finally, we briefly outline related challenges and opportunities for future research.

  11. Nonparametric Integrated Agrometeorological Drought Monitoring: Model Development and Application

    Science.gov (United States)

    Zhang, Qiang; Li, Qin; Singh, Vijay P.; Shi, Peijun; Huang, Qingzhong; Sun, Peng

    2018-01-01

    Drought is a major natural hazard that has massive impacts on the society. How to monitor drought is critical for its mitigation and early warning. This study proposed a modified version of the multivariate standardized drought index (MSDI) based on precipitation, evapotranspiration, and soil moisture, i.e., modified multivariate standardized drought index (MMSDI). This study also used nonparametric joint probability distribution analysis. Comparisons were done between standardized precipitation evapotranspiration index (SPEI), standardized soil moisture index (SSMI), MSDI, and MMSDI, and real-world observed drought regimes. Results indicated that MMSDI detected droughts that SPEI and/or SSMI failed to do. Also, MMSDI detected almost all droughts that were identified by SPEI and SSMI. Further, droughts detected by MMSDI were similar to real-world observed droughts in terms of drought intensity and drought-affected area. When compared to MMSDI, MSDI has the potential to overestimate drought intensity and drought-affected area across China, which should be attributed to exclusion of the evapotranspiration components from estimation of drought intensity. Therefore, MMSDI is proposed for drought monitoring that can detect agrometeorological droughts. Results of this study provide a framework for integrated drought monitoring in other regions of the world and can help to develop drought mitigation.

  12. Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.

    Science.gov (United States)

    Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A

    2018-01-30

    Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  13. Nonparametric adaptive age replacement with a one-cycle criterion

    International Nuclear Information System (INIS)

    Coolen-Schrijner, P.; Coolen, F.P.A.

    2007-01-01

    Age replacement of technical units has received much attention in the reliability literature over the last four decades. Mostly, the failure time distribution for the units is assumed to be known, and minimal costs per unit of time is used as optimality criterion, where renewal reward theory simplifies the mathematics involved but requires the assumption that the same process and replacement strategy continues over a very large ('infinite') period of time. Recently, there has been increasing attention to adaptive strategies for age replacement, taking into account the information from the process. Although renewal reward theory can still be used to provide an intuitively and mathematically attractive optimality criterion, it is more logical to use minimal costs per unit of time over a single cycle as optimality criterion for adaptive age replacement. In this paper, we first show that in the classical age replacement setting, with known failure time distribution with increasing hazard rate, the one-cycle criterion leads to earlier replacement than the renewal reward criterion. Thereafter, we present adaptive age replacement with a one-cycle criterion within the nonparametric predictive inferential framework. We study the performance of this approach via simulations, which are also used for comparisons with the use of the renewal reward criterion within the same statistical framework

  14. Nonparametric Bayes Classification and Hypothesis Testing on Manifolds

    Science.gov (United States)

    Bhattacharya, Abhishek; Dunson, David

    2012-01-01

    Our first focus is prediction of a categorical response variable using features that lie on a general manifold. For example, the manifold may correspond to the surface of a hypersphere. We propose a general kernel mixture model for the joint distribution of the response and predictors, with the kernel expressed in product form and dependence induced through the unknown mixing measure. We provide simple sufficient conditions for large support and weak and strong posterior consistency in estimating both the joint distribution of the response and predictors and the conditional distribution of the response. Focusing on a Dirichlet process prior for the mixing measure, these conditions hold using von Mises-Fisher kernels when the manifold is the unit hypersphere. In this case, Bayesian methods are developed for efficient posterior computation using slice sampling. Next we develop Bayesian nonparametric methods for testing whether there is a difference in distributions between groups of observations on the manifold having unknown densities. We prove consistency of the Bayes factor and develop efficient computational methods for its calculation. The proposed classification and testing methods are evaluated using simulation examples and applied to spherical data applications. PMID:22754028

  15. Bayesian nonparametric meta-analysis using Polya tree mixture models.

    Science.gov (United States)

    Branscum, Adam J; Hanson, Timothy E

    2008-09-01

    Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.

  16. Performances of non-parametric statistics in sensitivity analysis and parameter ranking

    International Nuclear Information System (INIS)

    Saltelli, A.

    1987-01-01

    Twelve parametric and non-parametric sensitivity analysis techniques are compared in the case of non-linear model responses. The test models used are taken from the long-term risk analysis for the disposal of high level radioactive waste in a geological formation. They describe the transport of radionuclides through a set of engineered and natural barriers from the repository to the biosphere and to man. The output data from these models are the dose rates affecting the maximum exposed individual of a critical group at a given point in time. All the techniques are applied to the output from the same Monte Carlo simulations, where a modified version of Latin Hypercube method is used for the sample selection. Hypothesis testing is systematically applied to quantify the degree of confidence in the results given by the various sensitivity estimators. The estimators are ranked according to their robustness and stability, on the basis of two test cases. The conclusions are that no estimator can be considered the best from all points of view and recommend the use of more than just one estimator in sensitivity analysis

  17. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  18. Variable selection and model choice in geoadditive regression models.

    Science.gov (United States)

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  19. Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

    Science.gov (United States)

    Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

    2017-03-01

    Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.

  20. Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

    OpenAIRE

    KELEŞ, Taliha; ALTUN, Murat

    2016-01-01

    Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...

  1. Understanding poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.

  2. Nonparametric, Coupled ,Bayesian ,Dictionary ,and Classifier Learning for Hyperspectral Classification.

    Science.gov (United States)

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  3. A robust nonparametric method for quantifying undetected extinctions.

    Science.gov (United States)

    Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E

    2016-06-01

    How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.

  4. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as, polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to rep- resent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.

  5. Multicollinearity and Regression Analysis

    Science.gov (United States)

    Daoud, Jamal I.

    2017-12-01

    In regression analysis it is obvious to have a correlation between the response and predictor(s), but having correlation among predictors is something undesired. The number of predictors included in the regression model depends on many factors among which, historical data, experience, etc. At the end selection of most important predictors is something objective due to the researcher. Multicollinearity is a phenomena when two or more predictors are correlated, if this happens, the standard error of the coefficients will increase [8]. Increased standard errors means that the coefficients for some or all independent variables may be found to be significantly different from In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on the multicollinearity, reasons and consequences on the reliability of the regression model.

  6. Kernel bandwidth estimation for non-parametric density estimation: a comparative study

    CSIR Research Space (South Africa)

    Van der Walt, CM

    2013-12-01

    Full Text Available We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...

  7. Minimax Regression Quantiles

    DEFF Research Database (Denmark)

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....

  8. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....

  9. Improved profile fitting and quantification of uncertainty in experimental measurements of impurity transport coefficients using Gaussian process regression

    International Nuclear Information System (INIS)

    Chilenski, M.A.; Greenwald, M.; Howard, N.T.; White, A.E.; Rice, J.E.; Walk, J.R.; Marzouk, Y.

    2015-01-01

    The need to fit smooth temperature and density profiles to discrete observations is ubiquitous in plasma physics, but the prevailing techniques for this have many shortcomings that cast doubt on the statistical validity of the results. This issue is amplified in the context of validation of gyrokinetic transport models (Holland et al 2009 Phys. Plasmas 16 052301), where the strong sensitivity of the code outputs to input gradients means that inadequacies in the profile fitting technique can easily lead to an incorrect assessment of the degree of agreement with experimental measurements. In order to rectify the shortcomings of standard approaches to profile fitting, we have applied Gaussian process regression (GPR), a powerful non-parametric regression technique, to analyse an Alcator C-Mod L-mode discharge used for past gyrokinetic validation work (Howard et al 2012 Nucl. Fusion 52 063002). We show that the GPR techniques can reproduce the previous results while delivering more statistically rigorous fits and uncertainty estimates for both the value and the gradient of plasma profiles with an improved level of automation. We also discuss how the use of GPR can allow for dramatic increases in the rate of convergence of uncertainty propagation for any code that takes experimental profiles as inputs. The new GPR techniques for profile fitting and uncertainty propagation are quite useful and general, and we describe the steps to implementation in detail in this paper. These techniques have the potential to substantially improve the quality of uncertainty estimates on profile fits and the rate of convergence of uncertainty propagation, making them of great interest for wider use in fusion experiments and modelling efforts. (paper)

  10. Examples of the Application of Nonparametric Information Geometry to Statistical Physics

    Directory of Open Access Journals (Sweden)

    Giovanni Pistone

    2013-09-01

    Full Text Available We review a nonparametric version of Amari’s information geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.

  11. Nonparametric Identification and Estimation of Finite Mixture Models of Dynamic Discrete Choices

    OpenAIRE

    Hiroyuki Kasahara; Katsumi Shimotsu

    2006-01-01

    In dynamic discrete choice analysis, controlling for unobserved heterogeneity is an important issue, and finite mixture models provide flexible ways to account for unobserved heterogeneity. This paper studies nonparametric identifiability of type probabilities and type-specific component distributions in finite mixture models of dynamic discrete choices. We derive sufficient conditions for nonparametric identification for various finite mixture models of dynamic discrete choices used in appli...

  12. Design Automation Using Script Languages. High-Level CAD Templates in Non-Parametric Programs

    Science.gov (United States)

    Moreno, R.; Bazán, A. M.

    2017-10-01

    The main purpose of this work is to study the advantages offered by the application of traditional techniques of technical drawing in processes for automation of the design, with non-parametric CAD programs, provided with scripting languages. Given that an example drawing can be solved with traditional step-by-step detailed procedures, is possible to do the same with CAD applications and to generalize it later, incorporating references. In today’s modern CAD applications, there are striking absences of solutions for building engineering: oblique projections (military and cavalier), 3D modelling of complex stairs, roofs, furniture, and so on. The use of geometric references (using variables in script languages) and their incorporation into high-level CAD templates allows the automation of processes. Instead of repeatedly creating similar designs or modifying their data, users should be able to use these templates to generate future variations of the same design. This paper presents the automation process of several complex drawing examples based on CAD script files aided with parametric geometry calculation tools. The proposed method allows us to solve complex geometry designs not currently incorporated in the current CAD applications and to subsequently create other new derivatives without user intervention. Automation in the generation of complex designs not only saves time but also increases the quality of the presentations and reduces the possibility of human errors.

  13. PERFORMANCE PERSISTENCE OF TURKISH A AND B TYPE MUTUAL FUNDS: THE PARAMETRIC AND NONPARAMETRIC APPROACH

    Directory of Open Access Journals (Sweden)

    VELİ AKEL

    2013-06-01

    Full Text Available In this study, single index models are applied to a free survivorship bias database of 51 A and 51 B Types Turkish mutual funds using monthly returns over 5 years from 2000 to 2004. Then, it has been investigated whether mutual fund managers have market timing ability. Turkish Institutional Investment Managers’ Association A and B Type Fund Indexes are firstly used as benchmark portfolios. The challenging question is whether Turkish mutual funds have performance persistency over the short and long term or not. This study uses both parametric and non-parametric techniques to examine performance persistence. The overall conclusion is that Type A mutual funds managers do not have stock selection and market timing ability. However, Type B mutual funds managers do have stock selection ability. Type A mutual funds show evidence of relative and absolute persistence in the short term while Type B mutual funds show significant results of relative and absolute persistence in both of the terms. Although there are various results in performance persistence of mutual funds, the repeat winner phenomenon is stronger over shorter periods of evaluation. Consequently, it seems that Turkish mutual funds have performance persistency at least for the short term.

  14. Semi-nonparametric VaR forecasts for hedge funds during the recent crisis

    Science.gov (United States)

    Del Brio, Esther B.; Mora-Valencia, Andrés; Perote, Javier

    2014-05-01

    The need to provide accurate value-at-risk (VaR) forecasting measures has triggered an important literature in econophysics. Although these accurate VaR models and methodologies are particularly demanded for hedge fund managers, there exist few articles specifically devoted to implement new techniques in hedge fund returns VaR forecasting. This article advances in these issues by comparing the performance of risk measures based on parametric distributions (the normal, Student’s t and skewed-t), semi-nonparametric (SNP) methodologies based on Gram-Charlier (GC) series and the extreme value theory (EVT) approach. Our results show that normal-, Student’s t- and Skewed t- based methodologies fail to forecast hedge fund VaR, whilst SNP and EVT approaches accurately success on it. We extend these results to the multivariate framework by providing an explicit formula for the GC copula and its density that encompasses the Gaussian copula and accounts for non-linear dependences. We show that the VaR obtained by the meta GC accurately captures portfolio risk and outperforms regulatory VaR estimates obtained through the meta Gaussian and Student’s t distributions.

  15. USING A DEA MANAGEMENT TOOLTHROUGH A NONPARAMETRIC APPROACH: AN EXAMINATION OF URBAN-RURAL EFFECTS ON THAI SCHOOL EFFICIENCY

    Directory of Open Access Journals (Sweden)

    SANGCHAN KANTABUTRA

    2009-04-01

    Full Text Available This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA. Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have access to and practice different production technologies than rural schools, and rural institutions appear to operate less efficiently than their urban counterparts. In addition, a sensitivity analysis, conducted to ascertain the robustness of the analytical framework, revealed the stability of urban-rural effects on school efficiency. Policy to improve school eff iciency should thus take varying geographical area differences into account, viewing rural and urban schools as different from one another. Moreover, policymakers might consider shifting existing resources from urban schools to rural schools, provided that the increase in overall rural efficiency would be greater than the decrease, if any, in the city. Future research directions are discussed.

  16. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    Science.gov (United States)

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  17. An Evaluation of Parametric and Nonparametric Models of Fish Population Response.

    Energy Technology Data Exchange (ETDEWEB)

    Haas, Timothy C.; Peterson, James T.; Lee, Danny C.

    1999-11-01

    Predicting the distribution or status of animal populations at large scales often requires the use of broad-scale information describing landforms, climate, vegetation, etc. These data, however, often consist of mixtures of continuous and categorical covariates and nonmultiplicative interactions among covariates, complicating statistical analyses. Using data from the interior Columbia River Basin, USA, we compared four methods for predicting the distribution of seven salmonid taxa using landscape information. Subwatersheds (mean size, 7800 ha) were characterized using a set of 12 covariates describing physiography, vegetation, and current land-use. The techniques included generalized logit modeling, classification trees, a nearest neighbor technique, and a modular neural network. We evaluated model performance using out-of-sample prediction accuracy via leave-one-out cross-validation and introduce a computer-intensive Monte Carlo hypothesis testing approach for examining the statistical significance of landscape covariates with the non-parametric methods. We found the modular neural network and the nearest-neighbor techniques to be the most accurate, but were difficult to summarize in ways that provided ecological insight. The modular neural network also required the most extensive computer resources for model fitting and hypothesis testing. The generalized logit models were readily interpretable, but were the least accurate, possibly due to nonlinear relationships and nonmultiplicative interactions among covariates. Substantial overlap among the statistically significant (P<0.05) covariates for each method suggested that each is capable of detecting similar relationships between responses and covariates. Consequently, we believe that employing one or more methods may provide greater biological insight without sacrificing prediction accuracy.

  18. Nonparametric Change Point Diagnosis Method of Concrete Dam Crack Behavior Abnormality

    Directory of Open Access Journals (Sweden)

    Zhanchao Li

    2013-01-01

    Full Text Available The study on diagnosis method of concrete crack behavior abnormality has always been a hot spot and difficulty in the safety monitoring field of hydraulic structure. Based on the performance of concrete dam crack behavior abnormality in parametric statistical model and nonparametric statistical model, the internal relation between concrete dam crack behavior abnormality and statistical change point theory is deeply analyzed from the model structure instability of parametric statistical model and change of sequence distribution law of nonparametric statistical model. On this basis, through the reduction of change point problem, the establishment of basic nonparametric change point model, and asymptotic analysis on test method of basic change point problem, the nonparametric change point diagnosis method of concrete dam crack behavior abnormality is created in consideration of the situation that in practice concrete dam crack behavior may have more abnormality points. And the nonparametric change point diagnosis method of concrete dam crack behavior abnormality is used in the actual project, demonstrating the effectiveness and scientific reasonableness of the method established. Meanwhile, the nonparametric change point diagnosis method of concrete dam crack behavior abnormality has a complete theoretical basis and strong practicality with a broad application prospect in actual project.

  19. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  20. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuissance parameters, the Jacobian transformation is an

  1. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.

  2. Nonlinear Regression with R

    CERN Document Server

    Ritz, Christian; Parmigiani, Giovanni

    2009-01-01

    R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.

  3. Bayesian ARTMAP for regression.

    Science.gov (United States)

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.

  4. Bounded Gaussian process regression

    DEFF Research Database (Denmark)

    Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan

    2013-01-01

    We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....

  5. and Multinomial Logistic Regression

    African Journals Online (AJOL)

    This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).

  6. Mechanisms of neuroblastoma regression

    Science.gov (United States)

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  7. Parametric and Non-Parametric Vibration-Based Structural Identification Under Earthquake Excitation

    Science.gov (United States)

    Pentaris, Fragkiskos P.; Fouskitakis, George N.

    2014-05-01

    The problem of modal identification in civil structures is of crucial importance, and thus has been receiving increasing attention in recent years. Vibration-based methods are quite promising as they are capable of identifying the structure's global characteristics, they are relatively easy to implement and they tend to be time effective and less expensive than most alternatives [1]. This paper focuses on the off-line structural/modal identification of civil (concrete) structures subjected to low-level earthquake excitations, under which, they remain within their linear operating regime. Earthquakes and their details are recorded and provided by the seismological network of Crete [2], which 'monitors' the broad region of south Hellenic arc, an active seismic region which functions as a natural laboratory for earthquake engineering of this kind. A sufficient number of seismic events are analyzed in order to reveal the modal characteristics of the structures under study, that consist of the two concrete buildings of the School of Applied Sciences, Technological Education Institute of Crete, located in Chania, Crete, Hellas. Both buildings are equipped with high-sensitivity and accuracy seismographs - providing acceleration measurements - established at the basement (structure's foundation) presently considered as the ground's acceleration (excitation) and at all levels (ground floor, 1st floor, 2nd floor and terrace). Further details regarding the instrumentation setup and data acquisition may be found in [3]. The present study invokes stochastic, both non-parametric (frequency-based) and parametric methods for structural/modal identification (natural frequencies and/or damping ratios). Non-parametric methods include Welch-based spectrum and Frequency response Function (FrF) estimation, while parametric methods, include AutoRegressive (AR), AutoRegressive with eXogeneous input (ARX) and Autoregressive Moving-Average with eXogeneous input (ARMAX) models[4, 5

  8. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    Science.gov (United States)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.

  9. Potential applications of rapid/elementary nonparametric statistical techniques (NST) to electrochemical problems

    International Nuclear Information System (INIS)

    Fahidy, Thomas Z.

    2009-01-01

    A major advantage of NST lies in the unimportance of the probability distribution of observations. In this paper, the sign test, the rank-sum test, the Kruskal-Wallis test, the Friedman test, and the runs test illustrate the potential of certain rapid NST for the evaluation of electrochemical process performance.

  10. Classification of thermal-hydraulics using scenarios of non-parametric techniques

    International Nuclear Information System (INIS)

    Villamizar, M.; Martorell, S.; Sanchez-Saez, F.; Villanueva, J. F.; Carlos, S.; Sanchez, A.

    2013-01-01

    The objective of the study is to apply a probabilistic neural network (PNN) that allow to classify sheath temperature trajectory from the beginning of the accident to the stabilization of the plant from a certain inputs and a starting from these groups establish models that predict the clad temperature peaking.

  11. Effects of dating errors on nonparametric trend analyses of speleothem time series

    Directory of Open Access Journals (Sweden)

    M. Mudelsee

    2012-10-01

    Full Text Available A fundamental problem in paleoclimatology is to take fully into account the various error sources when examining proxy records with quantitative methods of statistical time series analysis. Records from dated climate archives such as speleothems add extra uncertainty from the age determination to the other sources that consist in measurement and proxy errors. This paper examines three stalagmite time series of oxygen isotopic composition (δ18O from two caves in western Germany, the series AH-1 from the Atta Cave and the series Bu1 and Bu4 from the Bunker Cave. These records carry regional information about past changes in winter precipitation and temperature. U/Th and radiocarbon dating reveals that they cover the later part of the Holocene, the past 8.6 thousand years (ka. We analyse centennial- to millennial-scale climate trends by means of nonparametric Gasser–Müller kernel regression. Error bands around fitted trend curves are determined by combining (1 block bootstrap resampling to preserve noise properties (shape, autocorrelation of the δ18O residuals and (2 timescale simulations (models StalAge and iscam. The timescale error influences on centennial- to millennial-scale trend estimation are not excessively large. We find a "mid-Holocene climate double-swing", from warm to cold to warm winter conditions (6.5 ka to 6.0 ka to 5.1 ka, with warm–cold amplitudes of around 0.5‰ δ18O; this finding is documented by all three records with high confidence. We also quantify the Medieval Warm Period (MWP, the Little Ice Age (LIA and the current warmth. Our analyses cannot unequivocally support the conclusion that current regional winter climate is warmer than that during the MWP.

  12. Better Autologistic Regression

    Directory of Open Access Journals (Sweden)

    Mark A. Wolters

    2017-11-01

    Full Text Available Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding—the two numbers used to represent the two possible states of the variables—might differ. Common coding choices are (zero, one and (minus one, plus one. Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.

  13. Regression in organizational leadership.

    Science.gov (United States)

    Kernberg, O F

    1979-02-01

    The choice of good leaders is a major task for all organizations. Inforamtion regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.

  14. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  15. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...

  16. A Non-Parametric Surrogate-based Test of Significance for T-Wave Alternans Detection

    Science.gov (United States)

    Nemati, Shamim; Abdala, Omar; Bazán, Violeta; Yim-Yeh, Susie; Malhotra, Atul; Clifford, Gari

    2010-01-01

    We present a non-parametric adaptive surrogate test that allows for the differentiation of statistically significant T-Wave Alternans (TWA) from alternating patterns that can be solely explained by the statistics of noise. The proposed test is based on estimating the distribution of noise induced alternating patterns in a beat sequence from a set of surrogate data derived from repeated reshuffling of the original beat sequence. Thus, in assessing the significance of the observed alternating patterns in the data no assumptions are made about the underlying noise distribution. In addition, since the distribution of noise-induced alternans magnitudes is calculated separately for each sequence of beats within the analysis window, the method is robust to data non-stationarities in both noise and TWA. The proposed surrogate method for rejecting noise was compared to the standard noise rejection methods used with the Spectral Method (SM) and the Modified Moving Average (MMA) techniques. Using a previously described realistic multi-lead model of TWA, and real physiological noise, we demonstrate the proposed approach reduces false TWA detections, while maintaining a lower missed TWA detection compared with all the other methods tested. A simple averaging-based TWA estimation algorithm was coupled with the surrogate significance testing and was evaluated on three public databases; the Normal Sinus Rhythm Database (NRSDB), the Chronic Heart Failure Database (CHFDB) and the Sudden Cardiac Death Database (SCDDB). Differences in TWA amplitudes between each database were evaluated at matched heart rate (HR) intervals from 40 to 120 beats per minute (BPM). Using the two-sample Kolmogorov-Smirnov test, we found that significant differences in TWA levels exist between each patient group at all decades of heart rates. The most marked difference was generally found at higher heart rates, and the new technique resulted in a larger margin of separability between patient populations than

  17. CATDAT : A Program for Parametric and Nonparametric Categorical Data Analysis : User's Manual Version 1.0, 1998-1999 Progress Report.

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, James T.

    1999-12-01

    Natural resource professionals are increasingly required to develop rigorous statistical models that relate environmental data to categorical responses data. Recent advances in the statistical and computing sciences have led to the development of sophisticated methods for parametric and nonparametric analysis of data with categorical responses. The statistical software package CATDAT was designed to make some of these relatively new and powerful techniques available to scientists. The CATDAT statistical package includes 4 analytical techniques: generalized logit modeling; binary classification tree; extended K-nearest neighbor classification; and modular neural network.

  18. Multilevel Latent Class Analysis: Parametric and Nonparametric Models

    Science.gov (United States)

    Finch, W. Holmes; French, Brian F.

    2014-01-01

    Latent class analysis is an analytic technique often used in educational and psychological research to identify meaningful groups of individuals within a larger heterogeneous population based on a set of variables. This technique is flexible, encompassing not only a static set of variables but also longitudinal data in the form of growth mixture…

  19. Techniques for detecting effects of urban and rural land-use practices on stream-water chemistry in selected watersheds in Texas, Minnesota,and Illinois

    Science.gov (United States)

    Walker, J.F.

    1993-01-01

    Although considerable effort has been expended during the past two decades to control nonpoint-source contamination of streams and lakes in urban and rural watersheds, little has been published on the effectiveness of various management practices at the watershed scale. This report presents a discussion of several parametric and nonparametric statistical techniques for detecting changes in water-chemistry data. The need for reducing the influence of natural variability was recognized and accomplished through the use of regression equations. Traditional analyses have focused on fixed-frequency instantaneous concentration data; this report describes the use of storm load data as an alternative.

  20. Robust extraction of baseline signal of atmospheric trace species using local regression

    Science.gov (United States)

    Ruckstuhl, A. F.; Henne, S.; Reimann, S.; Steinbacher, M.; Vollmer, M. K.; O'Doherty, S.; Buchmann, B.; Hueglin, C.

    2012-11-01

    The identification of atmospheric trace species measurements that are representative of well-mixed background air masses is required for monitoring atmospheric composition change at background sites. We present a statistical method based on robust local regression that is well suited for the selection of background measurements and the estimation of associated baseline curves. The bootstrap technique is applied to calculate the uncertainty in the resulting baseline curve. The non-parametric nature of the proposed approach makes it a very flexible data filtering method. Application to carbon monoxide (CO) measured from 1996 to 2009 at the high-alpine site Jungfraujoch (Switzerland, 3580 m a.s.l.), and to measurements of 1,1-difluoroethane (HFC-152a) from Jungfraujoch (2000 to 2009) and Mace Head (Ireland, 1995 to 2009) demonstrates the feasibility and usefulness of the proposed approach. The determined average annual change of CO at Jungfraujoch for the 1996 to 2009 period as estimated from filtered annual mean CO concentrations is -2.2 ± 1.1 ppb yr-1. For comparison, the linear trend of unfiltered CO measurements at Jungfraujoch for this time period is -2.9 ± 1.3 ppb yr-1.

  1. Robust extraction of baseline signal of atmospheric trace species using local regression

    Directory of Open Access Journals (Sweden)

    A. F. Ruckstuhl

    2012-11-01

    Full Text Available The identification of atmospheric trace species measurements that are representative of well-mixed background air masses is required for monitoring atmospheric composition change at background sites. We present a statistical method based on robust local regression that is well suited for the selection of background measurements and the estimation of associated baseline curves. The bootstrap technique is applied to calculate the uncertainty in the resulting baseline curve. The non-parametric nature of the proposed approach makes it a very flexible data filtering method. Application to carbon monoxide (CO measured from 1996 to 2009 at the high-alpine site Jungfraujoch (Switzerland, 3580 m a.s.l., and to measurements of 1,1-difluoroethane (HFC-152a from Jungfraujoch (2000 to 2009 and Mace Head (Ireland, 1995 to 2009 demonstrates the feasibility and usefulness of the proposed approach.

    The determined average annual change of CO at Jungfraujoch for the 1996 to 2009 period as estimated from filtered annual mean CO concentrations is −2.2 ± 1.1 ppb yr−1. For comparison, the linear trend of unfiltered CO measurements at Jungfraujoch for this time period is −2.9 ± 1.3 ppb yr−1.

  2. Domestic Multinationals and Foreign-Owned Firms in Italy: Evidence from Quantile Regression

    Directory of Open Access Journals (Sweden)

    Grasseni, Mara

    2010-06-01

    Full Text Available This paper investigates the performance differences across and within foreign-owned firms and domestic multinationals in Italy. Used for the empirical analysis are non-parametric tests based on the concept of first order stochastic dominance and quantile regression technique. The firm-level analysis distinguishes between foreign-owned firms of different nationalities and domestic MNEs according to the location of their FDI, and it focuses not only on productivity but also on differences in average wages, capital intensity, and financial and non-financial indicators, namely ROS, ROI and debt leverage. Overall, the results provide evidence of remarkable heterogeneity across and within multinationals. In particular, it seems not possible to identify a clear foreign advantage at least in terms of productivity, because foreign-owned firms do not outperform domestic multinationals. Interesting results are obtained when focusing on ROS and ROI, where the profitability gaps change as one moves from the bottom to the top of the conditional distribution. Domestic multinationals investing only in developed countries present higher ROS and ROI compared with the subgroups of foreign-owned firms, but only at the lower quantiles, while at the upper quantiles the advantage seems to favour foreign firms. Finally, in regard to domestic multinationals, there is strong evidence that those active only in less developed countries persistently exhibit the worst performances

  3. Regression analysis of the structure function for reliability evaluation of continuous-state system

    International Nuclear Information System (INIS)

    Gamiz, M.L.; Martinez Miranda, M.D.

    2010-01-01

    Technical systems are designed to perform an intended task with an admissible range of efficiency. According to this idea, it is permissible that the system runs among different levels of performance, in addition to complete failure and the perfect functioning one. As a consequence, reliability theory has evolved from binary-state systems to the most general case of continuous-state system, in which the state of the system changes over time through some interval on the real number line. In this context, obtaining an expression for the structure function becomes difficult, compared to the discrete case, with difficulty increasing as the number of components of the system increases. In this work, we propose a method to build a structure function for a continuum system by using multivariate nonparametric regression techniques, in which certain analytical restrictions on the variable of interest must be taken into account. Once the structure function is obtained, some reliability indices of the system are estimated. We illustrate our method via several numerical examples.

  4. Regression analysis of the structure function for reliability evaluation of continuous-state system

    Energy Technology Data Exchange (ETDEWEB)

    Gamiz, M.L., E-mail: mgamiz@ugr.e [Departamento de Estadistica e I.O., Facultad de Ciencias, Universidad de Granada, Granada 18071 (Spain); Martinez Miranda, M.D. [Departamento de Estadistica e I.O., Facultad de Ciencias, Universidad de Granada, Granada 18071 (Spain)

    2010-02-15

    Technical systems are designed to perform an intended task with an admissible range of efficiency. According to this idea, it is permissible that the system runs among different levels of performance, in addition to complete failure and the perfect functioning one. As a consequence, reliability theory has evolved from binary-state systems to the most general case of continuous-state system, in which the state of the system changes over time through some interval on the real number line. In this context, obtaining an expression for the structure function becomes difficult, compared to the discrete case, with difficulty increasing as the number of components of the system increases. In this work, we propose a method to build a structure function for a continuum system by using multivariate nonparametric regression techniques, in which certain analytical restrictions on the variable of interest must be taken into account. Once the structure function is obtained, some reliability indices of the system are estimated. We illustrate our method via several numerical examples.

  5. Stochastic semi-nonparametric frontier estimation of electricity distribution networks: Application of the StoNED method in the Finnish regulatory model

    International Nuclear Information System (INIS)

    Kuosmanen, Timo

    2012-01-01

    Electricity distribution network is a prime example of a natural local monopoly. In many countries, electricity distribution is regulated by the government. Many regulators apply frontier estimation techniques such as data envelopment analysis (DEA) or stochastic frontier analysis (SFA) as an integral part of their regulatory framework. While more advanced methods that combine nonparametric frontier with stochastic error term are known in the literature, in practice, regulators continue to apply simplistic methods. This paper reports the main results of the project commissioned by the Finnish regulator for further development of the cost frontier estimation in their regulatory framework. The key objectives of the project were to integrate a stochastic SFA-style noise term to the nonparametric, axiomatic DEA-style cost frontier, and to take the heterogeneity of firms and their operating environments better into account. To achieve these objectives, a new method called stochastic nonparametric envelopment of data (StoNED) was examined. Based on the insights and experiences gained in the empirical analysis using the real data of the regulated networks, the Finnish regulator adopted the StoNED method in use from 2012 onwards.

  6. Steganalysis using logistic regression

    Science.gov (United States)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.

  7. SEPARATION PHENOMENA LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Ikaro Daniel de Carvalho Barreto

    2014-03-01

    Full Text Available This paper proposes an application of concepts about the maximum likelihood estimation of the binomial logistic regression model to the separation phenomena. It generates bias in the estimation and provides different interpretations of the estimates on the different statistical tests (Wald, Likelihood Ratio and Score and provides different estimates on the different iterative methods (Newton-Raphson and Fisher Score. It also presents an example that demonstrates the direct implications for the validation of the model and validation of variables, the implications for estimates of odds ratios and confidence intervals, generated from the Wald statistics. Furthermore, we present, briefly, the Firth correction to circumvent the phenomena of separation.

  8. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...

  9. Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method

    Energy Technology Data Exchange (ETDEWEB)

    Constantinescu, C C; Yoder, K K; Normandin, M D; Morris, E D [Department of Radiology, Indiana University School of Medicine, Indianapolis, IN (United States); Kareken, D A [Department of Neurology, Indiana University School of Medicine, Indianapolis, IN (United States); Bouman, C A [Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN (United States); O' Connor, S J [Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN (United States)], E-mail: emorris@iupui.edu

    2008-03-07

    We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest and activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (F{sup DA}(t)) and the change in binding potential ({delta}BP). The veracity of the F{sup DA}(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) {delta}BP should decline with increasing DA peak time, (2) {delta}BP should increase as the strength of the temporal correlation between F{sup DA}(t) and the free raclopride (F{sup RAC}(t)) curve increases, (3) {delta}BP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [{sup 11}C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover F{sup DA}(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the F{sup DA}(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of F{sup DA}(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.

  10. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  11. Estimation of the lifetime distribution of mechatronic systems in the presence of a covariate: A comparison among parametric, semiparametric and nonparametric models

    International Nuclear Information System (INIS)

    Bobrowski, Sebastian; Chen, Hong; Döring, Maik; Jensen, Uwe; Schinköthe, Wolfgang

    2015-01-01

    In practice manufacturers may have lots of failure data of similar products using the same technology basis under different operating conditions. Thus, one can try to derive predictions for the distribution of the lifetime of newly developed components or new application environments through the existing data using regression models based on covariates. Three categories of such regression models are considered: a parametric, a semiparametric and a nonparametric approach. First, we assume that the lifetime is Weibull distributed, where its parameters are modelled as linear functions of the covariate. Second, the Cox proportional hazards model, well-known in Survival Analysis, is applied. Finally, a kernel estimator is used to interpolate between empirical distribution functions. In particular the last case is new in the context of reliability analysis. We propose a goodness of fit measure (GoF), which can be applied to all three types of regression models. Using this GoF measure we discuss a new model selection procedure. To illustrate this method of reliability prediction, the three classes of regression models are applied to real test data of motor experiments. Further the performance of the approaches is investigated by Monte Carlo simulations. - Highlights: • We estimate the lifetime distribution in the presence of a covariate. • Three types of regression models are considered and compared. • A new nonparametric estimator based on our particular data structure is introduced. • We propose a goodness of fit measure and show a new model selection procedure. • A case study with real data and Monte Carlo simulations are performed

  12. Post-processing through linear regression

    Science.gov (United States)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  13. Post-processing through linear regression

    Directory of Open Access Journals (Sweden)

    B. Van Schaeybroeck

    2011-03-01

    Full Text Available Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS method, a new time-dependent Tikhonov regularization (TDTR method, the total least-square method, a new geometric-mean regression (GM, a recently introduced error-in-variables (EVMOS method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

    These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise. At long lead times the regression schemes (EVMOS, TDTR which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  14. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  15. A nonparametric multiple imputation approach for missing categorical data

    Directory of Open Access Journals (Sweden)

    Muhan Zhou

    2017-06-01

    Full Text Available Abstract Background Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness probabilities. Methods We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model and the other fits a logistic regression for predicting missingness probabilities (the missingness model. A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. Results The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. Conclusions We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with

  16. Aid and growth regressions

    DEFF Research Database (Denmark)

    Hansen, Henrik; Tarp, Finn

    2001-01-01

    This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy....... investment. We conclude by stressing the need for more theoretical work before this kind of cross-country regressions are used for policy purposes.......This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy...

  17. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes.

    Science.gov (United States)

    Bhattacharya, Abhishek; Dunson, David B

    2010-12-01

    Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define notions of centre and spread on this manifold and work with their estimates. However, it is useful to consider full likelihood-based methods, which allow nonparametric estimation of the probability density. This article proposes a broad class of mixture models constructed using suitable kernels on a general compact metric space and then on the planar shape space in particular. Following a Bayesian approach with a nonparametric prior on the mixing distribution, conditions are obtained under which the Kullback-Leibler property holds, implying large support and weak posterior consistency. Gibbs sampling methods are developed for posterior computation, and the methods are applied to problems in density estimation and classification with shape-based predictors. Simulation studies show improved estimation performance relative to existing approaches.

  18. Nonparametric model validations for hidden Markov models with applications in financial econometrics.

    Science.gov (United States)

    Zhao, Zhibiao

    2011-06-01

    We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.

  19. NONPARAMETRIC FIXED EFFECT PANEL DATA MODELS: RELATIONSHIP BETWEEN AIR POLLUTION AND INCOME FOR TURKEY

    Directory of Open Access Journals (Sweden)

    Rabia Ece OMAY

    2013-06-01

    Full Text Available In this study, relationship between gross domestic product (GDP per capita and sulfur dioxide (SO2 and particulate matter (PM10 per capita is modeled for Turkey. Nonparametric fixed effect panel data analysis is used for the modeling. The panel data covers 12 territories, in first level of Nomenclature of Territorial Units for Statistics (NUTS, for period of 1990-2001. Modeling of the relationship between GDP and SO2 and PM10 for Turkey, the non-parametric models have given good results.

  20. Nonparametric method for failures diagnosis in the actuating subsystem of aircraft control system

    Science.gov (United States)

    Terentev, M. N.; Karpenko, S. S.; Zybin, E. Yu; Kosyanchuk, V. V.

    2018-02-01

    In this paper we design a nonparametric method for failures diagnosis in the aircraft control system that uses the measurements of the control signals and the aircraft states only. It doesn’t require a priori information of the aircraft model parameters, training or statistical calculations, and is based on analytical nonparametric one-step-ahead state prediction approach. This makes it possible to predict the behavior of unidentified and failure dynamic systems, to weaken the requirements to control signals, and to reduce the diagnostic time and problem complexity.

  1. Non-parametric correlative uncertainty quantification and sensitivity analysis: Application to a Langmuir bimolecular adsorption model

    Science.gov (United States)

    Feng, Jinchao; Lansford, Joshua; Mironenko, Alexander; Pourkargar, Davood Babaei; Vlachos, Dionisios G.; Katsoulakis, Markos A.

    2018-03-01

    We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data). The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.

  2. Non-parametric correlative uncertainty quantification and sensitivity analysis: Application to a Langmuir bimolecular adsorption model

    Directory of Open Access Journals (Sweden)

    Jinchao Feng

    2018-03-01

    Full Text Available We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data. The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.

  3. A Bayesian approach to the analysis of quantal bioassay studies using nonparametric mixture models.

    Science.gov (United States)

    Fronczyk, Kassandra; Kottas, Athanasios

    2014-03-01

    We develop a Bayesian nonparametric mixture modeling framework for quantal bioassay settings. The approach is built upon modeling dose-dependent response distributions. We adopt a structured nonparametric prior mixture model, which induces a monotonicity restriction for the dose-response curve. Particular emphasis is placed on the key risk assessment goal of calibration for the dose level that corresponds to a specified response. The proposed methodology yields flexible inference for the dose-response relationship as well as for other inferential objectives, as illustrated with two data sets from the literature. © 2013, The International Biometric Society.

  4. Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja

    CERN Document Server

    Taskinen, Sara

    2015-01-01

    Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.

  5. Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.

    Science.gov (United States)

    Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S

    2016-05-01

    A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection.

  6. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).

  7. Canonical variate regression.

    Science.gov (United States)

    Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun

    2016-07-01

    In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  8. Nonparametric Analysis of Right Censored Data with Multiple Comparisons

    OpenAIRE

    Shih, Hwei-Weng

    1982-01-01

    This report demonstrates the use of a computer program written in FORTRAN for the Burroughs B6800 computer at Utah State University to perform Breslow's (1970) generalization of the Kruskal-Wallis test for right censored data. A pairwise multiple comparison procedure using Bonferroni's inequality is also introduced and demonstrated. Comparisons are also made with a parametric F test and the original Kruskal-Wallis test. Application of these techniques to two data sets indicate that there is l...

  9. Monitoring coastal marshes biomass with CASI: a comparison of parametric and non-parametric models

    Science.gov (United States)

    Mo, Y.; Kearney, M.

    2017-12-01

    Coastal marshes are important carbon sinks that face multiple natural and anthropogenic stresses. Optical remote sensing is a powerful tool for closely monitoring the biomass of coastal marshes. However, application of hyperspectral sensors on assessing the biomass of diverse coastal marsh ecosystems is limited. This study samples spectral and biophysical data from coastal freshwater, intermediate, brackish, and saline marshes in Louisiana, and develops parametric and non-parametric models for using the Compact Airborne Spectrographic Imager (CASI) to retrieve the marshes' biomass. Linear models and random forest models are developed from simulated CASI data (48 bands, 380-1050 nm, bandwidth 14 nm). Linear models are also developed using narrowband vegetation indices computed from all possible band combinations from the blue, red, and near infrared wavelengths. It is found that the linear models derived from the optimal narrowband vegetation indices provide strong predictions for the marshes' Leaf Area Index (LAI; R2 > 0.74 for ARVI), but not for their Aboveground Green Biomass (AGB; R2 > 0.25). The linear models derived from the simulated CASI data strongly predict the marshes' LAI (R2 = 0.93) and AGB (R2 = 0.71) and have 27 and 30 bands/variables in the final models through stepwise regression, respectively. The random forest models derived from the simulated CASI data also strongly predict the marshes' LAI and AGB (R2 = 0.91 and 0.84, respectively), where the most important variables for predicting LAI are near infrared bands at 784 and 756 nm and for predicting ABG are red bands at 684 and 670 nm. In sum, the random forest model is preferable for assessing coastal marsh biomass using CASI data as it offers high R2 for both LAI and AGB. The superior performance of the random forest model is likely to due to that it fully utilizes the full-spectrum data and makes no assumption of the approximate normality of the sampling population. This study offers solutions

  10. A non-parametric Data Envelopment Analysis approach for improving energy efficiency of grape production

    International Nuclear Information System (INIS)

    Khoshroo, Alireza; Mulwa, Richard; Emrouznejad, Ali; Arabi, Behrouz

    2013-01-01

    Grape is one of the world's largest fruit crops with approximately 67.5 million tonnes produced each year and energy is an important element in modern grape productions as it heavily depends on fossil and other energy resources. Efficient use of these energies is a necessary step toward reducing environmental hazards, preventing destruction of natural resources and ensuring agricultural sustainability. Hence, identifying excessive use of energy as well as reducing energy resources is the main focus of this paper to optimize energy consumption in grape production. In this study we use a two-stage methodology to find the association of energy efficiency and performance explained by farmers' specific characteristics. In the first stage a non-parametric Data Envelopment Analysis is used to model efficiencies as an explicit function of human labor, machinery, chemicals, FYM (farmyard manure), diesel fuel, electricity and water for irrigation energies. In the second step, farm specific variables such as farmers' age, gender, level of education and agricultural experience are used in a Tobit regression framework to explain how these factors influence efficiency of grape farming. The result of the first stage shows substantial inefficiency between the grape producers in the studied area while the second stage shows that the main difference between efficient and inefficient farmers was in the use of chemicals, diesel fuel and water for irrigation. The use of chemicals such as insecticides, herbicides and fungicides were considerably less than inefficient ones. The results revealed that the more educated farmers are more energy efficient in comparison with their less educated counterparts. - Highlights: • The focus of this paper is to identify excessive use of energy and optimize energy consumption in grape production. • We measure the efficiency as a function of labor/machinery/chemicals/farmyard manure/diesel-fuel/electricity/water. • Data were obtained from 41 grape

  11. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  12. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  13. Group-wise partial least square regression

    NARCIS (Netherlands)

    Camacho, José; Saccenti, Edoardo

    2018-01-01

    This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are

  14. Multiple Linear Regression: A Realistic Reflector.

    Science.gov (United States)

    Nutt, A. T.; Batsell, R. R.

    Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…

  15. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    Science.gov (United States)

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  16. Does Private Tutoring Work? The Effectiveness of Private Tutoring: A Nonparametric Bounds Analysis

    Science.gov (United States)

    Hof, Stefanie

    2014-01-01

    Private tutoring has become popular throughout the world. However, evidence for the effect of private tutoring on students' academic outcome is inconclusive; therefore, this paper presents an alternative framework: a nonparametric bounds method. The present examination uses, for the first time, a large representative data-set in a European setting…

  17. Testing a parametric function against a nonparametric alternative in IV and GMM settings

    DEFF Research Database (Denmark)

    Gørgens, Tue; Wurtz, Allan

    This paper develops a specification test for functional form for models identified by moment restrictions, including IV and GMM settings. The general framework is one where the moment restrictions are specified as functions of data, a finite-dimensional parameter vector, and a nonparametric real ...

  18. A structural nonparametric reappraisal of the CO2 emissions-income relationship

    NARCIS (Netherlands)

    Azomahou, T.T.; Goedhuys - Degelin, Micheline; Nguyen-Van, P.

    Relying on a structural nonparametric estimation, we show that co2 emissions clearly increase with income at low income levels. For higher income levels, we observe a decreasing relationship, though not significant. We also find thatco2 emissions monotonically increases with energy use at a

  19. Nonparametric Estimation of Interval Reliability for Discrete-Time Semi-Markov Systems

    DEFF Research Database (Denmark)

    Georgiadis, Stylianos; Limnios, Nikolaos

    2016-01-01

    In this article, we consider a repairable discrete-time semi-Markov system with finite state space. The measure of the interval reliability is given as the probability of the system being operational over a given finite-length time interval. A nonparametric estimator is proposed for the interval...

  20. Low default credit scoring using two-class non-parametric kernel density estimation

    CSIR Research Space (South Africa)

    Rademeyer, E

    2016-12-01

    Full Text Available This paper investigates the performance of two-class classification credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and non-parametric Parzen classifiers are extended, using Bayes’ rule, to include either...

  1. Non-Parametric Bayesian Updating within the Assessment of Reliability for Offshore Wind Turbine Support Structures

    DEFF Research Database (Denmark)

    Ramirez, José Rangel; Sørensen, John Dalsgaard

    2011-01-01

    This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbine. The new information, coming from external and condition monitoring can be used to direct updating of the stochastic variables through a non-parametric Bayesian u...

  2. Non-parametric production analysis of pesticides use in the Netherlands

    NARCIS (Netherlands)

    Oude Lansink, A.G.J.M.; Silva, E.

    2004-01-01

    Many previous empirical studies on the productivity of pesticides suggest that pesticides are under-utilized in agriculture despite the general held believe that these inputs are substantially over-utilized. This paper uses data envelopment analysis (DEA) to calculate non-parametric measures of the

  3. Analyzing cost efficient production behavior under economies of scope : A nonparametric methodology

    NARCIS (Netherlands)

    Cherchye, L.J.H.; de Rock, B.; Vermeulen, F.M.P.

    2008-01-01

    In designing a production model for firms that generate multiple outputs, we take as a starting point that such multioutput production refers to economies of scope, which in turn originate from joint input use and input externalities. We provide a nonparametric characterization of cost-efficient

  4. Analyzing Cost Efficient Production Behavior Under Economies of Scope : A Nonparametric Methodology

    NARCIS (Netherlands)

    Cherchye, L.J.H.; de Rock, B.; Vermeulen, F.M.P.

    2006-01-01

    In designing a production model for firms that generate multiple outputs, we take as a starting point that such multi-output production refers to economies of scope, which in turn originate from joint input use and input externalities. We provide a nonparametric characterization of cost efficient

  5. A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)

    Science.gov (United States)

    Arenson, Ethan A.; Karabatsos, George

    2017-01-01

    Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…

  6. Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non--parametric estimation of these processes...

  7. Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods

    DEFF Research Database (Denmark)

    Høg, Esben

    2003-01-01

    In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean--reverting process) has been used to model non-speculative price processes. We discuss non--parametric estimation of these processes...

  8. A non-parametric Bayesian approach to decompounding from high frequency data

    NARCIS (Netherlands)

    Gugushvili, Shota; van der Meulen, F.H.; Spreij, Peter

    2016-01-01

    Given a sample from a discretely observed compound Poisson process, we consider non-parametric estimation of the density f0 of its jump sizes, as well as of its intensity λ0. We take a Bayesian approach to the problem and specify the prior on f0 as the Dirichlet location mixture of normal densities.

  9. Scale-Free Nonparametric Factor Analysis: A User-Friendly Introduction with Concrete Heuristic Examples.

    Science.gov (United States)

    Mittag, Kathleen Cage

    Most researchers using factor analysis extract factors from a matrix of Pearson product-moment correlation coefficients. A method is presented for extracting factors in a non-parametric way, by extracting factors from a matrix of Spearman rho (rank correlation) coefficients. It is possible to factor analyze a matrix of association such that…

  10. Nonparametric estimation of the stationary M/G/1 workload distribution function

    DEFF Research Database (Denmark)

    Hansen, Martin Bøgsted

    2005-01-01

    In this paper it is demonstrated how a nonparametric estimator of the stationary workload distribution function of the M/G/1-queue can be obtained by systematic sampling the workload process. Weak convergence results and bootstrap methods for empirical distribution functions for stationary associ...

  11. A non-parametric method for correction of global radiation observations

    DEFF Research Database (Denmark)

    Bacher, Peder; Madsen, Henrik; Perers, Bengt

    2013-01-01

    in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...

  12. Nonparametric estimation in an "illness-death" model when all transition times are interval censored

    DEFF Research Database (Denmark)

    Frydman, Halina; Gerds, Thomas; Grøn, Randi

    2013-01-01

    We develop nonparametric maximum likelihood estimation for the parameters of an irreversible Markov chain on states {0,1,2} from the observations with interval censored times of 0 → 1, 0 → 2 and 1 → 2 transitions. The distinguishing aspect of the data is that, in addition to all transition times ...

  13. Non-parametric Tuning of PID Controllers A Modified Relay-Feedback-Test Approach

    CERN Document Server

    Boiko, Igor

    2013-01-01

    The relay feedback test (RFT) has become a popular and efficient  tool used in process identification and automatic controller tuning. Non-parametric Tuning of PID Controllers couples new modifications of classical RFT with application-specific optimal tuning rules to form a non-parametric method of test-and-tuning. Test and tuning are coordinated through a set of common parameters so that a PID controller can obtain the desired gain or phase margins in a system exactly, even with unknown process dynamics. The concept of process-specific optimal tuning rules in the nonparametric setup, with corresponding tuning rules for flow, level pressure, and temperature control loops is presented in the text.   Common problems of tuning accuracy based on parametric and non-parametric approaches are addressed. In addition, the text treats the parametric approach to tuning based on the modified RFT approach and the exact model of oscillations in the system under test using the locus of a perturbedrelay system (LPRS) meth...

  14. A comparative study of non-parametric models for identification of ...

    African Journals Online (AJOL)

    However, the frequency response method using random binary signals was good for unpredicted white noise characteristics and considered the best method for non-parametric system identifica-tion. The autoregressive external input (ARX) model was very useful for system identification, but on applicati-on, few input ...

  15. A non-parametric hierarchical model to discover behavior dynamics from tracks

    NARCIS (Netherlands)

    Kooij, J.F.P.; Englebienne, G.; Gavrila, D.M.

    2012-01-01

    We present a novel non-parametric Bayesian model to jointly discover the dynamics of low-level actions and high-level behaviors of tracked people in open environments. Our model represents behaviors as Markov chains of actions which capture high-level temporal dynamics. Actions may be shared by

  16. Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods - A comparison

    NARCIS (Netherlands)

    Verrelst, Jochem; Rivera, Juan Pablo; Veroustraete, Frank; Muñoz-Marí, Jordi; Clevers, J.G.P.W.; Camps-Valls, Gustau; Moreno, José

    2015-01-01

    Given the forthcoming availability of Sentinel-2 (S2) images, this paper provides a systematic comparison of retrieval accuracy and processing speed of a multitude of parametric, non-parametric and physically-based retrieval methods using simulated S2 data. An experimental field dataset (SPARC),

  17. Parametric and Nonparametric EEG Analysis for the Evaluation of EEG Activity in Young Children with Controlled Epilepsy

    Directory of Open Access Journals (Sweden)

    Vangelis Sakkalis

    2008-01-01

    Full Text Available There is an important evidence of differences in the EEG frequency spectrum of control subjects as compared to epileptic subjects. In particular, the study of children presents difficulties due to the early stages of brain development and the various forms of epilepsy indications. In this study, we consider children that developed epileptic crises in the past but without any other clinical, psychological, or visible neurophysiological findings. The aim of the paper is to develop reliable techniques for testing if such controlled epilepsy induces related spectral differences in the EEG. Spectral features extracted by using nonparametric, signal representation techniques (Fourier and wavelet transform and a parametric, signal modeling technique (ARMA are compared and their effect on the classification of the two groups is analyzed. The subjects performed two different tasks: a control (rest task and a relatively difficult math task. The results show that spectral features extracted by modeling the EEG signals recorded from individual channels by an ARMA model give a higher discrimination between the two subject groups for the control task, where classification scores of up to 100% were obtained with a linear discriminant classifier.

  18. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with polynomial regression method, the paper firstly demonstrated the broad usage of polynomial function and deduced its parameters with ordinary least squares estimate. Then significance test method of polynomial regression function is derived considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and significance test of the polynomial function are done to the decay heating power of the iso tope per kilogram in accord with the authors' real work. (authors)

  19. Recursive Algorithm For Linear Regression

    Science.gov (United States)

    Varanasi, S. V.

    1988-01-01

    Order of model determined easily. Linear-regression algorithhm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactory.

  20. A method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1971-01-01

    A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.

  1. Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting

    Directory of Open Access Journals (Sweden)

    Mayr Andreas

    2012-01-01

    Full Text Available Abstract Background The construction of prediction intervals (PIs for future body mass index (BMI values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. Methods We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. Results The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Conclusions Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.

  2. The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity.

    Science.gov (United States)

    Rights, Jason D; Sterba, Sonya K

    2016-11-01

    Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.

  3. Combining Alphas via Bounded Regression

    Directory of Open Access Journals (Sweden)

    Zura Kakushadze

    2015-11-01

    Full Text Available We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.

  4. Regression in autistic spectrum disorders.

    Science.gov (United States)

    Stefanatos, Gerry A

    2008-12-01

    A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously-acquired skills. This may involve a loss of speech or social responsitivity, but often entails both. This paper critically reviews the phenomena of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.

  5. Logistic regression applied to natural hazards: rare event logistic regression with replications

    OpenAIRE

    Guns, M.; Vanacker, Veerle

    2012-01-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logisti...

  6. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.

  7. Estimation of the limit of detection with a bootstrap-derived standard error by a partly non-parametric approach. Application to HPLC drug assays

    DEFF Research Database (Denmark)

    Linnet, Kristian

    2005-01-01

    Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors......Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors...

  8. Nonparametric evaluation of quantitative traits in population-based association studies when the genetic model is unknown.

    Science.gov (United States)

    Konietschke, Frank; Libiger, Ondrej; Hothorn, Ludwig A

    2012-01-01

    Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible

  9. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  10. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power......An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....

  11. Principal component regression for crop yield estimation

    CERN Document Server

    Suryanarayana, T M V

    2016-01-01

    This book highlights the estimation of crop yield in Central Gujarat, especially with regard to the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as a dependent variable. It subsequently compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables into a smaller number of uncorrelated variables called principal components (PC). This book will be helpful to the students and researchers, starting their works on climate and agriculture, mainly focussing on estimation models. The flow of chapters takes the readers in a smooth path, in understanding climate and weather and impact of climate change, and gradually proceeds towards downscaling techniques and then finally towards development of ...

  12. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  13. On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

    Directory of Open Access Journals (Sweden)

    Aaditya Ramdas

    2017-01-01

    Full Text Available Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having being designed and analyzed, both for the unidimensional and the multivariate setting. Inthisshortsurvey,wefocusonteststatisticsthatinvolvetheWassersteindistance. Usingan entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ plots and receiver operating characteristic or ordinal dominance (ROC/ODC curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.

  14. Nonparametric NAR-ARCH Modelling of Stock Prices by the Kernel Methodology

    Directory of Open Access Journals (Sweden)

    Mohamed Chikhi

    2018-02-01

    Full Text Available This paper analyses cyclical behaviour of Orange stock price listed in French stock exchange over 01/03/2000 to 02/02/2017 by testing the nonlinearities through a class of conditional heteroscedastic nonparametric models. The linearity and Gaussianity assumptions are rejected for Orange Stock returns and informational shocks have transitory effects on returns and volatility. The forecasting results show that Orange stock prices are short-term predictable and nonparametric NAR-ARCH model has better performance over parametric MA-APARCH model for short horizons. Plus, the estimates of this model are also better comparing to the predictions of the random walk model. This finding provides evidence for weak form of inefficiency in Paris stock market with limited rationality, thus it emerges arbitrage opportunities.

  15. Bayesian Non-Parametric Mixtures of GARCH(1,1 Models

    Directory of Open Access Journals (Sweden)

    John W. Lau

    2012-01-01

    Full Text Available Traditional GARCH models describe volatility levels that evolve smoothly over time, generated by a single GARCH regime. However, nonstationary time series data may exhibit abrupt changes in volatility, suggesting changes in the underlying GARCH regimes. Further, the number and times of regime changes are not always obvious. This article outlines a nonparametric mixture of GARCH models that is able to estimate the number and time of volatility regime changes by mixing over the Poisson-Kingman process. The process is a generalisation of the Dirichlet process typically used in nonparametric models for time-dependent data provides a richer clustering structure, and its application to time series data is novel. Inference is Bayesian, and a Markov chain Monte Carlo algorithm to explore the posterior distribution is described. The methodology is illustrated on the Standard and Poor's 500 financial index.

  16. Promotion time cure rate model with nonparametric form of covariate effects.

    Science.gov (United States)

    Chen, Tianlei; Du, Pang

    2018-05-10

    Survival data with a cured portion are commonly seen in clinical trials. Motivated from a biological interpretation of cancer metastasis, promotion time cure model is a popular alternative to the mixture cure rate model for analyzing such data. The existing promotion cure models all assume a restrictive parametric form of covariate effects, which can be incorrectly specified especially at the exploratory stage. In this paper, we propose a nonparametric approach to modeling the covariate effects under the framework of promotion time cure model. The covariate effect function is estimated by smoothing splines via the optimization of a penalized profile likelihood. Point-wise interval estimates are also derived from the Bayesian interpretation of the penalized profile likelihood. Asymptotic convergence rates are established for the proposed estimates. Simulations show excellent performance of the proposed nonparametric method, which is then applied to a melanoma study. Copyright © 2018 John Wiley & Sons, Ltd.

  17. Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures.

    Science.gov (United States)

    Filippi, Sarah; Holmes, Chris C; Nieto-Barajas, Luis E

    2016-11-16

    In this article we propose novel Bayesian nonparametric methods using Dirichlet Process Mixture (DPM) models for detecting pairwise dependence between random variables while accounting for uncertainty in the form of the underlying distributions. A key criteria is that the procedures should scale to large data sets. In this regard we find that the formal calculation of the Bayes factor for a dependent-vs.-independent DPM joint probability measure is not feasible computationally. To address this we present Bayesian diagnostic measures for characterising evidence against a "null model" of pairwise independence. In simulation studies, as well as for a real data analysis, we show that our approach provides a useful tool for the exploratory nonparametric Bayesian analysis of large multivariate data sets.

  18. Analysing the length of care episode after hip fracture: a nonparametric and a parametric Bayesian approach.

    Science.gov (United States)

    Riihimäki, Jaakko; Sund, Reijo; Vehtari, Aki

    2010-06-01

    Effective utilisation of limited resources is a challenge for health care providers. Accurate and relevant information extracted from the length of stay distributions is useful for management purposes. Patient care episodes can be reconstructed from the comprehensive health registers, and in this paper we develop a Bayesian approach to analyse the length of care episode after a fractured hip. We model the large scale data with a flexible nonparametric multilayer perceptron network and with a parametric Weibull mixture model. To assess the performances of the models, we estimate expected utilities using predictive density as a utility measure. Since the model parameters cannot be directly compared, we focus on observables, and estimate the relevances of patient explanatory variables in predicting the length of stay. To demonstrate how the use of the nonparametric flexible model is advantageous for this complex health care data, we also study joint effects of variables in predictions, and visualise nonlinearities and interactions found in the data.

  19. A nonparametric empirical Bayes framework for large-scale multiple testing.

    Science.gov (United States)

    Martin, Ryan; Tokdar, Surya T

    2012-07-01

    We propose a flexible and identifiable version of the 2-groups model, motivated by hierarchical Bayes considerations, that features an empirical null and a semiparametric mixture model for the nonnull cases. We use a computationally efficient predictive recursion (PR) marginal likelihood procedure to estimate the model parameters, even the nonparametric mixing distribution. This leads to a nonparametric empirical Bayes testing procedure, which we call PRtest, based on thresholding the estimated local false discovery rates. Simulations and real data examples demonstrate that, compared to existing approaches, PRtest's careful handling of the nonnull density can give a much better fit in the tails of the mixture distribution which, in turn, can lead to more realistic conclusions.

  20. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.

    Science.gov (United States)

    Yau, Christopher; Holmes, Chris

    2011-07-01

    We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.