Nonparametric additive regression for repeatedly measured data
Carroll, R. J.
2009-05-20
We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology. © 2009 Biometrika Trust.
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. Determining a predictor-specific degree of smoothing increased the accuracy.
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
Nonparametric Predictive Regression
Kasparis, Ioannis; Andreou, Elena; Phillips, Peter C.B.
2012-01-01
A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...
Astronomical Methods for Nonparametric Regression
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regression Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method of H.-G. Müller and U. Stadtmüller (Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)...
Nonparametric regression with filtered data
Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid (doi: 10.3150/10-BEJ260)
2011-01-01
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Nonparametric Regression with Common Shocks
Directory of Open Access Journals (Sweden)
Eduardo A. Souza-Rodrigues
2016-09-01
This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
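For readers unfamiliar with the Nadaraya-Watson kernel estimator named in this abstract, the following is a minimal sketch of its textbook form for a scalar regressor. It does not model common shocks or anything else specific to the paper; the Gaussian kernel, the function names and the toy data are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of the responses ys, evaluated at the point x0."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data (hypothetical): y = 2x + 1 on a design that is symmetric around x = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
estimate = nadaraya_watson(2.0, xs, ys, bandwidth=0.5)
```

Because the toy design is symmetric around the evaluation point and the responses are linear, the weighted average at x = 2 coincides with the true value; away from the centre the estimator would show the usual boundary bias.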
Panel data specifications in nonparametric kernel regression
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
Nonparametric instrumental regression with non-convex constraints
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
Shi, Peide (施沛德); Wang, Haiyan (王海燕); Zhang, Lihua (张利华)
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtain the asymptotic optimality of AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
Nonparametric regression with martingale increment errors
Delattre, Sylvain
2010-01-01
We consider the problem of adaptive estimation of the regression function in a framework where we replace ergodicity assumptions (such as independence or mixing) by another structural assumption on the model. Namely, we propose adaptive upper bounds for kernel estimators with data-driven bandwidth (Lepski's selection rule) in a regression model where the noise is a martingale increment. It includes, as very particular cases, the usual i.i.d. regression and auto-regressive models. The cornerstone tool for this study is a new result for self-normalized martingales, called "stability", which is of independent interest. In a first part, we only use the martingale increment structure of the noise. We give an adaptive upper bound using a random rate that involves the occupation time near the estimation point. Thanks to this approach, the theoretical study of the statistical procedure is disconnected from usual ergodicity properties like mixing. Then, in a second part, we make a link with the usual minimax th...
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs...... rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...
Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Anestis Antoniadis
2001-06-01
Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electrical consumption is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
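As a rough illustration of what a wavelet thresholding estimator does, the sketch below denoises a signal with an orthonormal Haar transform and soft thresholding at the universal threshold sigma * sqrt(2 log n). This is not the paper's Matlab code: the Haar filter, the assumption that the noise level sigma is known, and all function names are choices made here for brevity.

```python
import math


def haar_forward(signal):
    """Full orthonormal Haar decomposition; the length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []  # detail coefficients, finest level first
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return approx[0], details


def haar_inverse(scaling, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    approx = [scaling]
    for level in reversed(details):
        finer = []
        for s, d in zip(approx, level):
            finer.append((s + d) / math.sqrt(2.0))
            finer.append((s - d) / math.sqrt(2.0))
        approx = finer
    return approx


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero by threshold, setting small ones to zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def denoise(signal, sigma):
    """Soft-threshold all detail coefficients at sigma * sqrt(2 log n)."""
    threshold = sigma * math.sqrt(2.0 * math.log(len(signal)))
    scaling, details = haar_forward(signal)
    shrunk = [[soft_threshold(d, threshold) for d in level] for level in details]
    return haar_inverse(scaling, shrunk)


# Hypothetical piecewise-constant signal with small wiggles standing in for noise.
noisy = [1.1, 0.9, 1.0, 1.2, 5.1, 4.9, 5.0, 4.8]
smoothed = denoise(noisy, sigma=0.1)
```

In practice the noise level is usually estimated from the finest-scale coefficients rather than supplied, and smoother wavelet filters than Haar are common; both are left out to keep the sketch short.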
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs....... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
Coverage Accuracy of Confidence Intervals in Nonparametric Regression
Institute of Scientific and Technical Information of China (English)
Song-xi Chen; Yong-song Qin
2003-01-01
Point-wise confidence intervals for a nonparametric regression function with random design points are considered. The confidence intervals are those based on the traditional normal approximation and the empirical likelihood. Their coverage accuracy is assessed by developing the Edgeworth expansions for the coverage probabilities. It is shown that the empirical likelihood confidence intervals are Bartlett correctable.
Right-Censored Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Dursun Aydın
2016-11-01
This paper examines how selection criteria perform for right-censored nonparametric regression using smoothing splines. In order to transform the response variable into a variable that accounts for the right-censoring, we used the Kaplan-Meier weights proposed by [1] and [2]. The major problem in the smoothing spline method is to determine a smoothing parameter to obtain nonparametric estimates of the regression function. In this study, this parameter is chosen based on censored data by means of criteria such as the improved Akaike information criterion (AICc), the Bayesian (or Schwarz) information criterion (BIC) and generalized cross-validation (GCV). For this purpose, a Monte-Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.
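The abstract does not spell out the Kaplan-Meier weights it uses. The sketch below computes one widely used form, the jumps of the Kaplan-Meier estimator at the ordered observations; whether this matches the cited references [1] and [2] exactly is an assumption, as are the function and variable names.

```python
def kaplan_meier_weights(times, observed):
    """Kaplan-Meier jump weights, returned in the original order of the inputs.

    times    : observed responses, i.e. min(response, censoring time)
    observed : True where the response is uncensored, False where censored
    """
    n = len(times)
    # Sort by time; at tied times, uncensored observations are placed first.
    order = sorted(range(n), key=lambda i: (times[i], not observed[i]))
    weights = [0.0] * n
    survival_factor = 1.0  # product over earlier uncensored ordered observations
    for rank, idx in enumerate(order, start=1):
        if observed[idx]:
            weights[idx] = survival_factor / (n - rank + 1)
            survival_factor *= (n - rank) / (n - rank + 1)
    return weights


# Hypothetical sample: the second-smallest response is censored.
times = [1.0, 2.0, 3.0, 4.0]
observed = [True, False, True, True]
weights = kaplan_meier_weights(times, observed)

# A weighted mean of the observed responses then stands in for the plain mean.
weighted_mean = sum(w * t for w, t in zip(weights, times))
```

Censored observations receive weight zero and their mass is shifted to later uncensored observations, so weighted least squares or a weighted smoothing spline criterion can use these weights in place of the usual 1/n.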
Nonparametric Regression Estimation for Multivariate Null Recurrent Processes
Directory of Open Access Journals (Sweden)
Biqing Cai
2015-04-01
This paper discusses nonparametric kernel regression with the regressor being a \(d\)-dimensional \(\beta\)-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \(\sqrt{n(T)h^{d}}\), where \(n(T)\) is the number of regenerations for a \(\beta\)-null recurrent process, and the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity of the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than the linear model.
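Leave-one-out cross validation for bandwidth selection, mentioned in this abstract, can be sketched as follows for a scalar regressor. The sketch uses an ordinary Nadaraya-Watson smoother on independent toy data, not the null recurrent setting of the paper; the candidate grid, the Gaussian kernel and every name are assumptions.

```python
import math


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 with an (unnormalised) Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        return None  # every weight underflowed for this bandwidth
    return sum(w * y for w, y in zip(weights, ys)) / total


def loo_cv_score(xs, ys, h):
    """Mean squared error when each point is predicted from all the others."""
    errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = nw_estimate(xs[i], rest_x, rest_y, h)
        if pred is None:
            return float("inf")
        errors.append((ys[i] - pred) ** 2)
    return sum(errors) / len(errors)


def select_bandwidth(xs, ys, grid):
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))


# Hypothetical noiseless data on an equally spaced design.
xs = [i / 20.0 for i in range(41)]
ys = [math.sin(3.0 * x) for x in xs]
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best_h = select_bandwidth(xs, ys, grid)
```

The double loop costs on the order of n squared kernel evaluations per candidate bandwidth, which is acceptable for a sketch but is usually replaced by a closed-form leave-one-out shortcut for linear smoothers.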
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Institute of Scientific and Technical Information of China (English)
Lu LIN
2005-01-01
In nonparametric regression models, the original regression estimators, including the kernel estimator, the Fourier series estimator and the wavelet estimator, are always constructed as a weighted sum of the data, and the weights depend only on the distance between the design points and the estimation points. As a result these estimators are not robust to perturbations in the data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to perturbations in the data and attains a very high breakdown value close to 1/2. In addition, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Testing for a constant coefficient of variation in nonparametric regression
Dette, Holger; Marchlewski, Mareen; Wagener, Jens
2010-01-01
In the common nonparametric regression model \(Y_i = m(X_i) + \sigma(X_i)\varepsilon_i\) we consider the problem of testing the hypothesis that the coefficient of the scale and location function is constant. The test is based on a comparison of the observations \(Y_i/\hat{\sigma}(X_i)\) with their mean by a smoothed empirical process, where \(\hat{\sigma}\) denotes the local linear estimate of the scale function. We show weak convergence of a centered version of this process to a Gaussian process under the null ...
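A short worked step, not taken from the abstract itself, may help explain why comparing scaled observations with their mean is informative. It assumes that the hypothesis is the one named in the title, a constant coefficient of variation, read here as a constant ratio of the scale function to the location function, with errors of mean zero:

```latex
H_0:\ \frac{\sigma(x)}{m(x)} = c \ \text{ for all } x
\quad\Longrightarrow\quad
\frac{Y_i}{\sigma(X_i)}
  = \frac{m(X_i)}{\sigma(X_i)} + \varepsilon_i
  = \frac{1}{c} + \varepsilon_i .
```

Under this reading the scaled observations have a conditional mean that does not depend on the covariate, so systematic departures from their overall mean point against the hypothesis.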
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
Fan, Jianqing; Song, Rui
2011-01-01
A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under the nonparametric additive models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data a...
Institute of Scientific and Technical Information of China (English)
LING Neng-xiang; DU Xue-qiao
2005-01-01
In this paper, we study the strong consistency of the partitioning estimate of a regression function when the samples are identically distributed φ-mixing sequences. Key words: nonparametric regression function; partitioning estimation; strong convergence; φ-mixing sequences.
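For orientation, a partitioning estimate in its simplest textbook form averages the responses whose covariates fall into the same cell as the evaluation point. The sketch below assumes a scalar covariate, equal-width cells and the convention that an empty cell yields 0; none of these details are stated in the abstract, and the names are hypothetical.

```python
import math


def partitioning_estimate(x0, xs, ys, cell_width):
    """Average the responses whose covariate lies in the same cell as x0."""
    cell = math.floor(x0 / cell_width)
    in_cell = [y for x, y in zip(xs, ys) if math.floor(x / cell_width) == cell]
    if not in_cell:
        return 0.0  # assumed convention for an empty cell
    return sum(in_cell) / len(in_cell)


# Hypothetical sample with cells [0, 0.5), [0.5, 1.0), [1.0, 1.5), ...
xs = [0.1, 0.2, 0.6, 0.7, 0.9]
ys = [1.0, 3.0, 10.0, 20.0, 30.0]
left = partitioning_estimate(0.3, xs, ys, cell_width=0.5)
right = partitioning_estimate(0.8, xs, ys, cell_width=0.5)
empty = partitioning_estimate(1.2, xs, ys, cell_width=0.5)
```

Consistency results of the kind studied in the paper typically require the cells to shrink while the number of observations per cell grows; the fixed cell width here is only for illustration.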
Wei, Jiawei; Carroll, Raymond J; Maity, Arnab
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
On concurvity in nonlinear and nonparametric regression models
Directory of Open Access Journals (Sweden)
Sonia Amodio
2014-12-01
When data are affected by multicollinearity in the linear regression framework, concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. Just as collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and nonparametric regression models.
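Since the abstract turns on how backfitting behaves, a minimal sketch of the classical backfitting loop for an additive model may be useful. It uses a Nadaraya-Watson smoother, zero starting functions and a fixed number of sweeps, all of which are assumptions made here; it is not the paper's procedure and does not include any concurvity diagnostic.

```python
import math


def kernel_smooth(xs, targets, h):
    """Nadaraya-Watson fit of targets on xs, evaluated at every observed xs[i]."""
    fitted = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
        fitted.append(sum(wi * t for wi, t in zip(w, targets)) / sum(w))
    return fitted


def backfit(columns, y, h=0.1, sweeps=20):
    """Fit y ~ alpha + sum_j f_j(x_j) by cycling over partial residuals."""
    n = len(y)
    alpha = sum(y) / n
    components = [[0.0] * n for _ in columns]  # starting functions (all zero)
    for _ in range(sweeps):
        for j, xs in enumerate(columns):
            # Residual after removing the intercept and every other component.
            partial = [
                y[i] - alpha
                - sum(components[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            smooth = kernel_smooth(xs, partial, h)
            mean = sum(smooth) / n
            components[j] = [s - mean for s in smooth]  # centre for identifiability
    return alpha, components


# Hypothetical balanced grid design, so the two predictors are unrelated.
points = [(a / 5.0, b / 5.0) for a in range(6) for b in range(6)]
x1 = [p[0] for p in points]
x2 = [p[1] for p in points]
y = [2.0 * a + (b - 0.5) ** 2 for a, b in points]
alpha, (f1, f2) = backfit([x1, x2], y)
fitted = [alpha + u + v for u, v in zip(f1, f2)]
```

On this grid the predictors carry no information about each other, so the sweeps settle quickly. Replacing x2 with a smooth function of x1 would give a concurve design, which is the situation in which the abstract notes that the result depends on the starting functions.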
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Semi-parametric regression: Efficiency gains from modeling the nonparametric part
Yu, Kyusang; Park, Byeong U (doi: 10.3150/10-BEJ296)
2011-01-01
It is widely admitted that structured nonparametric modeling that circumvents the curse of dimensionality is important in nonparametric estimation. In this paper we show that the same holds for semi-parametric estimation. We argue that estimation of the parametric component of a semi-parametric model can be improved essentially when more structure is put into the nonparametric part of the model. We illustrate this for the partially linear model, and investigate efficiency gains when the nonparametric part of the model has an additive structure. We present the semi-parametric Fisher information bound for estimating the parametric part of the partially linear additive model and provide semi-parametric efficient estimators for which we use a smooth backfitting technique to deal with the additive nonparametric part. We also present the finite sample performances of the proposed estimators and analyze Boston housing data as an illustration.
Nonparametric Least Squares Estimation of a Multivariate Convex Regression Function
Seijo, Emilio
2010-01-01
This paper deals with the consistency of the least squares estimator of a convex regression function when the predictor is multidimensional. We characterize and discuss the computation of such an estimator via the solution of certain quadratic and linear programs. Mild sufficient conditions for the consistency of this estimator and its subdifferentials in fixed and stochastic design regression settings are provided. We also consider a regression function which is known to be convex and componentwise nonincreasing and discuss the characterization, computation and consistency of its least squares estimator.
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlethwaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of an RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
Multivariate nonparametric regression and visualization with R and applications to finance
Klemelä, Jussi
2014-01-01
A modern approach to statistical learning and its applications through visualization methods. With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
Efficient robust nonparametric estimation in a semimartingale regression model
Konev, Victor
2010-01-01
The paper considers the problem of robustly estimating a periodic function in a continuous-time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such noise is a non-Gaussian Ornstein-Uhlenbeck process with a Lévy process subordinator, which is used to model financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least squares estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.
Fan, Jianqing; Feng, Yang; Song, Rui
2011-06-01
A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under general nonparametric models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Measuring the Influence of Networks on Transaction Costs Using a Nonparametric Regression Technique
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H.C.A.
... We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
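As a rough illustration of what a cross-validated local linear regression involves, the following one-dimensional sketch fits a kernel-weighted straight line around each evaluation point and picks the bandwidth by leave-one-out cross-validation. It is not the authors' implementation: the Gaussian kernel, the single regressor, the bandwidth grid and all function names are assumptions made for this example.

```python
import math


def gaussian_kernel(u):
    # Unnormalised Gaussian kernel; the constant cancels in the estimator.
    return math.exp(-0.5 * u * u)


def local_linear(x0, xs, ys, h):
    """Local linear estimate of E[Y | X = x0] with bandwidth h (hypothetical helper)."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    denom = s0 * s2 - s1 * s1
    if abs(denom) < 1e-12:
        raise ValueError("degenerate local design; try a larger bandwidth")
    # Intercept of the kernel-weighted least squares line centred at x0.
    return (s2 * t0 - s1 * t1) / denom


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h (xs, ys are lists)."""
    total = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        total += (ys[i] - local_linear(xs[i], rest_x, rest_y, h)) ** 2
    return total / len(xs)


def select_bandwidth(xs, ys, grid):
    """Return the bandwidth in grid with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))
```

A local linear fit reproduces exactly linear data for any bandwidth, which gives a simple sanity check; with real data one would call `select_bandwidth(xs, ys, grid)` on a grid suited to the scale of the regressor.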
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
Institute of Scientific and Technical Information of China (English)
LIN Lu
2006-01-01
This paper reports a robust kernel estimation for fixed design nonparametric regression models. A Stahel-Donoho kernel estimation is introduced, in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the incomputable depths of the errors. As a result, the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as the asymptotic normality and convergence in the mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure, so we can compare its efficiency with the original one. Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Directory of Open Access Journals (Sweden)
Xibin Zhang
2016-04-01
This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP) growth rates among the Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
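For orientation, a minimal sketch of a Nadaraya-Watson estimator with one continuous and one discrete regressor is given below. It only illustrates the estimator named in the abstract, not the paper's Bayesian bandwidth sampler. The Gaussian kernel for the continuous regressor, the discrete kernel that gives weight 1 to a matching category and `lam` otherwise, and the function names are all assumptions of this example; the bandwidths `h` and `lam` are taken as given.

```python
import math


def gaussian_kernel(u):
    # Standard normal density used as the continuous kernel (an assumption).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def discrete_kernel(z, z0, lam):
    # Weight 1 for a matching category, lam in [0, 1] for any other category.
    return 1.0 if z == z0 else lam


def nadaraya_watson_mixed(x0, z0, xs, zs, ys, h, lam):
    """Kernel-weighted average of ys around the point (x0, z0).

    xs: continuous regressor values, zs: discrete regressor values,
    h: bandwidth for xs, lam: smoothing parameter for zs.
    """
    weights = [
        gaussian_kernel((x0 - x) / h) * discrete_kernel(z, z0, lam)
        for x, z in zip(xs, zs)
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights are zero; increase h or lam")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

With `lam = 0` only observations in the same category as `z0` contribute, while `lam = 1` ignores the discrete regressor altogether; intermediate values borrow strength across categories.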
Dai, Wenlin
2017-09-01
Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that systematically combines the linear regression method with the higher-order difference estimators. The unified framework greatly enriches the existing literature on variance estimation and includes most existing estimators as special cases. More importantly, the unified framework also provides a way to solve the challenging difference sequence selection problem, which has remained a controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend using the ordinary difference sequence in the unified framework, regardless of whether the sample size is small or the signal-to-noise ratio is large. Finally, to meet the demands of applications, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression, and have made it freely available at http://cran.r-project.org/web/packages/.
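To make the idea of a difference-based variance estimator concrete, here is a minimal sketch of the classical first-order version: after sorting by the design points, neighbouring responses are differenced so that a smooth mean function approximately cancels, and half the mean squared difference estimates the error variance. This is the simplest member of the family, not the unified estimator proposed in the paper and not the interface of the VarED package; the function name is hypothetical.

```python
def first_order_difference_variance(xs, ys):
    """Estimate the error variance from first-order differences.

    Uses sum((y[i+1] - y[i])**2) / (2 * (n - 1)) on the responses
    ordered by their design points xs.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(ys) < 2:
        raise ValueError("need at least two observations")
    # Order the responses by the design points so neighbours are close in x.
    ordered = [y for _, y in sorted(zip(xs, ys), key=lambda pair: pair[0])]
    squared = [(b - a) ** 2 for a, b in zip(ordered, ordered[1:])]
    return sum(squared) / (2.0 * (len(ordered) - 1))
```

The factor 2 appears because the difference of two independent errors has twice the error variance; the estimator is biased upward when the mean function changes quickly between neighbouring design points.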
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Directory of Open Access Journals (Sweden)
Tavio Tavio
2008-01-01
Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
Institute of Scientific and Technical Information of China (English)
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressways. By using the historical traffic data collected from the detectors in Beijing expressways, a specifically designed database was developed via processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
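The K-NN regression step can be sketched as follows: each historical record is a feature vector (for example, recent speeds on a road segment) with a known future speed, the K records closest to the current vector under a weighted Euclidean distance are found, and their future speeds are averaged. The feature layout, the fixed weights and the plain average are assumptions made for illustration; the abstract does not specify how the relativity-based weights are computed.

```python
import math


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(query, history, targets, weights, k):
    """Average the targets of the k historical records closest to query.

    history: list of feature vectors, targets: the value observed after each
    historical record (hypothetical layout), weights: one weight per feature.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of records")
    ranked = sorted(
        range(len(history)),
        key=lambda i: weighted_euclidean(query, history[i], weights),
    )
    nearest = ranked[:k]
    return sum(targets[i] for i in nearest) / k
```

A distance-weighted average of the neighbours' targets is a common variant; the plain average is used here only to keep the sketch short.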
López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
Energy Technology Data Exchange (ETDEWEB)
Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found. (orig.)
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting
Directory of Open Access Journals (Sweden)
Jelena Fiosina
2017-01-01
Forecasting in big datasets is a common but complicated task, which cannot be executed using the well-known parametric linear regression. However, nonparametric and semiparametric methods, which enable forecasting by building nonlinear data models, are computationally intensive and lack sufficient scalability to cope with big datasets and deliver useful results in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with the linear regression model, which is the only forecasting model currently having a parallel distributed realization within the SPARK framework to address big data problems. The advantages of the parallelization of the algorithm are also provided. We validate our models by conducting various numerical experiments: evaluating the goodness of fit, analyzing how increasing dataset size influences time consumption, and analyzing time consumption by varying the degree of parallelism (number of workers) in the distributed realization.
Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model
Directory of Open Access Journals (Sweden)
Yujuan Sun
2014-01-01
Passenger flow is increasing dramatically with the completion of subway network systems in the big cities of China. As convergence nodes of subway lines, transfer stations need to accommodate more passengers because of the large transfer demand among different lines. As a result, transfer facilities face great pressure, such as pedestrian congestion or other abnormal situations. In order to avoid pedestrian congestion or warn the management before it occurs, it is necessary to predict the transfer passenger flow. Thus, based on nonparametric regression theory, a transfer passenger flow prediction model was proposed. In order to test and illustrate the prediction model, one month of transfer passenger flow data from the XIDAN transfer station was used to calibrate and validate the model. Compared with the Kalman filter model and the support vector machine regression model, the nonparametric regression model has the advantages of high accuracy and strong transferability and could predict transfer passenger flow accurately for different intervals.
BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES
Institute of Scientific and Technical Information of China (English)
林路; 张润楚
2004-01-01
This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
LSTA, Rawane Samb
2010-01-01
This thesis deals with the nonparametric estimation of the density f of the regression error term E of the model Y=m(X)+E, assuming its independence from the covariate X. The difficulty linked to this study is the fact that the regression error E is not observed. In such a setup, it would be unwise, for estimating f, to use a conditional approach based upon the probability distribution function of Y given X. Indeed, this approach is affected by the curse of dimensionality, so that the resulting estimator of the residual term E would have a considerably slow rate of convergence if the dimension of X is very high. Two approaches are proposed in this thesis to avoid the curse of dimensionality. The first approach uses the estimated residuals, while the second integrates a nonparametric conditional density estimator of Y given X. Although proceeding in this way can circumvent the curse of dimensionality, a challenging issue is to evaluate the impact of the estimated residuals on the final estimator of the density f. We will also at...
Zhu, Feng; Feng, Weiyue; Wang, Huajian; Huang, Shaosen; Lv, Yisong; Chen, Yong
2013-01-01
X-ray spectral imaging provides quantitative imaging of trace elements in biological samples with high sensitivity. We propose a novel algorithm to improve the signal-to-noise ratio (SNR) of X-ray spectral images that have low photon counts. Firstly, we estimate the image data area that belongs to the homogeneous parts through confidence interval testing. Then, we apply the Poisson regression through its maximum likelihood estimation on this area to estimate the true photon counts from the data corrupted by Poisson noise. Unlike other denoising methods based on regression analysis, we use the bootstrap resampling methods to ensure the accuracy of regression estimation. Finally, we use a robust local nonparametric regression method to estimate the baseline and subsequently subtract it from the X-ray spectral data to further improve the SNR of the data. Experiments on several real samples show that the proposed method performs better than some state-of-the-art approaches to ensure accuracy and precision for quantit...
Forecasting of Households Consumption Expenditure with Nonparametric Regression: The Case of Turkey
Directory of Open Access Journals (Sweden)
Aydin Noyan
2016-11-01
The relationship between household income and expenditure is important for understanding the economic dynamics of households. In this study, the relationship between household consumption expenditure and household disposable income was analyzed in R using Locally Weighted Scatterplot Smoothing regression, which is a nonparametric method. This study aimed to determine the relationship between the variables directly, without the assumptions commonly made in conventional parametric regression. According to the findings, expenditure increased rapidly at first as income and household size increased together, and then the rate of increase slowed. This increase can be explained by the relatively greater compulsory consumption expenditure in small households. In addition, expenditure is relatively higher at middle and high income levels than at the low income level. However, when household size changes, the change in expenditure is limited at middle income levels and most limited at high income levels.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Carroll, Raymond J.
2011-03-01
In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
Fast pixel-based optical proximity correction based on nonparametric kernel regression
Ma, Xu; Wu, Bingliang; Song, Zhiyang; Jiang, Shangliang; Li, Yanqiu
2014-10-01
Optical proximity correction (OPC) is a resolution enhancement technique extensively used in the semiconductor industry to improve the resolution and pattern fidelity of optical lithography. In pixel-based OPC (PBOPC), the layout is divided into small pixels, which are then iteratively modified until the simulated print image on the wafer matches the desired pattern. However, the increasing complexity and size of modern integrated circuits make PBOPC techniques quite computationally intensive. This paper focuses on developing a practical and efficient PBOPC algorithm based on a nonparametric kernel regression, a well-known technique in machine learning. Specifically, we estimate the OPC patterns based on the geometric characteristics of the original layout corresponding to the same region and a series of training examples. Experimental results on metal layers show that our proposed approach significantly improves the speed of a current professional PBOPC software by a factor of 2 to 3, and may further reduce the mask complexity.
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Climate projections simulated by Global Climate Models (GCM) are often used for assessing the impacts of climate change. However, the relatively coarse resolutions of GCM outputs often preclude their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large scale climatic state and the regional or local features. A transfer function approach of SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (predictand) based on the past observations. However, often a single regression model is not sufficient to describe complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.
Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines
Kuter, S.; Akyürek, Z.; Weber, G.-W.
2016-10-01
Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imagery. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to the latest machine learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th
Revisiting the Distance Duality Relation using a non-parametric regression method
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of the luminosity distance, D_L, and the angular diameter distance, D_A, given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied to the temperature-redshift relation of the Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ D_L/[D_A (1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of the CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxy datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within a 1σ region in the entire redshift range used in this analysis (0 < z <= 2.418).
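As a small illustration of the quantity tested in the abstract above, the following sketch evaluates η(z) = D_L/[D_A (1+z)^2] and the phenomenological model η(z) = (1+z)^ε. The function names and the numerical values are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement LOESS or SIMEX.

```python
def distance_duality_ratio(d_l, d_a, z):
    """Return eta(z) = D_L / (D_A * (1 + z)**2) for one object.

    d_l and d_a must be in the same units; z is the redshift.
    """
    if d_a <= 0 or z < 0:
        raise ValueError("d_a must be positive and z non-negative")
    return d_l / (d_a * (1.0 + z) ** 2)


def eta_model(z, epsilon):
    """Phenomenological parametrisation eta(z) = (1 + z)**epsilon."""
    return (1.0 + z) ** epsilon


# Hypothetical distances (illustrative only, not observational data).
z = 0.5
d_a = 1200.0                    # angular diameter distance, arbitrary units
d_l = d_a * (1.0 + z) ** 2      # luminosity distance consistent with the DDR

print(distance_duality_ratio(d_l, d_a, z))   # equals 1 when the DDR holds
print(eta_model(z, 0.0))                     # epsilon = 0 also gives eta = 1
```

A nonzero ε in `eta_model` parametrises a departure from η = 1 that grows with redshift.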
Subpixel Snow Cover Mapping from MODIS Data by Nonparametric Regression Splines
Akyurek, Z.; Kuter, S.; Weber, G. W.
2016-12-01
The spatial extent of snow cover is often considered one of the key parameters in climatological, hydrological and ecological modeling due to its energy storage, high reflectance in the visible and NIR regions of the electromagnetic spectrum, significant heat capacity and insulating properties. A significant challenge in snow mapping by remote sensing (RS) is the trade-off between the temporal and spatial resolution of satellite imagery. In order to tackle this issue, machine learning-based subpixel snow mapping methods, like Artificial Neural Networks (ANNs), from low or moderate resolution images have been proposed. Multivariate Adaptive Regression Splines (MARS) is a nonparametric regression tool that can build flexible models for high dimensional and complex nonlinear data. Although MARS is not often employed in RS, it has various successful implementations such as estimation of vertical total electron content in the ionosphere, atmospheric correction and classification of satellite images. This study is the first attempt in RS to evaluate the applicability of MARS for subpixel snow cover mapping from MODIS data. A total of 16 MODIS-Landsat ETM+ image pairs taken over the European Alps between March 2000 and April 2003 were used in the study. MODIS top-of-atmosphere reflectance, NDSI, NDVI and land cover classes were used as predictor variables. Cloud-covered, cloud shadow, water and bad-quality pixels were excluded from further analysis by a spatial mask. MARS models were trained and validated by using reference fractional snow cover (FSC) maps generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also developed. The mutual comparison of the obtained MARS and ANN models was accomplished on independent test areas. The MARS model performed better than the ANN model with an average RMSE of 0.1288 over the independent test areas; whereas the average RMSE of the ANN model
Asymmetry Effects in Chinese Stock Markets Volatility: A Generalized Additive Nonparametric Approach
Hou, Ai Jun
2007-01-01
The unique characteristics of the Chinese stock markets make it difficult to assume a particular distribution for innovations in returns and the specification form of the volatility process when modeling return volatility with the parametric GARCH family models. This paper therefore applies a generalized additive nonparametric smoothing technique to examine the volatility of the Chinese stock markets. The empirical results indicate that an asymmetric effect of negative news exists in the Chin...
Subgroup finding via Bayesian additive regression trees.
Sivaganesan, Siva; Müller, Peter; Huang, Bin
2017-03-09
We provide a Bayesian decision theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows a flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using a simulated data set and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.
Montiel, Ariadna; Sendra, Irene; Escamilla-Rivera, Celia; Salzano, Vincenzo
2014-01-01
In this work we present a nonparametric approach, which works on minimal assumptions, to reconstruct the cosmic expansion of the Universe. We propose to combine a locally weighted scatterplot smoothing method and a simulation-extrapolation method. The first one (Loess) is a nonparametric approach that allows one to obtain smoothed curves with no prior knowledge of the functional relationship between variables or of the cosmological quantities. The second one (Simex) takes into account the effect of measurement errors on a variable via a simulation process. For the reconstructions we use as raw data the Union2.1 Type Ia Supernovae compilation, as well as recent Hubble parameter measurements. This work aims to illustrate the approach, which turns out to be a self-sufficient technique in the sense that we do not have to choose anything by hand. We examine the details of the method, among them the amount of observational data needed to perform the locally weighted fit which will define the robustness of our reconstructio...
A Level Set Analysis and A Nonparametric Regression on S&P 500 Daily Return
Directory of Open Access Journals (Sweden)
Yipeng Yang
2016-02-01
In this paper, a level set analysis is proposed which aims to analyze the S&P 500 return with a certain magnitude. It is found that the process of large jumps/drops of return tends to have negative serial correlation, and the volatility clustering phenomenon can be easily seen. Then, a nonparametric analysis is performed and new patterns are discovered. An ARCH model is constructed based on the patterns we discovered and it is capable of manifesting the volatility skew in option pricing. A comparison of our model with the GARCH(1,1) model is carried out. An explanation of the validity of our model through prospect theory is provided, and, as a novelty, we link the volatility skew phenomenon to prospect theory in behavioral finance.
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result. PMID:28133490
Directory of Open Access Journals (Sweden)
Yue Fan
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
Additive Hazard Regression Models: An Application to the Natural History of Human Papillomavirus
Directory of Open Access Journals (Sweden)
Xianhong Xie
2013-01-01
There are several statistical methods for time-to-event analysis, among which the Cox proportional hazards model is the most commonly used. However, when the absolute change in risk, instead of the risk ratio, is of primary interest or when the proportional hazards assumption of the Cox model is violated, an additive hazard regression model may be more appropriate. In this paper, we give an overview of this approach and then apply a semiparametric as well as a nonparametric additive model to a data set from a study of the natural history of human papillomavirus (HPV) in HIV-positive and HIV-negative women. The results from the semiparametric model indicated on average an additional 14 oncogenic HPV infections per 100 woman-years related to CD4 count < 200 relative to HIV-negative women, and those from the nonparametric additive model showed an additional 40 oncogenic HPV infections per 100 women over 5 years of follow-up, while the estimated hazard ratio in the Cox model was 3.82. Although the Cox model can provide a better understanding of the exposure-disease association, the additive model is often more useful for public health planning and intervention.
Du, Li; Turner, Jay
2015-10-01
A long-term air quality study is being conducted in Roxana, Illinois, USA, at the fenceline of a petroleum refinery. Measurements include 1-in-6 day 24-hour integrated ambient fine particulate matter (PM2.5) speciation following the Chemical Speciation Network (CSN) sampling and analysis protocols. Lanthanoid elements, some of which are tracers of fluidized-bed catalytic cracker (FCC) emissions, are also measured by inductively coupled plasma-mass spectrometry (ICP-MS) after extraction from PM2.5 using hot block-assisted acid digestion. Lanthanoid recoveries of 80-90% were obtained for two ambient particulate matter standard reference materials (NIST SRM 1648a and 2783). Ambient PM2.5 La patterns could be explained by a two-source model representing resuspended soil and FCC emissions with enhanced La/Ce ratios when impacted by the refinery. Nonparametric wind regression demonstrates that when the monitoring station is upwind of the refinery, the mean La/Ce ratio is consistent with soil, and when the station is downwind of the refinery, the mean ratio is more than four times higher for bearings that correspond to maximum impacts. Source apportionment modeling using EPA UNMIX and EPA PMF could not reliably apportion PM2.5 mass to the FCC emissions. However, the weight of evidence is that such contributions are small, with no large episodes observed for the 164 samples analyzed. This study demonstrates the applicability of a hot block-assisted digestion protocol for the extraction of lanthanoid elements as well as insights obtained from long-term monitoring data including wind direction-based analyses.
Parametric and Non-Parametric System Modelling
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg
1999-01-01
considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how....... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural...... the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...
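One of the two ingredients named in the excerpt above, recursive least squares with exponential forgetting, can be sketched as follows. This is a generic textbook form of the recursion, not the thesis's own code; the function name, the forgetting factor `lam`, the initial scale `delta`, and the toy data are assumptions made for illustration, and the local polynomial weighting used for conditional parametric models is not included.

```python
def rls_forgetting(xs, ys, lam=0.95, delta=1e6):
    """Recursive least squares with exponential forgetting (pure Python).

    xs: list of regressor vectors, ys: list of responses,
    lam: forgetting factor in (0, 1], delta: scale of the initial matrix P.
    Returns the coefficient estimate after the last observation.
    """
    p = len(xs[0])
    theta = [0.0] * p
    P = [[delta if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x, y in zip(xs, ys):
        Px = [sum(P[i][j] * x[j] for j in range(p)) for i in range(p)]
        denom = lam + sum(x[i] * Px[i] for i in range(p))
        gain = [v / denom for v in Px]
        err = y - sum(theta[i] * x[i] for i in range(p))   # one-step prediction error
        theta = [theta[i] + gain[i] * err for i in range(p)]
        # P <- (P - gain * x^T P) / lam; P is symmetric, so x^T P equals Px^T.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(p)]
             for i in range(p)]
    return theta


# Toy data (hypothetical): intercept 2 throughout, slope changes from -1 to 3.
xs = [[1.0, (t % 10) / 10.0] for t in range(300)]
ys = [2.0 + (-1.0 if t < 100 else 3.0) * x[1] for t, x in enumerate(xs)]

theta = rls_forgetting(xs, ys)
print(theta)   # tracks the most recent regime, roughly [2, 3]
```

With `lam` below 1, old observations are down-weighted geometrically, which is what lets the estimate follow the change in slope; `lam = 1` reduces to ordinary recursive least squares.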
Fitting Additive Binomial Regression Models with the R Package blm
Directory of Open Access Journals (Sweden)
Stephanie Kovalchik
2013-09-01
The R package blm provides functions for fitting a family of additive regression models to binary data. The included models are the binomial linear model, in which all covariates have additive effects, and the linear-expit (lexpit) model, which allows some covariates to have additive effects and other covariates to have logistic effects. Additive binomial regression is a model of event probability, and the coefficients of linear terms estimate covariate-adjusted risk differences. Thus, in contrast to logistic regression, additive binomial regression puts focus on absolute risk and risk differences. In this paper, we give an overview of the methodology we have developed to fit the binomial linear and lexpit models to binary outcomes from cohort and population-based case-control studies. We illustrate the blm package's methods for additive model estimation, diagnostics, and inference with risk association analyses of a bladder cancer nested case-control study in the NIH-AARP Diet and Health Study.
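To make the contrast between risk differences and the odds-ratio scale of logistic regression concrete, here is a minimal sketch for a single binary exposure with no covariates, written in Python rather than R. It does not use or reproduce the blm package; the function names and counts are hypothetical, chosen only for illustration.

```python
def risk_difference(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Absolute risk difference: P(event | exposed) - P(event | unexposed)."""
    return events_exposed / n_exposed - events_unexposed / n_unexposed


def odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Odds ratio, the effect scale of a logistic regression coefficient."""
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical cohort counts: 30 of 100 exposed and 10 of 100 unexposed had the event.
rd = risk_difference(30, 100, 10, 100)
oratio = odds_ratio(30, 100, 10, 100)
print(f"risk difference = {rd:.3f}")   # excess events per person
print(f"odds ratio      = {oratio:.3f}")
```

In a binomial linear model with one binary covariate and no adjustment, the slope coefficient coincides with this unadjusted risk difference; covariate adjustment requires fitting the model itself.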
Analysing inequalities in Germany: a structured additive distributional regression approach
Silbersdorff, Alexander
2017-01-01
This book seeks new perspectives on the growing inequalities that our societies face, putting forward Structured Additive Distributional Regression as a means of statistical analysis that circumvents the common problem of analytical reduction to simple point estimators. This new approach allows the observed discrepancy between individuals' realities and the abstract representation of those realities by the arithmetic mean alone to be explicitly taken into consideration. In turn, the method is applied to the question of economic inequality in Germany.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A
2010-11-15
This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-points sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely: overlapping chromatographic peaks and very low analyte concentrations. For example, a significant change in the correlation coefficient of sodium benzoate, in case of overlapping peaks, went from 0.9975 to 0.9998 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. Also a significant improvement in the precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was achieved. For example, in the case of overlapping peaks guaiphenesin mean recovery% and RSD% went from 91.57, 9.83 to 100.04, 0.78 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
Semi-Supervised Additive Logistic Regression: A Gradient Descent Solution
Institute of Scientific and Technical Information of China (English)
None
2007-01-01
This paper describes a semi-supervised regularized method for additive logistic regression. The graph regularization term of the combined functions is added to the original cost functional used in AdaBoost. This term constrains the learned function to be smooth on a graph. Then the gradient solution is computed with the advantage that the regularization parameter can be adaptively selected. Finally, the function step-size of each iteration can be computed using Newton-Raphson iteration. Experiments on benchmark data sets show that the algorithm gives better results than existing methods.
Wishart, Justin Rory
2011-01-01
In this paper, a lower bound is determined in the minimax sense for change point estimators of the first derivative of a regression function in the fractional white noise model. Similar minimax results presented previously in the area focus on change points in the derivatives of a regression function in the white noise model or consider estimation of the regression function in the presence of correlated errors.
DEFF Research Database (Denmark)
Scheike, Thomas Harder
2002-01-01
We use the additive risk model of Aalen (Aalen, 1980) as a model for the rate of a counting process. Rather than specifying the intensity, that is the instantaneous probability of an event conditional on the entire history of the relevant covariates and counting processes, we present a model...... for the rate function, i.e., the instantaneous probability of an event conditional on only a selected set of covariates. When the rate function for the counting process is of Aalen form we show that the usual Aalen estimator can be used and gives almost unbiased estimates. The usual martingale based variance...... estimator is incorrect and an alternative estimator should be used. We also consider the semi-parametric version of the Aalen model as a rate model (McKeague and Sasieni, 1994) and show that the standard errors that are computed based on an assumption of intensities are incorrect and give a different...
Quantal Response: Nonparametric Modeling
2017-01-01
spline N−spline Fig. 3 Logistic regression ... 5. Nonparametric QR Models: Nonparametric linear ...stimulus and probability of response. The Generalized Linear Model approach does not make use of the limit distribution but allows arbitrary functional... 7. Conclusions and Recommendations; 8. References; Appendix A. The Linear Model; Appendix B. The Generalized Linear Model; Appendix C. B
Yerlikaya-Özkurt, Fatma; Askan, Aysegul; Weber, Gerhard-Wilhelm
2014-12-01
Ground Motion Prediction Equations (GMPEs) are empirical relationships which are used for determining the peak ground response at a particular distance from an earthquake source. They relate the peak ground responses as a function of earthquake source type, distance from the source, local site conditions where the data are recorded and finally the depth and magnitude of the earthquake. In this article, a new prediction algorithm, called Conic Multivariate Adaptive Regression Splines (CMARS), is employed on an available dataset for deriving a new GMPE. CMARS is based on a special continuous optimization technique, conic quadratic programming. These convex optimization problems are very well-structured, resembling linear programs and, hence, permitting the use of interior point methods. The CMARS method is performed on the strong ground motion database of Turkey. Results are compared with three other GMPEs. CMARS is found to be effective for ground motion prediction purposes.
Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G; Lado, María J
2009-01-30
In many biomedical applications, interest lies in being able to distinguish between two possible states of a given response variable, depending on the values of certain continuous predictors. If the number of predictors, p, is high, or if there is redundancy among them, it then becomes important to select the best subset of predictors, that is, the subset yielding the models with the greatest discrimination capacity. With this aim in mind, logistic generalized additive models were considered and receiver operating characteristic (ROC) curves were applied in order to determine and compare the discriminatory capacity of such models. This study sought to develop bootstrap-based tests that allow for the following to be ascertained: (a) the optimal number q ≤ p of predictors; and (b) the model or models including q predictors which display the largest AUC (area under the ROC curve). A simulation study was conducted to verify the behaviour of these tests. Finally, the proposed method was applied to a computer-aided diagnostic system dedicated to early detection of breast cancer. Copyright (c) 2008 John Wiley & Sons, Ltd.
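The AUC used above to compare models can be computed directly from predicted scores as the proportion of (positive, negative) pairs that are ranked correctly, with ties counted as one half. The sketch below shows only this computation; the function name and the two sets of scores are hypothetical, and neither the logistic GAM fit nor the bootstrap tests of the paper are reproduced.

```python
def auc_mann_whitney(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    scores: predicted scores (higher means more likely positive),
    labels: 0/1 class labels of the same length.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # correctly ordered pair
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two candidate models on the same four cases.
labels = [0, 0, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.8]
scores_b = [0.2, 0.3, 0.6, 0.9]

print(auc_mann_whitney(scores_a, labels))  # 0.75
print(auc_mann_whitney(scores_b, labels))  # 1.0
```

Comparing such AUC values across predictor subsets of size q is the quantity on which the abstract's bootstrap-based tests operate.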
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...
Non-Parametric Inference in Astrophysics
Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; the PICA group
2001-01-01
We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition: "This book should be an essential part of the personal library of every practicing statistician." (Technometrics) Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
Marginal longitudinal semiparametric regression via penalized splines
Al Kadiri, M.
2010-08-01
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
Marginal longitudinal semiparametric regression via penalized splines.
Kadiri, M Al; Carroll, R J; Wand, M P
2010-08-01
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
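The single-line estimator described above (slope as the median of all pairwise slopes, intercept chosen so the line passes through the medians of the data) can be sketched in a few lines. This is an illustrative Python reimplementation under those two stated rules only, not the KTRLine Visual Basic program; the function name and the data are hypothetical, and the multisegment model, bias correction factor, and error statistics are not included.

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(x, y):
    """Kendall-Theil robust line for paired data.

    Slope: median of the slopes between all pairs of points with distinct x.
    Intercept: chosen so the line passes through (median(x), median(y)).
    """
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return slope, intercept


# Hypothetical data on the line y = 2x + 1, with the last point a gross outlier.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]

slope, intercept = kendall_theil_line(x, y)
print(slope, intercept)   # 2.0 1.0, unaffected by the outlier
```

An ordinary least-squares fit to the same five points would be pulled strongly toward the outlier, which is the resistance property the abstract cites as the reason for choosing this estimator.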
Institute of Scientific and Technical Information of China (English)
赵文芝; 田铮; 夏志明
2009-01-01
A wavelet method of detection and estimation of change points in nonparametric regression models under random design is proposed. The confidence bound of our test is derived by using test statistics based on the empirical wavelet coefficients obtained by wavelet transformation of the data, which are observed with noise. Moreover, the consistency of the test is proved and the rate of convergence is given. The method turns out to be effective after being tested on simulated examples and applied to IBM stock market data.
Combining an additive and tree-based regression model simultaneously: STIMA
Dusseldorp, E.; Conversano, C.; Os, B.J. van
2010-01-01
Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as
Neelakantan, S; Veng-Pedersen, P
2005-11-01
A novel numerical deconvolution method is presented that enables the estimation of drug absorption rates under time-variant disposition conditions. The method involves two components. (1) A disposition decomposition-recomposition (DDR) enabling exact changes in the unit impulse response (UIR) to be constructed based on centrally based clearance changes iteratively determined. (2) A non-parametric, end-constrained cubic spline (ECS) input response function estimated by cross-validation. The proposed DDR-ECS method compensates for disposition changes between the test and the reference administrations by using a "beta" clearance correction based on DDR analysis. The representation of the input response by the ECS method takes into consideration the complex absorption process and also ensures physiologically realistic approximations of the response. The stability of the new method to noisy data was evaluated by comprehensive simulations that considered different UIRs, various input functions, clearance changes and a novel scaling of the input function that includes the "flip-flop" absorption phenomena. The simulated input response was also analysed by two other methods and all three methods were compared for their relative performances. The DDR-ECS method provides better estimation of the input profile under significant clearance changes but tends to overestimate the input when there were only small changes in the clearance.
Influence Analysis of Non-parametric Regression Model with Random Right Censorship
Institute of Scientific and Technical Information of China (English)
王淑玲; 冯予; 刘刚
2012-01-01
In this paper, the nonparametric fixed-design regression model with random right censoring is first transformed into a nonparametric regression model. A local influence analysis is then carried out for this model, yielding concise formulas for the influence matrix and for the direction of maximum influence curvature. Finally, an example is analyzed to illustrate the results and verify the effectiveness of the method.
Styborski, Jeremy A.
This project was started in the interest of supplementing existing data on additives to composite solid propellants. The study on the addition of iron and aluminum nanoparticles to composite AP/HTPB propellants was conducted at the Combustion and Energy Systems Laboratory at RPI in the new strand-burner experiment setup. For this study, a large literature review was conducted on history of solid propellant combustion modeling and the empirical results of tests on binders, plasticizers, AP particle size, and additives. The study focused on the addition of nano-scale aluminum and iron in small concentrations to AP/HTPB solid propellants with an average AP particle size of 200 microns. Replacing 1% of the propellant's AP with 40-60 nm aluminum particles produced no change in combustive behavior. The addition of 1% 60-80 nm iron particles produced a significant increase in burn rate, although the increase was lesser at higher pressures. These results are summarized in Table 2. The increase in the burn rate at all pressures due to the addition of iron nanoparticles warranted further study on the effect of concentration of iron. Tests conducted at 10 atm showed that the mean regression rate varied with iron concentration, peaking at 1% and 3%. Regardless of the iron concentration, the regression rate was higher than the baseline AP/HTPB propellants. These results are summarized in Table 3.
Structured Additive Regression Models: An R Interface to BayesX
Directory of Open Access Journals (Sweden)
Nikolaus Umlauf
2015-02-01
Full Text Available Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
A class of additive-accelerated means regression models for recurrent event data
Institute of Scientific and Technical Information of China (English)
(no author listed)
2010-01-01
In this article, we propose a class of additive-accelerated means regression models for analyzing recurrent event data. The class includes the proportional means model, the additive rates model, the accelerated failure time model, the accelerated rates model and the additive-accelerated rate model as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For the inference on the model parameters, estimating equation approaches are derived and asymptotic properties of the proposed estimators are established. In addition, a technique is provided for model checking. The finite-sample behavior of the proposed methods is examined through Monte Carlo simulation studies, and an application to a bladder cancer study is illustrated.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing
Energy Technology Data Exchange (ETDEWEB)
Kamath, C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Fan, Y. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2017-04-11
There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. While it is easy to collect such large data sets in some application domains, there are others where collecting even a single data point can be very expensive, so the resulting data sets have only tens or hundreds of samples. For example, when complex computer simulations are used to understand a scientific phenomenon, we want to run the simulation for many different values of the input parameters and analyze the resulting output. The data set relating the simulation inputs and outputs is typically quite small, especially when each run of the simulation is expensive. However, regression techniques can still be used on such data sets to build an inexpensive "surrogate" that could provide an approximate output for a given set of inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments. In this paper, we compare different regression techniques to determine how well they predict melt-pool characteristics in the problem domain of additive manufacturing. Our analysis indicates that some of the commonly used regression methods do perform quite well even on small data sets.
Nonparametric Econometrics: The np Package
Directory of Open Access Journals (Sweden)
Tristen Hayfield
2008-07-01
Full Text Available We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
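To make the idea of data-driven bandwidth selection concrete, here is a minimal Python sketch of a Nadaraya-Watson kernel regression whose bandwidth is chosen by leave-one-out cross-validation. It is not the np package's API or algorithm: the function names (`nw_estimate`, `loo_cv_score`, `select_bandwidth`), the Gaussian kernel, the bandwidth grid, and the noiseless sine data are all assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate of E[y | x = x0] with bandwidth h.

    If `skip` is an index, that observation is left out (used for
    leave-one-out cross-validation).
    """
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = gaussian_kernel((x0 - x) / h)
        num += w * y
        den += w
    return num / den if den > 0.0 else float("nan")


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        total += (y - nw_estimate(x, xs, ys, h, skip=i)) ** 2
    return total / len(xs)


def select_bandwidth(xs, ys, grid):
    """Pick the grid value with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))


# Illustrative (made-up) data: a noiseless sine curve on [0, 1].
xs = [i / 49 for i in range(50)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
grid = [0.02, 0.05, 0.1, 0.2, 0.5]

h_best = select_bandwidth(xs, ys, grid)
fit_mid = nw_estimate(0.25, xs, ys, h_best)
print(h_best, fit_mid)
```

Each candidate bandwidth costs on the order of n^2 kernel evaluations here, which gives a sense of why the abstract warns that data-driven bandwidth selection can be computationally demanding.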
Lin, Feng-Chang; Zhu, Jun
2012-01-01
We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.
Forecasting Uncertainty in Electricity Smart Meter Data by Boosting Additive Quantile Regression
Taieb, Souhaib Ben
2016-03-02
Smart electricity meters are currently deployed in millions of households to collect detailed individual electricity consumption data. Compared with traditional electricity data based on aggregated consumption, smart meter data are much more volatile and less predictable. There is a need within the energy industry for probabilistic forecasts of household electricity consumption to quantify the uncertainty of future electricity demand in order to undertake appropriate planning of generation and distribution. We propose to estimate an additive quantile regression model for a set of quantiles of the future distribution using a boosting procedure. By doing so, we can benefit from flexible and interpretable models, which include an automatic variable selection. We compare our approach with three benchmark methods on both aggregated and disaggregated scales using a smart meter data set collected from 3639 households in Ireland at 30-min intervals over a period of 1.5 years. The empirical results demonstrate that our approach based on quantile regression provides better forecast accuracy for disaggregated demand, while the traditional approach based on a normality assumption (possibly after an appropriate Box-Cox transformation) is a better approximation for aggregated demand. These results are particularly useful since more energy data will become available at the disaggregated level in the future.
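Quantile regression of the kind described above is built on the pinball (check) loss. The following Python sketch only illustrates that loss and shows that the constant minimizing it is an empirical quantile; it is not the authors' boosting procedure. The function names and the `demand` values are hypothetical.

```python
def pinball_loss(y, q_pred, tau):
    """Pinball (check) loss of predicting quantile level tau by q_pred."""
    diff = y - q_pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball(ys, q_pred, tau):
    """Average pinball loss of a constant prediction over a sample."""
    return sum(pinball_loss(y, q_pred, tau) for y in ys) / len(ys)


def best_constant_quantile(ys, tau):
    """Constant minimizing the mean pinball loss.

    The loss is piecewise linear in the constant, so a minimizer is
    always attained at one of the observed values.
    """
    return min(sorted(set(ys)), key=lambda c: mean_pinball(ys, c, tau))


# Made-up half-hourly household demand values (kWh), for illustration only.
demand = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 1.4, 2.1, 3.8]

q50 = best_constant_quantile(demand, 0.5)
q90 = best_constant_quantile(demand, 0.9)
print(q50, q90)
```

An additive quantile model replaces the constant with a sum of smooth functions of the covariates, but the quantity being minimized is the same loss evaluated at each quantile level of interest.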
Detection of Change Points in Volatility of Non-Parametric Regression by Wavelets
Institute of Scientific and Technical Information of China (English)
王景乐; 郑明
2012-01-01
This paper studies the detection and estimation of change points in volatility under nonparametric regression models. Wavelet methods are applied to construct the test statistics which can be used to detect change points in volatility. The asymptotic distributions of the test statistics are established. We also utilize the test statistics to construct the estimators for the locations and jump sizes of the change points in volatility. The asymptotic properties of these estimators are derived. Some simulation studies are conducted to assess the finite sample performance of the proposed procedures.
Habyarimana, Faustin; Zewotir, Temesgen; Ramroop, Shaun
2017-06-17
Childhood anemia is among the most significant health problems faced by public health departments in developing countries. This study aims at assessing the determinants and possible spatial effects associated with childhood anemia in Rwanda. The 2014/2015 Rwanda Demographic and Health Survey (RDHS) data was used. The analysis was done using the structured spatial additive quantile regression model. The findings of this study revealed that the child's age; the duration of breastfeeding; gender of the child; the nutritional status of the child (whether underweight and/or wasting); whether the child had a fever; had a cough in the two weeks prior to the survey or not; whether the child received vitamin A supplementation in the six weeks before the survey or not; the household wealth index; literacy of the mother; mother's anemia status; mother's age at the birth are all significant factors associated with childhood anemia in Rwanda. Furthermore, significant structured spatial location effects on childhood anemia were found.
High-dimensional regression with unknown variance
Giraud, Christophe; Verzelen, Nicolas
2011-01-01
We review recent results for high-dimensional sparse linear regression in the practical case of unknown variance. Different sparsity settings are covered, including coordinate-sparsity, group-sparsity and variation-sparsity. The emphasis is put on non-asymptotic analyses and feasible procedures. In addition, a small numerical study compares the practical performance of three schemes for tuning the Lasso estimator and some references are collected for some more general models, including multivariate regression and nonparametric regression.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. (Eugenia Stoimenova, Journal of Applied Statistics, June 2012) … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. (Biometrics, 67, September 2011) This excellently presente
Quantifying spatial disparities in neonatal mortality using a structured additive regression model.
Directory of Open Access Journals (Sweden)
Lawrence N Kazembe
Full Text Available BACKGROUND: Neonatal mortality contributes a large proportion towards early childhood mortality in developing countries, with considerable geographical variation at small areas within countries. METHODS: A geo-additive logistic regression model is proposed for quantifying small-scale geographical variation in neonatal mortality, and to estimate risk factors of neonatal mortality. Random effects are introduced to capture spatial correlation and heterogeneity. The spatial correlation can be modelled using Markov random fields (MRF) when data is aggregated, while two dimensional P-splines apply when exact locations are available, whereas the unstructured spatial effects are assigned an independent Gaussian prior. Socio-economic and bio-demographic factors which may affect the risk of neonatal mortality are simultaneously estimated as fixed effects and as nonlinear effects for continuous covariates. The smooth effects of continuous covariates are modelled by second-order random walk priors. Modelling and inference use the empirical Bayesian approach via penalized likelihood technique. The methodology is applied to analyse the likelihood of neonatal deaths, using data from the 2000 Malawi demographic and health survey. The spatial effects are quantified through MRF and two dimensional P-splines priors. RESULTS: Findings indicate that both fixed and spatial effects are associated with neonatal mortality. CONCLUSIONS: Our study, therefore, suggests that the challenge to reduce neonatal mortality goes beyond addressing individual factors, and also requires understanding unmeasured covariates for potentially effective interventions.
Structured Additive Quantile Regression for Assessing the Determinants of Childhood Anemia in Rwanda
Directory of Open Access Journals (Sweden)
Faustin Habyarimana
2017-06-01
Full Text Available Childhood anemia is among the most significant health problems faced by public health departments in developing countries. This study aims at assessing the determinants and possible spatial effects associated with childhood anemia in Rwanda. The 2014/2015 Rwanda Demographic and Health Survey (RDHS) data was used. The analysis was done using the structured spatial additive quantile regression model. The findings of this study revealed that the child’s age; the duration of breastfeeding; gender of the child; the nutritional status of the child (whether underweight and/or wasting); whether the child had a fever; had a cough in the two weeks prior to the survey or not; whether the child received vitamin A supplementation in the six weeks before the survey or not; the household wealth index; literacy of the mother; mother’s anemia status; mother’s age at the birth are all significant factors associated with childhood anemia in Rwanda. Furthermore, significant structured spatial location effects on childhood anemia were found.
Institute of Scientific and Technical Information of China (English)
赵文芝; 夏志明; 贺飞跃
2016-01-01
Two-step estimators for a change point in nonparametric regression are proposed. In the first step, an initial estimator is obtained by the local linear smoothing method. In the second step, the final estimator is obtained by the CUSUM method on a closed neighborhood of the initial estimator. A simulation study shows that the proposed estimator is efficient. An estimator for the jump size is also obtained. Furthermore, experimental results using historical data on Nile river discharges, exchange rate data of USD against RMB, and global temperature data for the northern hemisphere show that the proposed method is also practical in applications.
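The second (CUSUM) step can be sketched in a few lines of Python. This is a generic CUSUM locator for a single mean shift applied to a whole sequence, not the authors' two-step estimator: it omits the local linear first step and the restriction to a neighborhood of the initial estimate. The function names and the synthetic data are assumptions for illustration.

```python
def cusum_change_point(ys):
    """Return k maximizing |S_k - (k/n) S_n|, where S_k = y_1 + ... + y_k.

    The estimated change occurs between observation k and k + 1
    (1-based), i.e. ys[:k] is the 'before' segment.
    """
    n = len(ys)
    total = sum(ys)
    best_k, best_stat = 1, -1.0
    partial = 0.0
    for k in range(1, n):
        partial += ys[k - 1]
        stat = abs(partial - k * total / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k


def jump_size(ys, k):
    """Difference between the segment means after and before the change."""
    before = ys[:k]
    after = ys[k:]
    return sum(after) / len(after) - sum(before) / len(before)


# Synthetic sequence: level 1.0 for 30 points, then level 3.0 for 20 points,
# with a small deterministic wiggle standing in for noise.
wiggle = [0.1 * ((i * 7) % 5 - 2) for i in range(50)]
ys = [(1.0 if i < 30 else 3.0) + wiggle[i] for i in range(50)]

k_hat = cusum_change_point(ys)
jump_hat = jump_size(ys, k_hat)
print(k_hat, jump_hat)
```

In a two-step scheme of the kind the abstract describes, the same statistic would be evaluated only for indices near the initial estimate, which reduces the influence of distant observations.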
Combined parametric-nonparametric identification of block-oriented systems
Mzyk, Grzegorz
2014-01-01
This book considers a problem of block-oriented nonlinear dynamic system identification in the presence of random disturbances. This class of systems includes various interconnections of linear dynamic blocks and static nonlinear elements, e.g., Hammerstein system, Wiener system, Wiener-Hammerstein ("sandwich") system and additive NARMAX systems with feedback. Interconnecting signals are not accessible for measurement. The combined parametric-nonparametric algorithms proposed in the book can be selected depending on the prior knowledge of the system and signals. Most of them are based on the decomposition of the complex system identification task into simpler local sub-problems by using non-parametric (kernel or orthogonal) regression estimation. In the parametric stage, the generalized least squares or the instrumental variables technique is commonly applied to cope with correlated excitations. Limit properties of the algorithms have been shown analytically and illustrated in simple experiments.
Additional results on 'Reducing geometric dilution of precision using ridge regression'
Kelly, Robert J.
1990-07-01
Kelly (1990) presented preliminary results on the feasibility of using ridge regression (RR) to reduce the effects of geometric dilution of precision (GDOP) error inflation in position-fix navigation systems. Recent results indicate that RR will not reduce GDOP bias inflation when biaslike measurement errors last much longer than the aircraft guidance-loop response time. This conclusion precludes the use of RR on navigation systems whose dominant error sources are biaslike; e.g., the GPS selective-availability error source. The simulation results given by Kelly are, however, valid for the conditions defined. Although RR has not yielded a satisfactory solution to the general GDOP problem, it has illuminated the role that multicollinearity plays in navigation signal processors such as the Kalman filter. Bias inflation, initial position guess errors, ridge-parameter selection methodology, and the recursive ridge filter are discussed.
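The link between ridge regression and multicollinearity mentioned above can be seen in a small Python sketch. It solves the ridge normal equations (X'X + lam I) b = X'y for two nearly collinear predictors and compares the result with ordinary least squares (lam = 0). It is a generic illustration, not Kelly's navigation filter; the function name `ridge_2d`, the data, and the ridge parameter value are hypothetical.

```python
def ridge_2d(X, y, lam):
    """Ridge coefficients for two predictors (no intercept).

    Solves (X'X + lam * I) b = X'y in closed form for the 2x2 case.
    lam = 0 gives ordinary least squares.
    """
    a11 = sum(r[0] * r[0] for r in X) + lam
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + lam
    b1 = sum(r[0] * v for r, v in zip(X, y))
    b2 = sum(r[1] * v for r, v in zip(X, y))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def norm(b):
    """Euclidean length of a coefficient pair."""
    return (b[0] ** 2 + b[1] ** 2) ** 0.5


# Made-up data: the two columns are almost identical (near collinearity),
# and y is roughly twice the first column.
X = [[1.0, 1.01], [2.0, 1.98], [3.0, 3.02], [4.0, 3.99]]
y = [2.1, 3.9, 6.2, 7.8]

b_ols = ridge_2d(X, y, 0.0)
b_ridge = ridge_2d(X, y, 1.0)
print(b_ols, b_ridge)
```

With these numbers the least-squares coefficients are large and of opposite sign, while the ridge coefficients are both close to one. The shrinkage that stabilizes the estimate is also the source of the bias discussed in the abstract.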
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation
Directory of Open Access Journals (Sweden)
Ahmad Bilfarsah
2005-04-01
Full Text Available The Additive Spline Partial Least Squares (ASPLS) method is a generalization of the Partial Least Squares (PLS) method. The ASPLS method can accommodate nonlinearity and multicollinearity among the predictor variables. In principle, the ASPLS approach is characterized by two ideas. The first is to use parametric transformations of the predictors by spline functions; the second is to make the ASPLS components mutually uncorrelated, to preserve the properties of the linear PLS components. The performance of ASPLS compared with other PLS methods is illustrated with a fisheries economics application, specifically tuna fish production.
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Recent Advances and Trends in Nonparametric Statistics
Akritas, MG
2003-01-01
The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o
Directory of Open Access Journals (Sweden)
Nora Fenske
Full Text Available BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Nonparametric tests for censored data
Bagdonavicus, Vilijandas; Nikulin, Mikhail
2013-01-01
This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests in most statistical software is highlighted and discussed.
Non-parametric approach to the study of phenotypic stability.
Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P
2016-02-19
The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods were extensions of different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.
CURRENT STATUS OF NONPARAMETRIC STATISTICS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-02-01
Full Text Available Nonparametric statistics is one of the five growth points of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into areas based on existing scientific practice, to define nonparametric statistics, and to classify investigations on nonparametric statistical methods. Nonparametric statistics allows one to make statistical inferences, in particular to estimate characteristics of a distribution and to test statistical hypotheses, without the usually weakly justified assumption that the distribution function of the samples belongs to a particular parametric family. An example is the widespread belief that statistical data often follow the normal distribution. Meanwhile, analysis of the results of observations, in particular of measurement errors, always leads to the same conclusion: in most cases the actual distribution differs significantly from normal. Uncritical use of the hypothesis of normality often leads to significant errors, in areas such as the rejection of outlying observations (outliers), statistical quality control, and in other cases. Therefore, it is advisable to use nonparametric methods, which impose only weak requirements on the distribution functions of the observations. Usually only their continuity is assumed. On the basis of a generalization of numerous studies, it can be stated that, to date, nonparametric methods can solve almost the same range of tasks as were previously solved with parametric methods. Statements in the literature that nonparametric methods have less power, or require larger sample sizes, than parametric methods are incorrect. Note that in nonparametric statistics, as in mathematical statistics in general, there remain a number of unresolved problems.
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c
Testing for additivity with B-splines
Institute of Scientific and Technical Information of China (English)
Heng-jian CUI; Xu-ming HE; Li LIU
2007-01-01
Regression splines are often used for fitting nonparametric functions, and they work especially well for additivity models. In this paper, we consider two simple tests of additivity: an adaptation of Tukey's one degree of freedom test and a nonparametric version of Rao's score test. While the Tukey-type test can detect most forms of the local non-additivity at the parametric rate of $O(n^{-1/2})$, the score test is consistent for all alternatives at a nonparametric rate. The asymptotic distribution of these test statistics is derived under both the null and local alternative hypotheses. A simulation study is conducted to compare their finite-sample performances with some existing kernel-based tests. The score test is found to have a good overall performance.
Nonparametric Transient Classification using Adaptive Wavelets
Varughese, Melvin M; Stephanou, Michael; Bassett, Bruce A
2015-01-01
Classifying transients based on multi band light curves is a challenging but crucial problem in the era of GAIA and LSST since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology - a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind ...
Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series
DEFF Research Database (Denmark)
Gao, Jiti; Kanaya, Shin; Li, Degui
2015-01-01
This paper establishes uniform consistency results for nonparametric kernel density and regression estimators when time series regressors concerned are nonstationary null recurrent Markov chains. Under suitable regularity conditions, we derive uniform convergence rates of the estimators. Our...... results can be viewed as a nonstationary extension of some well-known uniform consistency results for stationary time series....
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Directory of Open Access Journals (Sweden)
Alejandro Jara
2011-04-01
Full Text Available Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Nonparametric Bayes analysis of social science data
Kunihama, Tsuyoshi
Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys assign many questions to respondents, which often consist of mixed-scale variables. Each of the variables can follow a complex distribution outside parametric families and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model which can approximate structures well in the social science data. In addition, many social surveys have collected data over time and therefore we need to incorporate dynamic dependence into the models. Also, it is standard to observe massive number of missing values in the social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains backgrounds and motivations of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian modeling of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method in social surveys with complex designs where there is a gap between sample and population. We correct for the bias by adjusting mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables can be induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.
Semi- and Nonparametric ARCH Processes
Directory of Open Access Journals (Sweden)
Oliver B. Linton
2011-01-01
ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametric estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationarity conditions, ergodicity conditions and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Directory of Open Access Journals (Sweden)
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market via Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving prediction performance.
Nonparametric estimation of ultrasound pulses
DEFF Research Database (Denmark)
Jensen, Jørgen Arendt; Leeman, Sidney
1994-01-01
An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...
Preliminary results on nonparametric facial occlusion detection
Directory of Open Access Journals (Sweden)
Daniel LÓPEZ SÁNCHEZ
2016-10-01
The problem of face recognition has been extensively studied in the available literature; however, some aspects of this field require further research. The design and implementation of face recognition systems that can efficiently handle unconstrained conditions (e.g. pose variations, illumination, partial occlusion) is still an area under active research. This work focuses on the design of a new nonparametric occlusion detection technique. In addition, we present some preliminary results that indicate that the proposed technique might be useful to face recognition systems, allowing them to dynamically discard occluded face parts.
Semiparametric Additive Transformation Model under Current Status Data
Cheng, Guang
2011-01-01
We consider the efficient estimation of the semiparametric additive transformation model with current status data. A wide range of survival models and econometric models can be incorporated into this general transformation framework. We apply the B-spline approach to simultaneously estimate the linear regression vector, the nondecreasing transformation function, and a set of nonparametric regression functions. We show that the parametric estimate is semiparametric efficient in the presence of multiple nonparametric nuisance functions. An explicit consistent B-spline estimate of the asymptotic variance is also provided. All nonparametric estimates are smooth, and shown to be uniformly consistent and have faster than cubic rate of convergence. Interestingly, we observe the convergence rate interfere phenomenon, i.e., the convergence rates of B-spline estimators are all slowed down to equal the slowest one. The constrained optimization is not required in our implementation. Numerical results are used to illustra...
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Nonparametric Inference for Periodic Sequences
Sun, Ying
2012-02-01
This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
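The leave-out-one-cycle idea can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the article: the candidate periods are integers, the cycle mean at each phase is estimated by a plain average, and leaving out the cycle that contains an observation amounts to dropping that observation from its own phase average. The function names `cv_score` and `estimate_period` and the simulated data are hypothetical.

```python
import numpy as np

def cv_score(y, p):
    """Leave-out-one-cycle CV score for an integer candidate period p."""
    y = np.asarray(y, dtype=float)
    phase = np.arange(y.size) % p                  # position of each point within its cycle
    sums = np.bincount(phase, weights=y, minlength=p)
    counts = np.bincount(phase, minlength=p)
    if counts.min() < 2:
        raise ValueError("each phase needs at least two observations")
    # Predict y[i] by the mean of the other observations that share its phase.
    loo_mean = (sums[phase] - y) / (counts[phase] - 1)
    return float(np.mean((y - loo_mean) ** 2))

def estimate_period(y, candidates):
    """Return the candidate period with the smallest CV score, plus all scores."""
    scores = {p: cv_score(y, p) for p in candidates}
    return min(scores, key=scores.get), scores

# Simulated example (assumed data): true period 5, Gaussian noise.
rng = np.random.default_rng(0)
cycle_means = np.array([0.0, 2.0, -1.0, 3.0, 1.0])
y = np.tile(cycle_means, 40) + 0.3 * rng.standard_normal(200)
p_hat, scores = estimate_period(y, range(2, 10))
print(p_hat, round(scores[p_hat], 3))
```

The candidate range here deliberately excludes multiples of the true period; how reliably such multiples are rejected in finite samples is the subject of the article's analysis and is not demonstrated by this sketch.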
Directory of Open Access Journals (Sweden)
Omid Hamidi
2014-01-01
Microarray technology results in high-dimensional and low-sample size data sets. Therefore, fitting sparse models is essential because only a small number of influential genes can reliably be identified. A number of variable selection approaches have been proposed for high-dimensional time-to-event data based on Cox proportional hazards where censoring is present. The present study applied three sparse variable selection techniques, namely Lasso, smoothly clipped absolute deviation, and the smooth integration of counting and absolute deviation, to gene expression survival time data using the additive risk model, which is adopted when the absolute effects of multiple predictors on the hazard function are of interest. The performance of these techniques was evaluated by time-dependent ROC curves and bootstrap .632+ prediction error curves. The genes selected by all methods were highly significant (P<0.001). The Lasso showed the maximum median area under the ROC curve over time (0.95) and smoothly clipped absolute deviation showed the lowest prediction error (0.105). The genes selected by all methods improved on the prediction of the purely clinical model, indicating the valuable information contained in the microarray features. It was concluded that these approaches can satisfactorily predict survival based on selected gene expression measurements.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Nonparametric identification of copula structures
Li, Bo
2013-06-01
We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process. We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.
Pivotal Estimation of Nonparametric Functions via Square-root Lasso
Belloni, Alexandre; Wang, Lie
2011-01-01
In a nonparametric linear regression model we study a variant of LASSO, called square-root LASSO, which does not require the knowledge of the scaling parameter $\sigma$ of the noise or bounds for it. This work derives new finite sample upper bounds for prediction norm rate of convergence, $\ell_1$-rate of convergence, $\ell_\infty$-rate of convergence, and sparsity of the square-root LASSO estimator. A lower bound for the prediction norm rate of convergence is also established. In many non-Gaussian noise cases, we rely on moderate deviation theory for self-normalized sums and on new data-dependent empirical process inequalities to achieve Gaussian-like results provided log p = o(n^{1/3}) improving upon results derived in the parametric case that required log p = O(log n). In addition, we derive finite sample bounds on the performance of ordinary least squares (OLS) applied to the model selected by square-root LASSO accounting for possible misspecification of the selected model. In particular, we provide mild con...
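For orientation, the square-root LASSO criterion is commonly written as below. This is a sketch of the usual formulation rather than a quotation from the paper; the symbols $y_i$, $x_i$, $\beta$, $\lambda$, $n$ and $p$ are notation assumed here.

```latex
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^{p}}
  \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} \beta \bigr)^{2}}
  \; + \; \frac{\lambda}{n} \, \lVert \beta \rVert_{1}
```

Taking the square root of the average squared residual is what distinguishes this criterion from the ordinary LASSO, and it is the reason the penalty level $\lambda$ can be chosen without knowing $\sigma$.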
Directory of Open Access Journals (Sweden)
Paulo Canas Rodrigues
2011-12-01
This paper joins the main properties of joint regression analysis (JRA), a model based on the Finlay-Wilkinson regression to analyse multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. The application is made using a data set from a breeding program of durum wheat (Triticum turgidum L., Durum Group) conducted in Portugal. The two models yield similar dominant cultivars (JRA) and winners of mega-environments (AMMI) for the same environments. However, JRA had more stable results as the incidence rate of missing values increased.
A contingency table approach to nonparametric testing
Rayner, JCW
2000-01-01
Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables. This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp
Nonparametric statistics for social and behavioral sciences
Kraska-Miller, M
2013-01-01
Introduction to Research in Social and Behavioral Sciences; Basic Principles of Research; Planning for Research; Types of Research Designs; Sampling Procedures; Validity and Reliability of Measurement Instruments; Steps of the Research Process; Introduction to Nonparametric Statistics; Data Analysis; Overview of Nonparametric Statistics and Parametric Statistics; Overview of Parametric Statistics; Overview of Nonparametric Statistics; Importance of Nonparametric Methods; Measurement Instruments; Analysis of Data to Determine Association and Agreement; Pearson Chi-Square Test of Association and Independence; Contingency
Nonparametric Maximum Entropy Estimation on Information Diagrams
Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn
2016-01-01
Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb......-Douglas function nor the Translog function are consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric...... results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Nonparametric Bayesian Modeling of Complex Networks
DEFF Research Database (Denmark)
Schmidt, Mikkel Nørgaard; Mørup, Morten
2013-01-01
Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....
An asymptotically optimal nonparametric adaptive controller
Institute of Scientific and Technical Information of China (English)
郭雷; 谢亮亮
2000-01-01
For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.
Song, Dong; Wang, Zhuo; Marmarelis, Vasilis Z; Berger, Theodore W
2009-02-01
This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson-Volterra models are obtained herein from broadband experimental input-output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) process. The parametric model is then validated in terms of its input-output transformational properties using the non-parametric model since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally-derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD process in the parametric model makes it replicate better the characteristics of the experimentally-derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.
Semiparametric regression during 2003–2007
Ruppert, David
2009-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
Bayesian nonparametric estimation and consistency of mixed multinomial logit choice models
De Blasi, Pierpaolo; Lau, John W; 10.3150/09-BEJ233
2011-01-01
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide an important theoretical support for the use of the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different te...
Parametrically guided estimation in nonparametric varying coefficient models with quasi-likelihood.
Davenport, Clemontina A; Maity, Arnab; Wu, Yichao
2015-04-01
Varying coefficient models allow us to generalize standard linear regression models to incorporate complex covariate effects by modeling the regression coefficients as functions of another covariate. For nonparametric varying coefficients, we can borrow the idea of parametrically guided estimation to improve asymptotic bias. In this paper, we develop a guided estimation procedure for the nonparametric varying coefficient models. Asymptotic properties are established for the guided estimators and a method of bandwidth selection via bias-variance tradeoff is proposed. We compare the performance of the guided estimator with that of the unguided estimator via both simulation and real data examples.
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
(none)
2001-01-01
For checking heteroscedasticity in regression models, a unified approach is proposed for constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not sensitive to the choice of smoothing parameters involved in estimating the nonparametric regression function. The limiting null distribution of the test statistic remains the same over a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on the conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and how they compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear and linear autoregressive models.
Panel data nonparametric estimation of production risk and risk preferences
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We apply nonparametric panel data kernel regression to investigate production risk, output price uncertainty, and risk attitudes of Polish dairy farms based on a firm-level unbalanced panel data set that covers the period 2004–2010. We compare different model specifications and different approaches for obtaining firm-specific measures of risk attitudes. We found that Polish dairy farmers are risk averse regarding production risk and price uncertainty. According to our results, Polish dairy farmers perceive the production risk as being more significant than the risk related to output price...
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Bayesian nonparametric duration model with censorship
Directory of Open Access Journals (Sweden)
Joseph Hakizamungu
2007-10-01
This paper is concerned with nonparametric i.i.d. duration models with censored observations. We establish, by a simple and unified approach, the general structure of a Bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.
Bootstrap Estimation for Nonparametric Efficiency Estimates
1995-01-01
This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...
Estimation of Stochastic Volatility Models by Nonparametric Filtering
DEFF Research Database (Denmark)
Kanaya, Shin; Kristensen, Dennis
2016-01-01
A two-step estimation method of stochastic volatility models is proposed: In the first step, we nonparametrically estimate the (unobserved) instantaneous volatility process. In the second step, standard estimation methods for fully observed diffusion processes are employed, but with the filtered/estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties...
Nonparametric methods in actigraphy: An update
Directory of Open Access Journals (Sweden)
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-09-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability (IS) and intradaily variability (IV), to describe rest-activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) of the results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using the IVm variable, whereas no such difference was identified using IV60. Rhythmic synchronization of activity and rest was significantly higher in the young than in adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
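As background for the two variables, the following Python sketch computes interdaily stability and intradaily variability in their classical form on equally spaced bins. The formulas are the commonly cited definitions and are stated here as an assumption; this is not the modified IVm/ISm calculation proposed in the abstract. The function names, the `per_day` parameter and the synthetic signal are hypothetical.

```python
import numpy as np

def interdaily_stability(x, per_day=24):
    """IS: variance of the average daily profile relative to the total variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n % per_day != 0:
        raise ValueError("series length must be a whole number of days")
    grand_mean = x.mean()
    # Average value of each time-of-day bin across all recorded days.
    profile = x.reshape(-1, per_day).mean(axis=0)
    num = n * np.sum((profile - grand_mean) ** 2)
    den = per_day * np.sum((x - grand_mean) ** 2)
    return float(num / den)

def intradaily_variability(x):
    """IV: mean squared successive difference relative to the total variance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    num = n * np.sum(np.diff(x) ** 2)
    den = (n - 1) * np.sum((x - x.mean()) ** 2)
    return float(num / den)

# Synthetic example (assumed data): seven identical days of a smooth hourly rhythm.
daily_profile = 1.0 + np.sin(2 * np.pi * np.arange(24) / 24)
activity = np.tile(daily_profile, 7)
is_value = interdaily_stability(activity, per_day=24)
iv_value = intradaily_variability(activity)
print(is_value, iv_value)
```

A perfectly repeating rhythm gives an IS of 1, and a smooth rhythm gives an IV close to 0; noisier or more fragmented recordings move IS down and IV up.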
Do Former College Athletes Earn More at Work? A Nonparametric Assessment
Henderson, Daniel J.; Olbrecht, Alexandre; Polachek, Solomon W.
2006-01-01
This paper investigates how students' collegiate athletic participation affects their subsequent labor market success. By using newly developed techniques in nonparametric regression, it shows that on average former college athletes earn a wage premium. However, the premium is not uniform, but skewed so that more than half the athletes actually…
Nonparametric Forecasting for Biochar Utilization in Poyang Lake Eco-Economic Zone in China
Directory of Open Access Journals (Sweden)
Meng-Shiuh Chang
2014-01-01
Agriculture is the least profitable industry in China. Even with large financial subsidies from the government, farmers’ living standards have not improved significantly so far, owing to historical, geographical, and climatic factors. The study examines and quantifies the net economic and environmental benefits of utilizing biochar as a soil amendment in eleven counties in the Poyang Lake Eco-Economic Zone. A nonparametric kernel regression model is employed to estimate the relation between the scaled environmental and economic factors, which are taken as regression variables. In addition, partial linear and single index regression models are used for comparison. In terms of mean squared error, the kernel estimator outperforms the other estimators and is therefore employed to forecast the benefits of using biochar under various scenarios. The results indicate that biochar utilization can potentially increase farmers’ income if rice is planted, with net economic benefits of up to ¥114,900. The net economic benefits are higher when the pyrolysis plant is built in the south of the Poyang Lake Eco-Economic Zone than when it is built in the north, as the southern land is relatively barren and biochar can save more costs on irrigation and fertilizer use.
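As a rough illustration of what a nonparametric kernel regression estimate looks like, the sketch below implements a one-dimensional Nadaraya-Watson estimator with a Gaussian kernel. The abstract does not state which kernel estimator, kernel, or bandwidth rule the study used, so all of these are assumptions, and the function names are illustrative.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted average of y_train, evaluated at the point x0."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total
```

The bandwidth controls the trade-off between bias and variance: a small value follows the data closely, a large value approaches the overall mean of the responses.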
A non-parametric framework for estimating threshold limit values
Directory of Open Access Journals (Sweden)
Ulm Kurt
2005-11-01
Background: To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested in non-parametric flexible alternatives. Methods: We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing the data from a study conducted by the German Research Foundation aiming to set a threshold limit value for exposure to total dust at the workplace, as a causal agent for developing chronic bronchitis. Results: In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion: The estimation of thresholds based on the isotonic framework is conceptually simple and sufficiently powerful. Given that there is no gold standard method for threshold value estimation, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
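The step-function fit described in the Methods can be illustrated with the pool-adjacent-violators algorithm, the standard way to compute an isotonic (non-decreasing) least-squares fit. This is a minimal sketch that assumes responses already sorted by increasing dose; the function names are illustrative, and the two selection algorithms from the paper (reduced isotonic regression and the closed family of hypotheses) are not implemented.

```python
def isotonic_fit(y, w=None):
    """Pool-adjacent-violators: non-decreasing weighted least-squares fit."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def candidate_thresholds(x, fit):
    """Dose values at which the fitted step function jumps upward."""
    return [x[i] for i in range(1, len(fit)) if fit[i] > fit[i - 1]]
```

Each jump of the fitted step function is one candidate location; choosing among them is the part the paper addresses with its two selection algorithms.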
Parametric or nonparametric? A parametricness index for model selection
Liu, Wei; 10.1214/11-AOS899
2012-01-01
In the model selection literature, two classes of criteria perform well asymptotically in different situations: the Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike's information criterion (AIC) performs well in terms of asymptotic efficiency when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses whether it is possible, and how, to detect which situation a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, the parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is si...
Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.
Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao
2013-01-01
Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions.
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Bayesian nonparametric method for prediction in EST analysis
Directory of Open Access Journals (Sweden)
Prünster Igor
2007-09-01
Background: Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: (a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; (b) the number of new unique genes to be observed in a future sample; (c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis.
Correlated Non-Parametric Latent Feature Models
Doshi-Velez, Finale
2012-01-01
We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many realworld problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on realworld datasets.
A Censored Nonparametric Software Reliability Model
Institute of Scientific and Technical Information of China (English)
None listed
2006-01-01
This paper analyses the effect of censoring on the estimation of failure rate, and presents a framework for a censored nonparametric software reliability model. The model is based on nonparametric testing for a monotonically decreasing failure rate and on weighted kernel failure rate estimation under the constraint that the failure rate is monotonically decreasing. Not only does the model have the advantages of few assumptions and weak constraints, but it also allows the number of residual defects in the software system to be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.
Nonparametric Stochastic Model for Uncertainty Quantification of Short-term Wind Speed Forecasts
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of optimal wind turbine operation, wind energy dispatch and scheduling, efficient energy harvesting, etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed is of particular importance to the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using them for future prediction. The methods mainly include statistical models such as ARMA and ARIMA, physical models such as numerical weather prediction, and artificial intelligence techniques such as support vector machines and neural networks. In this paper, we are interested in estimating hourly wind speed measures in the United Arab Emirates (UAE). More precisely, we predict hourly wind speed using nonparametric kernel estimation of the regression and volatility functions of a nonlinear autoregressive model with an ARCH model, which includes an unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describes the dependence between the value of the wind speed at time t and its historical data at times t - 1, t - 2, …, t - d. This function plays a key role in predicting the hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated with this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using
Thirty years of nonparametric item response theory
Molenaar, W.
2001-01-01
Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo
A Bayesian Nonparametric Approach to Test Equating
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea
Decompounding random sums: A nonparametric approach
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted; Pitts, Susan M.
review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...
A Nonparametric Analogy of Analysis of Covariance
Burnett, Thomas D.; Barr, Donald R.
1977-01-01
A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
The Infinite Hierarchical Factor Regression Model
Rai, Piyush
2009-01-01
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression. There are various approximation techniques in nonparametric regression, one of which is the spline. Central Java is one of the most densely populated provinces in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric components. Therefore, the purpose of this paper is to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The results show that the factors which influence population density in Central Java are Family Planning (FP) active participants and the district minimum wage.
Nonparametric tests for pathwise properties of semimartingales
Cont, Rama; 10.3150/10-BEJ293
2011-01-01
We propose two nonparametric tests for investigating the pathwise properties of a signal modeled as the sum of a Lévy process and a Brownian semimartingale. Using a nonparametric threshold estimator for the continuous component of the quadratic variation, we design a test for the presence of a continuous martingale component in the process and a test for establishing whether the jumps have finite or infinite variation, based on observations on a discrete-time grid. We evaluate the performance of our tests using simulations of various stochastic models and use the tests to investigate the fine structure of the DM/USD exchange rate fluctuations and SPX futures prices. In both cases, our tests reveal the presence of a non-zero Brownian component and a finite variation jump component.
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Bayesian nonparametric estimation for Quantum Homodyne Tomography
Naulet, Zacharie; Barat, Eric
2016-01-01
We estimate the quantum state of a light beam from results of quantum homodyne tomography noisy measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around ...
NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-10-01
The article is devoted to nonparametric point and interval estimation of the characteristics of a probability distribution (the expectation, median, variance, standard deviation, and coefficient of variation) from sample results. Sample values are regarded as realizations of independent and identically distributed random variables with an arbitrary distribution function having the required number of moments. Nonparametric analysis procedures are compared with parametric procedures based on the assumption that the sample values have a normal distribution. Point estimators are constructed in the obvious way, using sample analogs of the theoretical characteristics. Interval estimators are based on the asymptotic normality of sample moments and of functions of them. Nonparametric asymptotic confidence intervals are obtained through a special technique for deriving asymptotic relations in applied statistics. In the first step, this technique uses the multidimensional central limit theorem, applied to sums of vectors whose coordinates are powers of the initial random variables. The second step transforms the limiting multivariate normal vector to obtain the vector of interest to the researcher; here linearization is used and infinitesimal quantities are discarded. The third step is a rigorous justification of the results at the standard level of asymptotic mathematical-statistical reasoning, which usually requires the necessary and sufficient conditions for the inheritance of convergence. The article contains 10 numerical examples. The initial data are the operating times of 50 cutting tools to the limit state. Methods developed under the assumption of a normal distribution can lead to noticeably distorted conclusions when the normality hypothesis fails. The practical recommendation is that nonparametric confidence limits should be used for the analysis of real data.
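The idea of interval estimators built on the asymptotic normality of sample moments can be sketched as follows. This is not the article's own derivation; it uses two textbook results as assumptions: the sample mean is asymptotically normal with variance σ²/n, and the sample variance is asymptotically normal with variance (μ₄ − σ⁴)/n, where μ₄ is the fourth central moment. No normality of the data is assumed, only finite fourth moments; the function names are illustrative.

```python
import math
from statistics import NormalDist, fmean


def mean_ci(sample, level=0.95):
    """Asymptotic confidence interval for the expectation."""
    n = len(sample)
    m = fmean(sample)
    m2 = sum((v - m) ** 2 for v in sample) / n  # second central moment
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(m2 / n)
    return m - half, m + half


def variance_ci(sample, level=0.95):
    """Asymptotic confidence interval for the variance."""
    n = len(sample)
    m = fmean(sample)
    m2 = sum((v - m) ** 2 for v in sample) / n  # second central moment
    m4 = sum((v - m) ** 4 for v in sample) / n  # fourth central moment
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # The sample variance has asymptotic variance (mu4 - sigma^4) / n.
    half = z * math.sqrt((m4 - m2 ** 2) / n)
    return m2 - half, m2 + half
```

For heavy-tailed data the fourth-moment term makes the variance interval wider than the normal-theory chi-square interval, which is one way the distortion mentioned in the abstract can show up.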
A Multivariate Downscaling Model for Nonparametric Simulation of Daily Flows
Molina, J. M.; Ramirez, J. A.; Raff, D. A.
2011-12-01
A multivariate, stochastic nonparametric framework for stepwise disaggregation of seasonal runoff volumes to daily streamflow is presented. The downscaling process is conditional on volumes of spring runoff and large-scale ocean-atmosphere teleconnections and includes a two-level cascade scheme: seasonal-to-monthly disaggregation first followed by monthly-to-daily disaggregation. The non-parametric and assumption-free character of the framework allows consideration of the random nature and nonlinearities of daily flows, which parametric models are unable to account for adequately. This paper examines statistical links between decadal/interannual climatic variations in the Pacific Ocean and hydrologic variability in US northwest region, and includes a periodicity analysis of climate patterns to detect coherences of their cyclic behavior in the frequency domain. We explore the use of such relationships and selected signals (e.g., north Pacific gyre oscillation, southern oscillation, and Pacific decadal oscillation indices, NPGO, SOI and PDO, respectively) in the proposed data-driven framework by means of a combinatorial approach with the aim of simulating improved streamflow sequences when compared with disaggregated series generated from flows alone. A nearest neighbor time series bootstrapping approach is integrated with principal component analysis to resample from the empirical multivariate distribution. A volume-dependent scaling transformation is implemented to guarantee the summability condition. In addition, we present a new and simple algorithm, based on nonparametric resampling, that overcomes the common limitation of lack of preservation of historical correlation between daily flows across months. The downscaling framework presented here is parsimonious in parameters and model assumptions, does not generate negative values, and produces synthetic series that are statistically indistinguishable from the observations. We present evidence showing that both
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Portfolio optimization based on nonparametric estimation methods
Directory of Open Access Journals (Sweden)
Mahsa Ghandehari
2017-03-01
One of the major issues investors face in capital markets is deciding which stock exchange to invest in and selecting an optimal portfolio. This process is carried out by assessing risk and expected return. In the portfolio selection problem, if the assets' expected returns are normally distributed, variance and standard deviation are used as risk measures. However, the expected returns on assets are not necessarily normal and sometimes differ dramatically from the normal distribution. This paper introduces conditional value at risk (CVaR) as a measure of risk in a nonparametric framework, finds the optimal portfolio for a given expected return, and compares this method with the linear programming method. The data consist of monthly returns, from April 1388 to June 1393, of 15 companies selected from the top 50 companies on the Tehran Stock Exchange in the winter of 1392. The results show that the nonparametric method is superior to the linear programming method and is also much faster.
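To make the risk measure concrete, the sketch below computes an empirical (distribution-free) value at risk and conditional value at risk from a list of returns. It uses one simple convention as an assumption: the tail is the worst ⌈(1 − α)n⌉ losses, VaR is the smallest loss in that tail, and CVaR is the tail average. The paper's own nonparametric estimator and its optimization step are not reproduced; the function name is illustrative.

```python
import math


def empirical_var_cvar(returns, alpha=0.95):
    """Return (VaR, CVaR) of the loss distribution at confidence level alpha."""
    if not returns:
        raise ValueError("returns must not be empty")
    losses = sorted((-r for r in returns), reverse=True)  # worst loss first
    n = len(losses)
    # Number of observations in the tail; rounding guards against
    # floating-point noise in (1 - alpha) * n before taking the ceiling.
    m = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    tail = losses[:m]
    var = tail[-1]          # smallest loss that still lies in the tail
    cvar = sum(tail) / m    # average loss within the tail
    return var, cvar
```

Because CVaR averages over the whole tail rather than reading off a single quantile, it responds to how bad the extreme losses are, which is why it is used when returns are far from normal.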
Introduction to nonparametric statistics for the biological sciences using R
MacFarland, Thomas W
2016-01-01
This book contains a rich set of tools for nonparametric analyses. The purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: to introduce when nonparametric approaches to data analysis are appropriate; to introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test; and to introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set. The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Variable selection in identification of a high dimensional nonlinear non-parametric system
Institute of Scientific and Technical Information of China (English)
Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG
2015-01-01
The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.
Using a nonparametric PV model to forecast AC power output of PV plants
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
In this paper, a nonparametric model is used to forecast the AC power output of PV plants, taking as inputs several forecasts of meteorological variables from a Numerical Weather Prediction (NWP) model and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quantile Regression Forests as the machine learning tool to forecast the AC power with a confidence interval. Real data from five PV plants were used to validate the methodology, an...
A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator
Energy Technology Data Exchange (ETDEWEB)
Huang, Shengzhi; Huang, Qiang; Leng, Guoyong; Chang, Jianxia
2016-06-01
This study develops a nonparametric multivariate drought index, namely the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing a Nonparametric Multivariate Drought Index, we use the nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights into drought index development. The proposed NMSDI is applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate: (1) generally, NMSDI captures drought onset similarly to the Standardized Precipitation Index (SPI) and drought termination and persistence similarly to the Standardized Streamflow Index (SSFI). The drought events identified by NMSDI match well with historical drought records in the WRB. The performance is also consistent with that of an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI in drought detection; (2) an increasing risk of drought has been detected for the past decades, and will persist to a certain extent in the future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil reservation practices. This study highlights the nonparametric multivariate drought index, which can be used for drought detection and prediction efficiently and comprehensively.
Nonparametric estimation in an "illness-death" model when all transition times are interval censored
DEFF Research Database (Denmark)
Frydman, Halina; Gerds, Thomas; Grøn, Randi
2013-01-01
We develop nonparametric maximum likelihood estimation for the parameters of an irreversible Markov chain on states {0,1,2} from the observations with interval censored times of 0 → 1, 0 → 2 and 1 → 2 transitions. The distinguishing aspect of the data is that, in addition to all transition times ...
A nonparametric and diversified portfolio model
Shirazi, Yasaman Izadparast; Sabiruzzaman, Md.; Hamzah, Nor Aishah
2014-07-01
Traditional portfolio models, like mean-variance (MV), suffer from estimation error and lack of diversity. Alternatives, like mean-entropy (ME) or mean-variance-entropy (MVE) portfolio models, focus independently on the issue of either a proper risk measure or diversity. In this paper, we propose an asset allocation model that compromises between the risk of historical data and future uncertainty. In the new model, entropy is presented as a nonparametric risk measure as well as an index of diversity. Our empirical evaluation with a variety of performance measures shows that this model has better out-of-sample performance and lower portfolio turnover than its competitors.
Non-Parametric Estimation of Correlation Functions
DEFF Research Database (Denmark)
Brincker, Rune; Rytter, Anders; Krenk, Steen
In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise.
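The direct method mentioned in this abstract can be sketched as a lagged-product average. The version below is a minimal illustration under stated assumptions: the function name `direct_autocorrelation` is hypothetical, the sample mean is removed first, and the `unbiased` switch (divide by n - lag instead of n) is one common way to address the bias of the plain estimator, not necessarily the correction discussed in the paper.

```python
def direct_autocorrelation(x, max_lag, unbiased=True):
    """Direct (time-domain) estimate of the auto-correlation function.

    Returns a list r where r[lag] averages x[t] * x[t + lag] after mean removal.
    Illustrative helper; the name and the options are assumptions.
    """
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean = sum(x) / n
    centered = [value - mean for value in x]
    estimates = []
    for lag in range(max_lag + 1):
        lagged_sum = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        # Dividing by n shrinks large-lag estimates toward zero (biased);
        # dividing by the number of products, n - lag, removes that bias.
        divisor = (n - lag) if unbiased else n
        estimates.append(lagged_sum / divisor)
    return estimates


# A made-up alternating signal: its correlation flips sign at every lag.
signal = [1.0, -1.0] * 4
r_unbiased = direct_autocorrelation(signal, max_lag=2)
r_biased = direct_autocorrelation(signal, max_lag=2, unbiased=False)
```

The direct method costs on the order of n times max_lag multiplications, which is why FFT-based estimation is usually compared against it for speed.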
Lottery spending: a non-parametric analysis.
Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody
2015-01-01
We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Nonparametric inferences for kurtosis and conditional kurtosis
Institute of Scientific and Technical Information of China (English)
XIE Xiao-heng; HE You-hua
2009-01-01
Under the assumption of a strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis for risk time series. We apply this method to the daily returns of the S&P500 index and the Shanghai Composite Index, and simulate GARCH data to verify the efficiency of the presented model. Our results indicate that the risk series distribution is heavy-tailed, but the historical information can make its future distribution light-tailed. However, the tails of the far-future distribution are little affected by the historical data.
Parametric versus non-parametric simulation
Dupeux, Bérénice; Buysse, Jeroen
2014-01-01
Most ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non-parametric efficiency analysis for determining the farm's technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real world situation where often non-linear relationships exist. Results s...
Nonparametric Bayes modeling for case control studies with many predictors.
Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B
2016-03-01
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
Biological parametric mapping with robust and non-parametric statistics.
Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A
2011-07-15
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
Institute of Scientific and Technical Information of China (English)
Guijun YANG; Lu LIN; Runchu ZHANG
2007-01-01
Quasi-regression, motivated by problems arising in computer experiments, focuses mainly on speeding up evaluation. However, its theoretical properties have not been explored systematically. This paper shows that quasi-regression is unbiased, strongly convergent and asymptotically normal for parameter estimation, but biased for curve fitting. Furthermore, a new method called unbiased quasi-regression is proposed. In addition to retaining the above asymptotic behavior for parameter estimation, unbiased quasi-regression is unbiased for curve fitting.
Weisberg, Sanford
2005-01-01
Master linear regression techniques with a new edition of a classic text. Reviews of the Second Edition: "I found it enjoyable reading and so full of interesting material that even the well-informed reader will probably find something new . . . a necessity for all of those who do linear regression." -Technometrics, February 1987. "Overall, I feel that the book is a valuable addition to the now considerable list of texts on applied linear regression. It should be a strong contender as the leading text for a first serious course in regression analysis." -American Scientist, May-June 1987
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Nick, Todd G; Campbell, Kathleen M
2007-01-01
The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents a semiparametric kernel regression method, which has shown flexibility and ease of mathematical calculation, especially in estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by a completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we use nonparametric estimation, and the combined approach is called semiparametric. The optimum bandwidth is selected by considering an approximation of the Mean Integrated Squared Error (MISE).
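To make the role of the kernel and the bandwidth concrete, here is a minimal sketch of a Nadaraya-Watson kernel regression estimate with a Gaussian kernel. This is a standard nonparametric smoother offered as an assumption-laden illustration; it is not claimed to be the semiparametric estimator of the paper, and the names `gaussian_kernel` and `nadaraya_watson` and the sample data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, a continuous kernel that yields smooth estimates."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted average of y_train around the point x0.

    A larger bandwidth averages over more neighbours (smoother, more bias);
    a smaller one follows the data more closely (rougher, more variance).
    """
    if len(x_train) != len(y_train) or not x_train:
        raise ValueError("x_train and y_train must be non-empty and of equal length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


# Made-up data: two points placed symmetrically around x0 = 0.
fit_at_zero = nadaraya_watson([-1.0, 1.0], [0.0, 2.0], x0=0.0, bandwidth=0.5)
```

In practice the bandwidth would be chosen by minimizing an error criterion such as an approximation of the MISE, as the abstract describes, rather than fixed by hand as it is here.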
Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to Euclidean geometry, but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
Indoor Positioning Using Nonparametric Belief Propagation Based on Spanning Trees
Directory of Open Access Journals (Sweden)
Savic Vladimir
2010-01-01
Nonparametric belief propagation (NBP) is one of the best-known methods for cooperative localization in sensor networks. It is capable of providing information about location estimation with appropriate uncertainty and of accommodating non-Gaussian distance measurement errors. However, the accuracy of NBP is questionable in loopy networks. Therefore, in this paper, we propose a novel approach, NBP based on spanning trees (NBP-ST) created by the breadth-first search (BFS) method. In addition, we propose a reliable indoor model based on measurements obtained in our lab. According to our simulation results, NBP-ST performs better than NBP in terms of accuracy and communication cost in networks with high connectivity (i.e., highly loopy networks). Furthermore, the computational and communication costs are nearly constant with respect to the transmission radius. However, the drawbacks of the proposed method are a slightly higher computational cost and poor performance in low-connected networks.
Directory of Open Access Journals (Sweden)
Mustafa Koroglu
2016-02-01
This paper considers a functional-coefficient spatial Durbin model with nonparametric spatial weights. Applying the series approximation method, we estimate the unknown functional coefficients and spatial weighting functions via a nonparametric two-stage least squares (or 2SLS) estimation method. To further improve estimation accuracy, we also construct a second-step estimator of the unknown functional coefficients by a local linear regression approach. Some Monte Carlo simulation results are reported to assess the finite sample performance of our proposed estimators. We then apply the proposed model to re-examine national economic growth by augmenting the conventional Solow economic growth convergence model with unknown spatial interactive structures of the national economy, as well as country-specific Solow parameters, where the spatial weighting functions and Solow parameters are allowed to be a function of geographical distance and the countries' openness to trade, respectively.
Nonparametric discriminant model
Institute of Scientific and Technical Information of China (English)
谢斌锋; 梁飞豹
2011-01-01
This paper puts forward a new class of discriminant methods whose main idea is to extend the nonparametric regression model to discriminant analysis, forming the corresponding nonparametric discriminant model. A comparison with traditional discriminant methods on an example shows that the nonparametric discriminant method has wider applicability and a higher back-substitution accuracy rate.
Nonparametric dark energy reconstruction from supernova data.
Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David
2010-12-10
Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.
Local Component Analysis for Nonparametric Bayes Classifier
Khademi, Mahmoud; safayani, Meharn
2010-01-01
The decision boundaries of the Bayes classifier are optimal because they lead to the maximum probability of a correct decision. This means that if we knew the prior probabilities and the class-conditional densities, we could design a classifier which gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as window width. Moreover, these methods suffer from the curse of dimensionality of the feature space and the small sample size problem, which severely restrict their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
Nonparametric k-nearest-neighbor entropy estimator.
Lombardi, Damiano; Pant, Sanjay
2016-01-01
A nonparametric k-nearest-neighbor-based entropy estimator is proposed. It improves on the classical Kozachenko-Leonenko estimator by considering nonuniform probability densities in the region of k-nearest neighbors around each sample point. It aims to improve the classical estimators in three situations: first, when the dimensionality of the random variable is large; second, when near-functional relationships leading to high correlation between components of the random variable are present; and third, when the marginal variances of random variable components vary significantly with respect to each other. Heuristics on the error of the proposed and classical estimators are presented. Finally, the proposed estimator is tested for a variety of distributions in successively increasing dimensions and in the presence of a near-functional relationship. Its performance is compared with a classical estimator, and a significant improvement is demonstrated.
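For orientation, the classical Kozachenko-Leonenko estimator that this abstract takes as its baseline can be sketched in one dimension as H = psi(N) - psi(k) + log(2) + (1/N) * sum(log r_i), where r_i is the distance from sample i to its k-th nearest neighbour and 2 is the length of the unit ball in one dimension. The code below is an illustrative sketch of that baseline only, not of the improved estimator proposed in the paper; the function names, the restriction to one dimension and the test data are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=3):
    """Classical k-nearest-neighbour entropy estimate (in nats) for 1-D samples.

    Assumes all samples are distinct, otherwise a zero distance breaks the logarithm.
    """
    n = len(samples)
    if k < 1 or n <= k:
        raise ValueError("need 1 <= k < number of samples")
    xs = sorted(samples)
    log_distance_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbours lie within k positions of i.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        distances = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_distance_sum += math.log(distances[k - 1])
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_distance_sum / n


# Uniform(0, 1) draws have differential entropy 0, so the estimate should be near 0.
rng = random.Random(0)
uniform_draws = [rng.random() for _ in range(2000)]
entropy_estimate = kl_entropy_1d(uniform_draws, k=3)
```

The classical estimator treats the density as constant inside each neighbour ball, which is the assumption the abstract says the proposed estimator relaxes.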
Nonparametric estimation of location and scale parameters
Potgieter, C.J.
2012-12-01
Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
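The defining relation in this abstract implies that quantiles line up linearly: if Y and μ+σX share a distribution with σ > 0, then Q_Y(p) = μ + σ Q_X(p) for every p. The sketch below exploits that with a simple least-squares fit of the ordered Y sample on the ordered X sample. It is a quantile-matching illustration under the assumptions of equal sample sizes and σ > 0, not the asymptotic-likelihood approach of the paper; the function name and the data are hypothetical.

```python
def qq_location_scale(x, y):
    """Estimate (mu, sigma) in Y ~ mu + sigma * X by regressing sorted y on sorted x.

    Assumes equal sample sizes and sigma > 0, so that ordering is preserved.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xs, ys = sorted(x), sorted(y)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    if sxx == 0.0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sigma = sxy / sxx             # slope of the Q-Q line
    mu = mean_y - sigma * mean_x  # intercept of the Q-Q line
    return mu, sigma


# Made-up sample; y is an exact location-scale transform of x, listed in another order.
x_sample = [0.3, -1.2, 2.5, 0.0, 1.1]
y_sample = [2.0 + 3.0 * v for v in reversed(x_sample)]
mu_hat, sigma_hat = qq_location_scale(x_sample, y_sample)
```

With two independent samples the fit is only approximate, and extreme order statistics are noisy, which is one reason more refined estimators such as the one in the paper are of interest.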
Nonparametric estimation of employee stock options
Institute of Scientific and Technical Information of China (English)
FU Qiang; LIU Li-an; LIU Qian
2006-01-01
We propose a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods applied to market data. It incorporates a kernel estimator and employs a three-step method to modify the Black-Scholes formula. The model overcomes the limits of the Black-Scholes formula in handling option prices with varying volatility. It accounts for the effects on the risk-neutral pricing condition of ESO-specific characteristics such as non-tradability, the longer term to expiration, the early exercise feature, the restriction on short selling and the employee's risk aversion, and it can be applied to ESO valuation whether the explanatory variable is deterministic or random.
On Parametric (and Non-Parametric) Variation
Directory of Open Access Journals (Sweden)
Neil Smith
2009-11-01
This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Nonparametric Bayesian inference for multidimensional compound Poisson processes
S. Gugushvili; F. van der Meulen; P. Spreij
2015-01-01
Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density r0 and intensity λ0. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context, whic
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
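The product-limit idea behind the K-M method can be sketched in a few lines. The code below is a minimal Python illustration, not the authors' S-language software: `kaplan_meier` handles right-censored data, and `flip_left_censored` shows the common trick of subtracting left-censored concentrations from a constant larger than every value so that the same routine applies. Both function names, the choice of flipping constant and the sample numbers are assumptions.

```python
def kaplan_meier(times, observed):
    """Product-limit survival estimate for right-censored data.

    times: measurement values; observed: True for an event, False if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have the same length")
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group ties: everyone with value t is still at risk at t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def flip_left_censored(values, detected, flip_constant):
    """Convert left-censored values to right-censored ones via flip_constant - value."""
    if flip_constant <= max(values):
        raise ValueError("flip_constant must exceed every value")
    return [flip_constant - v for v in values], list(detected)


# Made-up example: one censored observation at t = 2.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
# Made-up concentrations; 0.5 is a non-detect reported at its detection limit.
flipped_values, flipped_flags = flip_left_censored([0.5, 1.0, 2.0], [False, True, True], 3.0)
```

The curve only changes at observed event values, which reflects the abstract's point that K-M methods neither extrapolate beyond the data range nor interpolate between observations.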
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Energy Technology Data Exchange (ETDEWEB)
Gerber, Samuel [Univ. of Utah, Salt Lake City, UT (United States); Rubel, Oliver [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Bremer, Peer -Timo [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Whitaker, Ross T. [Univ. of Utah, Salt Lake City, UT (United States)
2012-01-19
This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.
Non-parametric Tuning of PID Controllers A Modified Relay-Feedback-Test Approach
Boiko, Igor
2013-01-01
The relay feedback test (RFT) has become a popular and efficient tool used in process identification and automatic controller tuning. Non-parametric Tuning of PID Controllers couples new modifications of classical RFT with application-specific optimal tuning rules to form a non-parametric method of test-and-tuning. Test and tuning are coordinated through a set of common parameters so that a PID controller can obtain the desired gain or phase margins in a system exactly, even with unknown process dynamics. The concept of process-specific optimal tuning rules in the nonparametric setup, with corresponding tuning rules for flow, level, pressure, and temperature control loops, is presented in the text. Common problems of tuning accuracy based on parametric and non-parametric approaches are addressed. In addition, the text treats the parametric approach to tuning based on the modified RFT approach and the exact model of oscillations in the system under test using the locus of a perturbed relay system (LPRS) meth...
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
ZHU; Lixing
2001-01-01
Change-point estimation for censored regression model
Institute of Scientific and Technical Information of China (English)
Zhan-feng WANG; Yao-hua WU; Lin-cheng ZHAO
2007-01-01
In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.
Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat
Directory of Open Access Journals (Sweden)
Luca Scrucca
2001-06-01
In this paper we describe the Xlisp-Stat version of the sm library, software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus, and it is documented in their book Applied Smoothing Techniques for Data Analysis (1997). This is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve inferential purposes, for instance when a nonparametric estimate is used for checking a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to easily implement their own methods and new research ideas in the built-in prototypes.
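As an illustration of the kind of kernel regression estimate such a library produces, here is a minimal sketch of the Nadaraya-Watson estimator in Python. It is not the sm library's code or API; the function names, the Gaussian kernel, and the fixed bandwidth `h` are assumptions chosen for the example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, with weights centred at x0.

    h is the bandwidth (an assumed, user-chosen smoothing parameter).
    """
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.1, 0.9, 2.2, 2.9, 4.1]
    print(nadaraya_watson(2.0, xs, ys, h=0.8))
```

A larger `h` averages over more neighbours and gives a smoother curve; a smaller `h` follows the data more closely.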
Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times.
Xu, Yanxun; Müller, Peter; Wahed, Abdus S; Thall, Peter F
2016-01-01
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/.
Directory of Open Access Journals (Sweden)
Matthias Schmid
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
Directory of Open Access Journals (Sweden)
Ferger Dietmar
2009-09-01
Background: Epidemiological and clinical studies, often including anthropometric measures, have established obesity as a major risk factor for the development of type 2 diabetes. Appropriate cut-off values for anthropometric parameters are necessary for prediction or decision purposes. The cut-off corresponding to the Youden Index is often applied in epidemiology and biomedical literature for dichotomizing a continuous risk indicator. Methods: Using data from a representative large multistage longitudinal epidemiological study in a primary care setting in Germany, this paper explores a novel approach for estimating optimal cut-offs of anthropometric parameters for predicting type 2 diabetes based on a discontinuity of a regression function in a nonparametric regression framework. Results: The resulting cut-off corresponded to values obtained by the Youden Index (maximum of the sum of sensitivity and specificity, minus one), often considered the optimal cut-off in epidemiological and biomedical research. The nonparametric regression based estimator was compared to results obtained by the established methods of the Receiver Operating Characteristic plot in various simulation scenarios and, based on bias and root mean square error, yielded excellent finite sample properties. Conclusion: It is thus recommended that this nonparametric regression approach be considered as a valuable alternative when a continuous indicator has to be dichotomized at the Youden Index for prediction or decision purposes.
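The Youden Index described above, J = sensitivity + specificity − 1, can be sketched in a few lines of Python. This shows the conventional ROC-style search over candidate cut-offs, not the paper's discontinuity-based nonparametric estimator. The function name and the rule "value >= cutoff is classified as positive" are assumptions made for the example.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    values: continuous risk indicator (e.g. an anthropometric measure).
    labels: 1 for diseased, 0 for non-diseased.
    Assumed decision rule: value >= cutoff is classified as positive.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_cutoff, best_j = None, float("-inf")
    for c in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        true_neg = sum(1 for v, y in zip(values, labels) if v < c and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # ties keep the smallest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


if __name__ == "__main__":
    bmi = [20, 22, 25, 30, 32, 35]
    diabetes = [0, 0, 0, 1, 1, 1]
    print(youden_cutoff(bmi, diabetes))
```

In the toy data the two groups are perfectly separated, so the search returns the smallest value in the diseased group with J equal to one.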
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element is fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become ineffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Nonparametric Detection of Geometric Structures Over Networks
Zou, Shaofeng; Liang, Yingbin; Poor, H. Vincent
2017-10-01
Nonparametric detection of existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests with probability of errors converging to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on maximum mean discrepancy that measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is first studied. Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent. It is also shown that certain necessary conditions must hold to guarantee any test to be universally consistent. Comparison of sufficient and necessary conditions yields that the proposed test is order-level optimal and nearly optimal respectively in terms of minimum and maximum sizes of candidate anomalous intervals. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
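The maximum mean discrepancy (MMD) underlying these tests can be illustrated with a small Python sketch. It computes the biased (V-statistic) estimate of the squared MMD between two scalar samples with a Gaussian kernel. The function names, the kernel choice, and the bandwidth are assumptions; the paper's test statistics over candidate intervals and its thresholds are not reproduced here.

```python
import math


def rbf(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = E k(x, x') + E k(y, y') - 2 E k(x, y), with expectations
    replaced by averages over all pairs (including identical pairs).
    """
    def mean_kernel(us, vs):
        total = sum(rbf(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    typical = [0.0, 0.1, -0.1]    # samples drawn under p (toy values)
    anomalous = [5.0, 5.1, 4.9]   # samples drawn under q (toy values)
    print(mmd_squared(typical, typical))    # identical samples
    print(mmd_squared(typical, anomalous))  # well-separated samples
```

A detection rule would compare such a statistic with a threshold; how that threshold should scale with network size is the subject of the paper and is not sketched here.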
Totton, Sarah C; Farrar, Ashley M; Wilkins, Wendy; Bucher, Oliver; Waddell, Lisa A; Wilhelm, Barbara J; McEwen, Scott A; Rajić, Andrijana
2012-10-01
Eating inappropriately prepared poultry meat is a major cause of foodborne salmonellosis. Our objectives were to determine the efficacy of feed and water additives (other than competitive exclusion and antimicrobials) on reducing Salmonella prevalence or concentration in broiler chickens using systematic review-meta-analysis and to explore sources of heterogeneity found in the meta-analysis through meta-regression. Six electronic databases were searched (Current Contents (1999-2009), Agricola (1924-2009), MEDLINE (1860-2009), Scopus (1960-2009), Centre for Agricultural Bioscience (CAB) (1913-2009), and CAB Global Health (1971-2009)), five topic experts were contacted, and the bibliographies of review articles and a topic-relevant textbook were manually searched to identify all relevant research. Study inclusion criteria comprised: English-language primary research investigating the effects of feed and water additives on the Salmonella prevalence or concentration in broiler chickens. Data extraction and study methodological assessment were conducted by two reviewers independently using pretested forms. Seventy challenge studies (n=910 unique treatment-control comparisons), seven controlled studies (n=154), and one quasi-experiment (n=1) met the inclusion criteria. Compared to an assumed control group prevalence of 44 of 1000 broilers, random-effects meta-analysis indicated that the Salmonella cecal colonization in groups with prebiotics (fructooligosaccharide, lactose, whey, dried milk, lactulose, lactosucrose, sucrose, maltose, mannanoligosaccharide) added to feed or water was 15 out of 1000 broilers; with lactose added to feed or water it was 10 out of 1000 broilers; with experimental chlorate product (ECP) added to feed or water it was 21 out of 1000. For ECP the concentration of Salmonella in the ceca was decreased by 0.61 log(10)cfu/g in the treated group compared to the control group. Significant heterogeneity (Cochran's Q-statistic p≤0.10) was observed
Combining regression trees and radial basis function networks.
Orr, M; Hallam, J; Takezawa, K; Murra, A; Ninomiya, S; Oide, M; Leonard, T
2000-12-01
We describe a method for non-parametric regression which combines regression trees with radial basis function networks. The method is similar to that of Kubat, who was first to suggest such a combination, but has some significant improvements. We demonstrate the features of the new method, compare its performance with other methods on DELVE data sets and apply it to a real world problem involving the classification of soybean plants from digital images.
Predicting students’ grades using fuzzy non-parametric regression method and ReliefF-based algorithm
Directory of Open Access Journals (Sweden)
Javad Ghasemian
In this paper we introduce two new approaches to predict the grades that university students will acquire in the final exam of a course and improve the obtained result on some features extracted from logged data in an educational web-based system. First w ...
Modified Nonparametric Kernel Estimates of a Regression Function and their Consistencies with Rates.
1985-04-01
estimates. In each case the speed of convergence is examined. An explicit bound for the mean square error, lacking to date in the literature for the... To deduce the uniform weak consistency of r and r...
Measuring the influence of networks on transaction costs using a non-parametric regression technique
DEFF Research Database (Denmark)
Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs. One of the major factors in transaction costs theory is information. Firm networks can catalyse the interpersonal information exchange and hence, increase the access to n...
Institute of Scientific and Technical Information of China (English)
聂志强; 欧艳秋; 庄建; 曲艳吉; 麦劲壮; 陈寄梅; 刘小清
2016-01-01
Case-control studies commonly use conditional or unconditional logistic regression, and survival data are commonly analysed with the Cox proportional hazards model. Most published analyses include only main-effect models. However, generalized linear models differ from general linear models in that interaction can be assessed on a multiplicative or an additive scale: the former has only statistical meaning, whereas the latter corresponds more closely to biological interaction. In this paper, macros were written in SAS 9.4 that, alongside the multiplicative interaction terms of logistic and Cox regression, calculate the interaction contrast ratio, the attributable proportion due to interaction, and the synergy index, and evaluate additive interaction using confidence intervals from three methods: Wald, delta, and profile likelihood (PL). The macros are intended as a reference for the analysis of multiplicative and additive interactions in large clinical epidemiology and genetics data sets.
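The three additive-interaction measures named above have standard closed forms in terms of relative risks (or odds or hazard ratios used as approximations). The Python sketch below is not the authors' SAS macro and computes point estimates only, without the Wald, delta, or profile likelihood intervals. The function and argument names are assumptions: `rr10` and `rr01` are the relative risks for each exposure alone and `rr11` for both together, all against the doubly unexposed group.

```python
def additive_interaction(rr10, rr01, rr11):
    """Point estimates of three additive-interaction measures.

    Returns (reri, ap, s):
      reri: interaction contrast ratio, RR11 - RR10 - RR01 + 1
      ap:   attributable proportion due to interaction, RERI / RR11
      s:    synergy index, (RR11 - 1) / ((RR10 - 1) + (RR01 - 1))
    No additive interaction corresponds to reri = 0, ap = 0, s = 1.
    """
    if rr11 <= 0:
        raise ValueError("rr11 must be positive")
    reri = rr11 - rr10 - rr01 + 1.0
    ap = reri / rr11
    denom = (rr10 - 1.0) + (rr01 - 1.0)
    s = (rr11 - 1.0) / denom if denom != 0 else float("nan")
    return reri, ap, s


if __name__ == "__main__":
    # Toy values: RR11 = RR10 * RR01, so there is no multiplicative
    # interaction, yet the additive measures indicate positive interaction.
    print(additive_interaction(2.0, 3.0, 6.0))
```

The toy values show why the two scales can disagree: the joint effect is exactly the product of the separate effects, but it exceeds their sum.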
Nonparametric Bayesian drift estimation for multidimensional stochastic differential equations
Gugushvili, S.; Spreij, P.
2014-01-01
We consider nonparametric Bayesian estimation of the drift coefficient of a multidimensional stochastic differential equation from discrete-time observations on the solution of this equation. Under suitable regularity conditions, we establish posterior consistency in this context.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of
A non-parametric approach to investigating fish population dynamics
National Research Council Canada - National Science Library
Cook, R.M; Fryer, R.J
2001-01-01
.... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Energy Technology Data Exchange (ETDEWEB)
Ferragut, Erik M [ORNL; Laska, Jason A [ORNL
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
PV power forecast using a nonparametric PV model
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
Forecasting the AC power output of a PV plant accurately is important both for plant owners and electric system operators. Two main categories of PV modeling are available: the parametric and the nonparametric. In this paper, a methodology using a nonparametric PV model is proposed, using as inputs several forecasts of meteorological variables from a Numerical Weather Forecast model, and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quant...
Li, Ming; Gardiner, Joseph C; Breslau, Naomi; Anthony, James C; Lu, Qing
2014-07-01
Cox-regression-based methods have been commonly used for the analyses of survival outcomes, such as age-at-disease-onset. These methods generally assume the hazard functions are proportional among various risk groups. However, such an assumption may not be valid in genetic association studies, especially when complex interactions are involved. In addition, genetic association studies commonly adopt case-control designs. Direct use of Cox regression to case-control data may yield biased estimators and incorrect statistical inference. We propose a non-parametric approach, the weighted Nelson-Aalen (WNA) approach, for detecting genetic variants that are associated with age-dependent outcomes. The proposed approach can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Moreover, it does not rely on any assumptions of the disease inheritance models, and is able to capture high-order gene-gene interactions. Through simulations, we show the proposed approach outperforms Cox-regression-based methods in various scenarios. We also conduct an empirical study of progression of nicotine dependence by applying the WNA approach to three independent datasets from the Study of Addiction: Genetics and Environment. In the initial dataset, two SNPs, rs6570989 and rs2930357, located in genes GRIK2 and CSMD1, are found to be significantly associated with the progression of nicotine dependence (ND). The joint association is further replicated in two independent datasets. Further analysis suggests that these two genes may interact and be associated with the progression of ND. As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for detecting genetic interactions associated with age-at-onset outcomes.
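For orientation, the classical (unweighted) Nelson-Aalen estimator of the cumulative hazard, on which the WNA approach builds, can be sketched in Python as follows. The weighting scheme and the genetic association test of the paper are not specified in the abstract and are not reproduced; the function name and the data layout are assumptions.

```python
def nelson_aalen(times, events):
    """Classical Nelson-Aalen cumulative hazard estimate.

    times:  observed time for each subject (event or censoring time).
    events: 1 if the event (e.g. disease onset) was observed, 0 if censored.
    Returns a list of (t, H(t)) pairs at the distinct event times, where
    H(t) is the running sum of d_i / n_i (events over number at risk).
    """
    data = sorted(zip(times, events))
    n = len(data)
    cumulative = 0.0
    result = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i          # subjects with observed time >= t
        deaths = 0
        j = i
        while j < n and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            cumulative += deaths / at_risk
            result.append((t, cumulative))
        i = j
    return result


if __name__ == "__main__":
    onset_age = [1, 2, 2, 3]
    observed = [1, 1, 0, 1]
    for t, h in nelson_aalen(onset_age, observed):
        print(t, round(h, 4))
```

Because the estimator makes no proportional hazards assumption, cumulative hazard curves for different genotype groups can be compared directly.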
Economic decision making and the application of nonparametric prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2008-01-01
Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright © 2008 Society of Petroleum Engineers.
A Bayesian nonparametric approach to reconstruction and prediction of random dynamical systems
Merkatas, Christos; Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2017-06-01
We propose a Bayesian nonparametric mixture model for the reconstruction and prediction from observed time series data, of discretized stochastic dynamical systems, based on Markov Chain Monte Carlo methods. Our results can be used by researchers in physical modeling interested in a fast and accurate estimation of low dimensional stochastic models when the size of the observed time series is small and the noise process (perhaps) is non-Gaussian. The inference procedure is demonstrated specifically in the case of polynomial maps of an arbitrary degree and when a Geometric Stick Breaking mixture process prior over the space of densities, is applied to the additive errors. Our method is parsimonious compared to Bayesian nonparametric techniques based on Dirichlet process mixtures, flexible and general. Simulations based on synthetic time series are presented.
Methodology in robust and nonparametric statistics
Jurecková, Jana; Picek, Jan
2012-01-01
Introduction and Synopsis: Introduction; Synopsis. Preliminaries: Introduction; Inference in Linear Models; Robustness Concepts; Robust and Minimax Estimation of Location; Clippings from Probability and Asymptotic Theory; Problems. Robust Estimation of Location and Regression: Introduction; M-Estimators; L-Estimators; R-Estimators; Minimum Distance and Pitman Estimators; Differentiable Statistical Functions; Problems. Asymptotic Representations for L-Estimators
Mayr, Andreas; Hothorn, Torsten; Fenske, Nora
2012-01-25
The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.
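Quantile regression of the kind described above is fitted by minimising the check (pinball) loss, and a prediction interval is formed from two fitted quantiles. The Python sketch below shows only that loss and an empirical coverage check. It is not the authors' boosting procedure, and the function names and toy numbers are assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check loss for quantile level tau in (0, 1).

    Under-prediction (y > q) is penalised by tau, over-prediction by
    1 - tau, so the minimiser is the tau-quantile rather than the mean.
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def empirical_coverage(y_true, lower, upper):
    """Share of observations falling inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


if __name__ == "__main__":
    bmi = [15.2, 16.8, 18.1, 21.5]           # toy future BMI values
    lower = [14.0, 15.0, 16.0, 17.0]         # e.g. fitted 5% quantiles
    upper = [17.0, 18.0, 20.0, 21.0]         # e.g. fitted 95% quantiles
    print(pinball_loss(bmi, lower, tau=0.05))
    print(empirical_coverage(bmi, lower, upper))
```

This checks marginal coverage over a sample; conditional coverage, as the abstract notes, is hard to evaluate on real data and is typically assessed by simulation.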
Institute of Scientific and Technical Information of China (English)
LU; Zudi
2001-01-01
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
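The unsmoothed Good-Turing estimator referred to above has a simple closed form: the estimated probability that the next observation belongs to a species seen exactly r times so far is (r + 1) N_{r+1} / n, where N_k is the number of species observed k times and n is the sample size. A minimal Python sketch follows. The function name is an assumption, and neither the smoothing step nor the Bayesian nonparametric estimator of the article is implemented.

```python
from collections import Counter


def good_turing_mass(sample, r):
    """Unsmoothed Good-Turing estimate of a discovery probability.

    Returns (r + 1) * N_{r+1} / n: the estimated probability that the
    next draw is a species seen exactly r times in the sample.
    With r = 0 this is the probability of discovering a new species.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    species_counts = Counter(sample)                  # species -> frequency
    freq_of_freq = Counter(species_counts.values())   # k -> N_k
    return (r + 1) * freq_of_freq.get(r + 1, 0) / n


if __name__ == "__main__":
    tags = ["a", "a", "a", "b", "b", "c", "d"]
    print(good_turing_mass(tags, 0))  # probability of an unseen species
    print(good_turing_mass(tags, 1))  # probability of a species seen once
```

The raw estimator is erratic when N_{r+1} is small or zero, which is the usual motivation for the smoothed versions discussed in the article.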
A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.
Mircioiu, Constantin; Atkinson, Jeffrey
2017-05-10
A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
Energy Technology Data Exchange (ETDEWEB)
Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)
1990-11-01
A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.
Directory of Open Access Journals (Sweden)
Liang SY
2016-12-01
Full Text Available Shuyan Liang,* Jun Hu,* Yuanyuan Xie, Qing Zhou, Yanhong Zhu, Xiangliang Yang National Engineering Research Center for Nanomedicine, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, People’s Republic of China *These authors contributed equally to this work Abstract: Cancer immunotherapy based on nanodelivery systems has shown potential for treatment of various malignancies, owing to the benefits of tumor targeting of nanoparticles. However, induction of a potent T-cell immune response against tumors still remains a challenge. In this study, polyethylenimine-modified carboxyl-styrene/acrylamide (PS) copolymer nanospheres were developed as a delivery system of unmethylated cytosine-phosphate-guanine (CpG) oligodeoxynucleotides and transforming growth factor-beta (TGF-β) receptor I inhibitors for cancer immunotherapy. TGF-β receptor I inhibitors (LY2157299, LY) were encapsulated to the PS via hydrophobic interaction, while CpG oligodeoxynucleotides were loaded onto the PS through electrostatic interaction. Compared to the control group, tumor inhibition in the PS-LY/CpG group was up to 99.7% without noticeable toxicity. The tumor regression may be attributed to T-cell activation and amplification in mouse models. The results highlight the additive effect of CpG and TGF-β receptor I inhibitors co-delivered in cancer immunotherapy. Keywords: CpG, TGF-β receptor I inhibitor, Pst-AAm copolymer nanosphere, immunotherapy
Directory of Open Access Journals (Sweden)
M. Ahmadlou
2015-12-01
Full Text Available Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhood, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.
Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.
2015-12-01
Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhood, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.
ASYMPTOTIC EFFICIENT ESTIMATION IN SEMIPARAMETRIC NONLINEAR REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
Zhu Zhongyi; Wei Bocheng
1999-01-01
In this paper, the estimation method based on the “generalized profile likelihood” for conditionally parametric models, introduced by Severini and Wong (1992), is extended to fixed design semiparametric nonlinear regression models. For these semiparametric nonlinear regression models, the resulting estimator of the parametric component of the model is shown to be asymptotically efficient, and the strong convergence rate of the nonparametric component is investigated. Many results (for example Chen (1988), Gao & Zhao (1993), Rice (1986), et al.) are extended to fixed design semiparametric nonlinear regression models.
Nonparametric estimation of a convex bathtub-shaped hazard function.
Jankowski, Hanna K; Wellner, Jon A
2009-11-01
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n^(2/5) at points x_0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Directory of Open Access Journals (Sweden)
Kühnast, Corinna
2008-04-01
Full Text Available Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and – for all studies which contain at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test – investigated the relationship between the choice of a statistical test and other variables such as type of journal, sample size, randomization, sponsoring etc. Results: The non-parametric Wilcoxon-Mann-Whitney was used in 30% of the studies. In a multivariable logistic regression the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software is specified, in particular when SPSS was used.
A robust nonparametric method for quantifying undetected extinctions.
Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E
2016-06-01
How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
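The percentages quoted in this abstract can be checked with a few lines of arithmetic. The Python sketch below assumes that the corrected extinction rate is (detected extinctions + projected undetected extinctions) divided by (recorded species + projected undetected extinctions). That formula is inferred from the reported figures and is not taken from the authors' R code; the function and variable names are illustrative only.

```python
# Figures quoted in the abstract (Singapore resident native birds).
recorded_species = 195         # species ever recorded
detected_extinctions = 58      # recorded species that have gone extinct
undetected_extinctions = 9.6   # projected extinctions before first record


def extinction_rates(recorded, detected, undetected):
    """Return (observed rate, rate corrected for undetected extinctions).

    Assumption: undetected extinct species are added to both the number of
    extinctions and the size of the species pool.
    """
    observed = detected / recorded
    corrected = (detected + undetected) / (recorded + undetected)
    return observed, corrected


observed_rate, corrected_rate = extinction_rates(
    recorded_species, detected_extinctions, undetected_extinctions
)
print(f"observed: {observed_rate:.1%}, corrected: {corrected_rate:.1%}")
```

Under this assumption the two rates round to the 29.7% and 33.0% reported above.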
Determining the Mass of Kepler-78b with Nonparametric Gaussian Process Estimation
Grunblatt, Samuel Kai; Howard, Andrew; Haywood, Raphaëlle
2016-01-01
Kepler-78b is a transiting planet that is 1.2 times the radius of Earth and orbits a young, active K dwarf every 8 hr. The mass of Kepler-78b has been independently reported by two teams based on radial velocity (RV) measurements using the HIRES and HARPS-N spectrographs. Due to the active nature of the host star, a stellar activity model is required to distinguish and isolate the planetary signal in RV data. Whereas previous studies tested parametric stellar activity models, we modeled this system using nonparametric Gaussian process (GP) regression. We produced a GP regression of relevant Kepler photometry. We then use the posterior parameter distribution for our photometric fit as a prior for our simultaneous GP + Keplerian orbit models of the RV data sets. We tested three simple kernel functions for our GP regressions. Based on a Bayesian likelihood analysis, we selected a quasi-periodic kernel model with GP hyperparameters coupled between the two RV data sets, giving a Doppler amplitude of 1.86 ± 0.25 m s^-1 and supporting our belief that the correlated noise we are modeling is astrophysical. The corresponding mass of 1.87 (+0.27, -0.26) Earth masses is consistent with that measured in previous studies, and more robust due to our nonparametric signal estimation. Based on our mass and the radius measurement from transit photometry, Kepler-78b has a bulk density of 6.0 (+1.9, -1.4) g cm^-3. We estimate that Kepler-78b is 32% ± 26% iron using a two-component rock-iron model. This is consistent with an Earth-like composition, with uncertainty spanning Moon-like to Mercury-like compositions.
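The central bulk density follows from the quoted mass and radius by scaling Earth's mean density. The Python sketch below is a rough consistency check only: it uses the rounded radius of 1.2 Earth radii from the abstract and an assumed Earth mean density of 5.514 g cm^-3, ignores the uncertainties, and uses illustrative names that do not come from the paper.

```python
# Assumed reference value: Earth's mean density in g/cm^3.
EARTH_MEAN_DENSITY_G_CM3 = 5.514


def bulk_density(mass_earth, radius_earth):
    """Bulk density in g/cm^3 for a planet given in Earth masses and radii.

    Density scales as mass / radius^3, so Earth's density is rescaled.
    """
    return EARTH_MEAN_DENSITY_G_CM3 * mass_earth / radius_earth**3


# Central values quoted in the abstract: 1.87 Earth masses, 1.2 Earth radii.
rho = bulk_density(1.87, 1.2)
print(f"bulk density: {rho:.2f} g/cm^3")
```

The result is close to the 6.0 g cm^-3 reported above; the small difference comes from the rounded radius.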
Carroll, Raymond
2009-04-23
We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.
Local Linear Regression for Data with AR Errors
Institute of Scientific and Technical Information of China (English)
Runze Li; Yan Li
2009-01-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using the local linear regression method and profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of the auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. In our empirical studies, the newly proposed procedures dramatically improve the accuracy of naive local linear regression with a working-independent error structure. We illustrate the proposed methodology by an analysis of a real data set.
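For orientation, the baseline that this abstract calls naive local linear regression with a working-independent error structure can be sketched in Python as a kernel-weighted least squares fit at each evaluation point. This is not the profile least squares procedure proposed in the paper; the Gaussian kernel, the function name `local_linear` and the bandwidth value are assumptions made for illustration.

```python
import numpy as np


def local_linear(x, y, x0, bandwidth):
    """Naive local linear estimate of E[y | x = x0].

    Fits a weighted straight line around x0 with Gaussian kernel weights,
    treating the errors as independent, and returns the fitted intercept.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    # Design matrix centred at x0, so the intercept is the estimate at x0.
    design = np.column_stack([np.ones_like(x), x - x0])
    weighted_design = design * weights[:, None]
    # Solve the weighted normal equations (X' W X) beta = X' W y.
    beta = np.linalg.solve(design.T @ weighted_design, weighted_design.T @ y)
    return beta[0]


# Usage: a local linear fit reproduces an exactly linear trend.
x_demo = np.linspace(0.0, 1.0, 21)
y_demo = 2.0 * x_demo + 1.0
estimate = local_linear(x_demo, y_demo, x0=0.3, bandwidth=0.2)
```

Because the fit is linear within each window, noise-free linear data are recovered exactly up to rounding, which gives a convenient sanity check before adding correlated errors.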
Nonparametric Cointegration Analysis of Fractional Systems With Unknown Integration Orders
DEFF Research Database (Denmark)
Nielsen, Morten Ørregaard
2009-01-01
In this paper a nonparametric variance ratio testing approach is proposed for determining the number of cointegrating relations in fractionally integrated systems. The test statistic is easily calculated without prior knowledge of the integration order of the data, the strength of the cointegrating...
Non-parametric analysis of rating transition and default data
DEFF Research Database (Denmark)
Fledelius, Peter; Lando, David; Perch Nielsen, Jens
2004-01-01
We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move b...... but that this dependence vanishes after 2-3 years....
A non-parametric model for the cosmic velocity field
Branchini, E; Teodoro, L; Frenk, CS; Schmoldt, [No Value; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S
1999-01-01
We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.
Influence of test and person characteristics on nonparametric appropriateness measurement
Meijer, Rob R.; Molenaar, Ivo W.; Sijtsma, Klaas
1994-01-01
Appropriateness measurement in nonparametric item response theory modeling is affected by the reliability of the items, the test length, the type of aberrant response behavior, and the percentage of aberrant persons in the group. The percentage of simulees defined a priori as aberrant responders tha
Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects
Qian, Minghui; Hu, Ridong; Chen, Jianwei
2016-01-01
Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes the nonlinear factors into account based on the spatial dynamic panel…
Non-parametric Bayesian inference for inhomogeneous Markov point processes
DEFF Research Database (Denmark)
Berthelsen, Kasper Klitgaard; Møller, Jesper
With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...
Investigating the cultural patterns of corruption: A nonparametric analysis
Halkos, George; Tzeremes, Nickolaos
2011-01-01
By using a sample of 77 countries our analysis applies several nonparametric techniques in order to reveal the link between national culture and corruption. Based on Hofstede’s cultural dimensions and the corruption perception index, the results reveal that countries with higher levels of corruption tend to have higher power distance and collectivism values in their society.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of consump
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
Non-parametric transformation for data correlation and integration: From theory to practice
Energy Technology Data Exchange (ETDEWEB)
Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A&M Univ., College Station, TX (United States)]
1997-08-01
The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require a priori functional relationship between response and predictor variables which is the case with traditional multiple regression. The transformations are very general, computationally efficient and can easily handle mixed data types for example, continuous variables such as porosity, permeability and categorical variables such as rock type, lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data, (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data, (3) it can be easily applied to cases where several types of secondary or soft data are involved, and (4) it significantly reduces variance function calculations and thus, greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.
Identification and well-posedness in a class of nonparametric problems
Zinde-Walsh, Victoria
2010-01-01
This is a companion note to Zinde-Walsh (2010), arXiv:1009.4217v1[MATH.ST], to clarify and extend results on identification in a number of problems that lead to a system of convolution equations. Examples include identification of the distribution of mismeasured variables, of a nonparametric regression function under Berkson type measurement error, some nonparametric panel data models, etc. The reason that identification in different problems can be considered in one approach is that they lead to the same system of convolution equations; moreover the solution can be given under more general assumptions than those usually considered, by examining these equations in spaces of generalized functions. An important issue that did not receive sufficient attention is that of well-posedness. This note gives conditions under which well-posedness obtains, an example that demonstrates that when well-posedness does not hold functions that are far apart can give rise to observable arbitrarily close functions and discusses ...
Focused information criterion and model averaging based on weighted composite quantile regression
Xu, Ganggang
2013-08-13
We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non-parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR-estimator of a focused parameter is a non-linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non-normal residuals but also can achieve efficiency close to the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure. © 2013 Board of the Foundation of the Scandinavian Journal of Statistics.
Functional Regression for Quasar Spectra
Ciollaro, Mattia; Freeman, Peter; Genovese, Christopher; Lei, Jing; O'Connell, Ross; Wasserman, Larry
2014-01-01
The Lyman-alpha forest is a portion of the observed light spectrum of distant galactic nuclei which allows us to probe remote regions of the Universe that are otherwise inaccessible. The observed Lyman-alpha forest of a quasar light spectrum can be modeled as a noisy realization of a smooth curve that is affected by a `damping effect' which occurs whenever the light emitted by the quasar travels through regions of the Universe with higher matter concentration. To decode the information conveyed by the Lyman-alpha forest about the matter distribution, we must be able to separate the smooth `continuum' from the noise and the contribution of the damping effect in the quasar light spectra. To predict the continuum in the Lyman-alpha forest, we use a nonparametric functional regression model in which both the response and the predictor variable (the smooth part of the damping-free portion of the spectrum) are function-valued random variables. We demonstrate that the proposed method accurately predicts the unobserv...
Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models
Directory of Open Access Journals (Sweden)
Adam J. Branscum
2013-01-01
Full Text Available The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions) rather than scalar parameters (e.g., means or proportions). For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straightforward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
Directory of Open Access Journals (Sweden)
Akhtar R. Siddique
2000-03-01
Full Text Available This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices.
Regression analysis by example
National Research Council Canada - National Science Library
Chatterjee, Samprit; Hadi, Ali S
2012-01-01
.... The emphasis continues to be on exploratory data analysis rather than statistical theory. The coverage offers in-depth treatment of regression diagnostics, transformation, multicollinearity, logistic regression, and robust regression...
Comparison of Rank Analysis of Covariance and Nonparametric Randomized Blocks Analysis.
Porter, Andrew C.; McSweeney, Maryellen
The relative power of three possible experimental designs under the condition that data is to be analyzed by nonparametric techniques; the comparison of the power of each nonparametric technique to its parametric analogue; and the comparison of relative powers using nonparametric and parametric techniques are discussed. The three nonparametric…
Szabo, Zoltan
2010-01-01
The goal of this paper is to extend independent subspace analysis (ISA) to the case of (i) nonparametric, not strictly stationary source dynamics and (ii) unknown source component dimensions. We make use of functional autoregressive (fAR) processes to model the temporal evolution of the hidden sources. An extension of the ISA separation principle--which states that the ISA problem can be solved by traditional independent component analysis (ICA) and clustering of the ICA elements--is derived for the solution of the defined fAR independent process analysis task (fAR-IPA): applying fAR identification we reduce the problem to ISA. A local averaging approach, the Nadaraya-Watson kernel regression technique is adapted to obtain strongly consistent fAR estimation. We extend the Amari-index to different dimensional components and illustrate the efficiency of the fAR-IPA approach by numerical examples.
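The local averaging technique named in this abstract, Nadaraya-Watson kernel regression, can be sketched in Python for the scalar case as follows. This is a generic textbook version, not the functional autoregressive estimator of the paper; the Gaussian kernel, the function name `nadaraya_watson` and the bandwidth value are assumptions made for illustration.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: a kernel-weighted average of y_train.

    Each query point receives the average of the training responses,
    weighted by a Gaussian kernel of the scaled distance to each x_train.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Pairwise scaled distances, shape (n_query, n_train).
    scaled = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled**2)
    return (weights @ y_train) / weights.sum(axis=1)


# Usage: midway between two training points the weights are equal,
# so the estimate is the plain average of the two responses.
midpoint_estimate = nadaraya_watson([0.0, 1.0], [0.0, 1.0], [0.5], bandwidth=1.0)
```

A smaller bandwidth makes the estimate follow the nearest observations more closely, while a larger one averages over more of the sample.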
Crainiceanu, Ciprian M; Caffo, Brian S; Di, Chong-Zhi; Punjabi, Naresh M
2009-06-01
We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS.
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.
Zhang, Tingting; Kou, S C
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.
Factors associated with malnutrition among tribal children in India: a non-parametric approach.
Debnath, Avijit; Bhattacharjee, Nairita
2014-06-01
The purpose of this study is to identify the determinants of malnutrition among the tribal children in India. The investigation is based on secondary data compiled from the National Family Health Survey-3. We used a classification and regression tree model, a non-parametric approach, to address the objective. Our analysis shows that breastfeeding practice, economic status, antenatal care of mother and women's decision-making autonomy are negatively associated with malnutrition among tribal children. We identify maternal malnutrition and urban concentration of household as the two risk factors for child malnutrition. The identified associated factors may be used for designing and targeting preventive programmes for malnourished tribal children. © The Author [2014]. Published by Oxford University Press.
Photo-z Estimation: An Example of Nonparametric Conditional Density Estimation under Selection Bias
Izbicki, Rafael; Freeman, Peter E
2016-01-01
Redshift is a key quantity for inferring cosmological model parameters. In photometric redshift estimation, cosmologists use the coarse data collected from the vast majority of galaxies to predict the redshift of individual galaxies. To properly quantify the uncertainty in the predictions, however, one needs to go beyond standard regression and instead estimate the full conditional density f(z|x) of a galaxy's redshift z given its photometric covariates x. The problem is further complicated by selection bias: usually only the rarest and brightest galaxies have known redshifts, and these galaxies have characteristics and measured covariates that do not necessarily match those of more numerous and dimmer galaxies of unknown redshift. Unfortunately, there is not much research on how to best estimate complex multivariate densities in such settings. Here we describe a general framework for properly constructing and assessing nonparametric conditional density estimators under selection bias, and for combining two o...
DEFF Research Database (Denmark)
Johansen, Søren
2008-01-01
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating e...
Nonparametric inference procedures for multistate life table analysis.
Dow, M M
1985-01-01
Recent generalizations of the classical single state life table procedures to the multistate case provide the means to analyze simultaneously the mobility and mortality experience of 1 or more cohorts. This paper examines fairly general nonparametric combinatorial matrix procedures, known as quadratic assignment, as a technique for analyzing various transitional patterns commonly generated by cohorts over the life cycle course. To some degree, the output from a multistate life table analysis suggests inference procedures. In his discussion of multistate life table construction features, the author focuses on the matrix formulation of the problem. He then presents several examples of the proposed nonparametric procedures. Data for the mobility and life expectancies at birth matrices come from the 458 member Cayo Santiago rhesus monkey colony. The author's matrix combinatorial approach to hypothesis testing may prove to be a useful inferential strategy in several multidimensional demographic areas.
Non-parametric estimation of Fisher information from real data
Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A
2015-01-01
The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371 specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function are consistent with the "true......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...
Poverty and life cycle effects: A nonparametric analysis for Germany
Stich, Andreas
1996-01-01
Most empirical studies on poverty consider the extent of poverty either for the entire society or for separate groups like elderly people. However, these papers do not show what the situation looks like for persons of a certain age. In this paper poverty measures depending on age are derived using the joint density of income and age. The density is nonparametrically estimated by weighted Gaussian kernel density estimation. Applying the conditional density of income to several poverty measures ...
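As a rough illustration of weighted Gaussian kernel density estimation, the following univariate sketch places a Gaussian bump at each observation and scales it by a normalized weight. The income values, survey weights, and bandwidth are hypothetical, and the paper itself works with the joint density of income and age rather than this one-dimensional simplification.

```python
import numpy as np

def weighted_gaussian_kde(samples, weights, grid, bandwidth):
    """Weighted Gaussian kernel density estimate evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # weights must sum to one
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights

# Hypothetical incomes and survey weights (illustrative values only).
incomes = [800.0, 1200.0, 1500.0, 3000.0]
survey_weights = [2.0, 1.0, 1.0, 1.0]
grid = np.linspace(-2000.0, 6000.0, 801)
density = weighted_gaussian_kde(incomes, survey_weights, grid, bandwidth=300.0)

# A valid density integrates to (approximately) one over a wide enough grid.
total_mass = float(np.sum(density) * (grid[1] - grid[0]))
```

Observations with larger weights contribute proportionally taller bumps, which is how survey weights enter the estimate.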
Nonparametric estimation of Fisher information from real data
Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.
2016-02-01
The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δθ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δθ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f-divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
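The finite-difference step can be illustrated on the normal distribution used for validation. The sketch below is an assumption-laden simplification: it plugs in the exact normal density where the paper would use a nonparametric density estimate, it uses a central difference, and the values of `delta` and the integration grid are arbitrary choices rather than the optimal Δθ derived in the paper.

```python
import numpy as np

def normal_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2); stands in for an estimated density."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def fisher_information_fd(density, theta, delta, grid):
    """Approximate I(theta) = integral of (dp/dtheta)^2 / p over x.

    dp/dtheta is replaced by a central finite difference with step delta.
    """
    dx = grid[1] - grid[0]
    p = density(grid, theta)
    dp = (density(grid, theta + delta) - density(grid, theta - delta)) / (2.0 * delta)
    integrand = np.zeros_like(p)
    mask = p > 1e-300  # skip points where the density has underflowed
    integrand[mask] = dp[mask] ** 2 / p[mask]
    return float(np.sum(integrand) * dx)

sigma = 2.0
grid = np.linspace(-30.0, 30.0, 6001)
fi_mean = fisher_information_fd(
    lambda x, m: normal_pdf(x, m, sigma), theta=0.0, delta=0.01, grid=grid
)
# For the mean of a normal with known sigma, the exact value is 1 / sigma**2.
```

With an estimated density in place of `normal_pdf`, the choice of `delta` trades finite-difference bias against the noise of the density estimate, which is the trade-off the abstract addresses.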
ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH
Directory of Open Access Journals (Sweden)
I. C. A. OYEKA
2012-02-01
This paper presents a non-parametric statistical method of analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided, and the method is shown to compare favourably with its competitor, the Mann-Whitney test, and to be more powerful than the latter when there are ties.
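For reference, the competitor named in the abstract, the Mann-Whitney U statistic, conventionally handles ties by counting each tied pair as one half. The sketch below shows that convention only; it is not the alternative statistic proposed in the paper, whose form is not given in the abstract, and the sample values are made up.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x against sample y.

    Counts pairs with x_i > y_j, adding 0.5 for each tied pair.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical samples containing ties.
sample_x = [1, 2, 2, 5]
sample_y = [2, 3, 4]
u_x = mann_whitney_u(sample_x, sample_y)
u_y = mann_whitney_u(sample_y, sample_x)
# The two statistics always sum to len(sample_x) * len(sample_y).
```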
Nonparametric test for detecting change in distribution with panel data
Pommeret, Denys; Ghattas, Badih
2011-01-01
This paper considers the problem of comparing two processes with panel data. A nonparametric test is proposed for detecting a monotone change in the link between the two process distributions. The test statistic is of CUSUM type, based on the empirical distribution functions. The asymptotic distribution of the proposed statistic is derived and its finite sample property is examined by bootstrap procedures through Monte Carlo simulations.
Fusion of Hard and Soft Information in Nonparametric Density Estimation
2015-06-10
estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample... Fusion of Hard and Soft Information in Nonparametric Density Estimation. Johannes O. Royset, Roger J-B Wets, Department of Operations Research... univariate density estimation in situations when the sample (hard information) is supplemented by “soft” information about the random phenomenon. These
Nonparametric estimation for hazard rate monotonously decreasing system
Institute of Scientific and Technical Information of China (English)
Han Fengyan; Li Weisong
2005-01-01
Estimation of density and hazard rate is very important to the reliability analysis of a system. To estimate the density and hazard rate of a system whose hazard rate is monotonically decreasing, a new nonparametric estimator is put forward. The estimator is based on the kernel function method and an optimum algorithm. Numerical experiments show that the method is accurate enough to be used in many cases.
Non-parametric versus parametric methods in environmental sciences
Directory of Open Access Journals (Sweden)
Muhammad Riaz
2016-01-01
This report highlights the importance of considering the background assumptions required for the analysis of real datasets in different disciplines. We provide a comparative discussion of parametric methods (which depend on distributional assumptions, such as normality) relative to non-parametric methods (which are free from many distributional assumptions). We have chosen a real dataset from environmental sciences (one of the application areas). The findings may be extended to other disciplines in the same spirit.
Wrong Signs in Regression Coefficients
McGee, Holly
1999-01-01
When using parametric cost estimation, it is important to note the possibility of the regression coefficients having the wrong sign. A wrong sign is defined as a sign on the regression coefficient opposite to the researcher's intuition and experience. Some possible causes for the wrong sign discussed in this paper are a small range of x's, leverage points, missing variables, multicollinearity, and computational error. Additionally, techniques for determining the cause of the wrong sign are given.
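One of the causes listed above, a missing variable that is correlated with an included one, can be reproduced on simulated data. Everything in this sketch is hypothetical: the data-generating model `y = x1 - 2*x2 + noise`, the correlation between `x1` and `x2`, and the sample size are chosen only to make the sign flip visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.standard_normal(n)
x2 = x1 + 0.3 * rng.standard_normal(n)  # x2 is strongly correlated with x1
y = 1.0 * x1 - 2.0 * x2 + 0.1 * rng.standard_normal(n)

def ols(columns, response):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(response))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

full_model = ols([x1, x2], y)   # both predictors included
short_model = ols([x1], y)      # x2 omitted

slope_x1_full = full_model[1]   # close to the true value +1
slope_x1_short = short_model[1] # close to -1: the "wrong" sign
```

Because `x2` moves almost one-for-one with `x1`, omitting it makes `x1` absorb the effect of `-2*x2`, so its fitted slope turns negative even though its true coefficient is positive.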
Varying-coefficient functional linear regression
Wu, Yichao; Müller, Hans-Georg; 10.3150/09-BEJ231
2011-01-01
Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regression model. We study a varying-coefficient approach where the scalar covariates are modeled as additional arguments of the regression parameter function. This extension of the functional linear regression model is analogous to the extension of conventional linear regression models to varying-coefficient models and shares its advantages, such as increased flexibility; however, the details of this extension are more challenging in the functional case. Our methodology combines smoothing methods with regularization by truncation at a finite numb...
Multilevel covariance regression with correlated random effects in the mean and variance structure.
Quintero, Adrian; Lesaffre, Emmanuel
2017-09-01
Multivariate regression methods generally assume a constant covariance matrix for the observations. When a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for responses with skewed distributions. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Statistical learning from a regression perspective
Berk, Richard A
2016-01-01
This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a first approximation, this can be seen as an extension of nonparametric regression. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. A continued emphasis on the implications for practice runs through the text. Among the statistical learning procedures examined are bagging, random forests, boosting, support vector machines and neural networks. Response variables may be quantitative or categorical. As in the first edition, a unifying theme is supervised learning that can be trea...
Cannon, Alex
2017-04-01
Estimating historical trends in short-duration rainfall extremes at regional and local scales is challenging due to low signal-to-noise ratios and the limited availability of homogenized observational data. In addition to being of scientific interest, trends in rainfall extremes are of practical importance, as their presence calls into question the stationarity assumptions that underpin traditional engineering and infrastructure design practice. Even with these fundamental challenges, increasingly complex questions are being asked about time series of extremes. For instance, users may not only want to know whether or not rainfall extremes have changed over time, they may also want information on the modulation of trends by large-scale climate modes or on the nonstationarity of trends (e.g., identifying hiatus periods or periods of accelerating positive trends). Efforts have thus been devoted to the development and application of more robust and powerful statistical estimators for regional and local scale trends. While a standard nonparametric method like the regional Mann-Kendall test, which tests for the presence of monotonic trends (i.e., strictly non-decreasing or non-increasing changes), makes fewer assumptions than parametric methods and pools information from stations within a region, it is not designed to visualize detected trends, include information from covariates, or answer questions about the rate of change in trends. As a remedy, monotone quantile regression (MQR) has been developed as a nonparametric alternative that can be used to estimate a common monotonic trend in extremes at multiple stations. Quantile regression makes efficient use of data by directly estimating conditional quantiles based on information from all rainfall data in a region, i.e., without having to precompute the sample quantiles. The MQR method is also flexible and can be used to visualize and analyze the nonlinearity of the detected trend. However, it is fundamentally a
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." (Journal of the American Statistical Association) Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Unitary Response Regression Models
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Flexible survival regression modelling
DEFF Research Database (Denmark)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by m...... treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work....
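The contrast with conditional mean models can be seen in the simplest case, a model with no covariates. A quantile minimizes the check (pinball) loss, just as the mean minimizes squared error. The sketch below is a minimal illustration under that no-covariate assumption; the data values are made up, and the search over observed values is a convenience for the example, not how regression software solves the problem.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Check loss: sum of u * (tau - 1[u < 0]) with u = y - q."""
    u = np.asarray(y, dtype=float) - q
    return float(np.sum(u * (tau - (u < 0))))

def quantile_by_check_loss(y, tau):
    """Return the observed value that minimizes the check loss."""
    candidates = np.unique(y)
    losses = [pinball_loss(y, q, tau) for q in candidates]
    return float(candidates[int(np.argmin(losses))])

# Hypothetical data with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = quantile_by_check_loss(data, tau=0.5)
upper_fit = quantile_by_check_loss(data, tau=0.9)
mean_fit = float(np.mean(data))
```

Here the mean is pulled to 22 by the outlier while the median fit stays at 3, and the 0.9 fit picks up the upper tail. Replacing the constant `q` with a linear function of covariates gives linear quantile regression, which is typically solved by linear programming.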
Naghshpour, Shahdad
2012-01-01
Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T
Gianola, Daniel; Wu, Xiao-Lin; Manfredi, Eduardo; Simianer, Henner
2010-10-01
A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some "baseline" family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but it produces an understatement in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. For a
Directory of Open Access Journals (Sweden)
SIAVASH KALBI
2014-05-01
Kalbi S, Fallah A, Hojjati SM. 2014. Using and comparing two nonparametric methods (CART and RF) and SPOT-HRG satellite data to predictive tree diversity distribution. Nusantara Bioscience 6: 57-62. The prediction of spatial distributions of tree species by means of survey data has recently been used for conservation planning. Numerous methods have been developed for building species habitat suitability models. The present study was carried out to find possible relationships between tree species diversity indices and SPOT-HRG reflectance values in Hyrcanian forests, North of Iran. Two different modeling techniques, Classification and Regression Trees (CART) and Random Forest (RF), were fitted to the data in order to find the most successful model. The Simpson index, the Shannon diversity index and the reciprocal of the Simpson index were used for estimating tree diversity. After collecting terrestrial information on trees in the 100 samples, the tree diversity indices were calculated in each plot. RF, with coefficients of determination from 56.3 to 63.9 and RMSE from 0.15 to 0.84, gave better results than CART, with coefficients of determination from 42.3 to 63.3 and RMSE from 0.188 to 0.88. Overall the results showed that the SPOT-HRG satellite data and nonparametric regression could be useful for estimating tree diversity in Hyrcanian forests, North of Iran.
A Frisch-Newton Algorithm for Sparse Quantile Regression
Institute of Scientific and Technical Information of China (English)
Roger Koenker; Pin Ng
2005-01-01
Recent experience has shown that interior-point methods using a log barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data where the parametric dimension of the model can be quite large, but the number of non-zero elements is quite small. Adopting recent developments in sparse linear algebra we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker [28]. The new algorithm substantially reduces the storage (memory) requirements and increases computational speed. The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
An adaptive regression method for infrared blind-pixel compensation
Chen, Suting; Meng, Hao; Pei, Tao; Zhang, Yanyan
2017-09-01
Blind pixel compensation is an ill-posed inverse problem of infrared imaging systems and image restoration. The performance of a blind pixel compensation algorithm depends on the accuracy of estimation for the underlying true infrared images. We propose an adaptive regression method (ARM) for blind pixel compensation that integrates the multi-scale framework with a regression model. A blind pixel is restored by exploiting the intra-scale properties through nonparametric regression estimation and the inter-scale characteristics via parametric regression for continuous learning. Combining the respective strengths of a parametric model and a nonparametric model, ARM establishes a set of multi-scale blind-pixel compensation methods to correct the non-uniformity based on key frame extraction. Therefore, it is essentially different from the traditional frameworks for blind pixel compensation, which are based on filtering and interpolation. Experimental results on some challenging cases of blind pixel compensation show that the proposed algorithm outperforms existing methods by a significant margin in both isolated and clustered blind pixel restoration.
Digital spectral analysis parametric, non-parametric and advanced methods
Castanié, Francis
2013-01-01
Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature. The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models. An entire chapter is devoted to the non-parametric methods most widely used in industry. High resolution methods a
Nonparametric statistics a step-by-step approach
Corder, Gregory W
2014-01-01
"…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory. It also deserves a place in libraries of all institutions where introductory statistics courses are taught."" -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: New coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical powerSPSS® (Version 21) software and updated screen ca
Categorical and nonparametric data analysis choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
Nonparametric statistical structuring of knowledge systems using binary feature matches
DEFF Research Database (Denmark)
Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue
2014-01-01
statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary......Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have...
Generative Temporal Modelling of Neuroimaging - Decomposition and Nonparametric Testing
DEFF Research Database (Denmark)
Hald, Ditte Høvenhoff
The goal of this thesis is to explore two improvements for functional magnetic resonance imaging (fMRI) analysis; namely our proposed decomposition method and an extension to the non-parametric testing framework. Analysis of fMRI allows researchers to investigate the functional processes...... of the brain, and provides insight into neuronal coupling during mental processes or tasks. The decomposition method is a Gaussian process-based independent components analysis (GPICA), which incorporates a temporal dependency in the sources. A hierarchical model specification is used, featuring both...
Marmarelis, Vasilis Z; Shin, Dae C; Zhang, Yaping; Kautzky-Willer, Alexandra; Pacini, Giovanni; D'Argenio, David Z
2013-07-01
Modeling studies of the insulin-glucose relationship have mainly utilized parametric models, most notably the minimal model (MM) of glucose disappearance. This article presents results from the comparative analysis of the parametric MM and a nonparametric Laguerre based Volterra Model (LVM) applied to the analysis of insulin modified (IM) intravenous glucose tolerance test (IVGTT) data from a clinical study of gestational diabetes mellitus (GDM). An IM IVGTT study was performed 8 to 10 weeks postpartum in 125 women who were diagnosed with GDM during their pregnancy [population at risk of developing diabetes (PRD)] and in 39 control women with normal pregnancies (control subjects). The measured plasma glucose and insulin from the IM IVGTT in each group were analyzed via a population analysis approach to estimate the insulin sensitivity parameter of the parametric MM. In the nonparametric LVM analysis, the glucose and insulin data were used to calculate the first-order kernel, from which a diagnostic scalar index representing the integrated effect of insulin on glucose was derived. Both the parametric MM and nonparametric LVM describe the glucose concentration data in each group with good fidelity, with an improved measured versus predicted r² value for the LVM of 0.99 versus 0.97 for the MM analysis in the PRD. However, application of the respective diagnostic indices of the two methods does result in a different classification of 20% of the individuals in the PRD. It was found that the data based nonparametric LVM revealed additional insights about the manner in which infused insulin affects blood glucose concentration. © 2013 Diabetes Technology Society.
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
The estimation of technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA...... of specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation of semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply...
Using Mathematica to build Non-parametric Statistical Tables
Directory of Open Access Journals (Sweden)
Gloria Perez Sainz de Rozas
2003-01-01
Full Text Available In this paper, I present computational procedures to obtain statistical tables: the tables of the asymptotic distribution and the exact distribution of the Kolmogorov-Smirnov statistic Dn for one population, the table of the distribution of the runs R, the table of the distribution of the Wilcoxon signed-rank statistic W+, and the table of the distribution of the Mann-Whitney statistic Ux, using Mathematica, Version 3.9 under Windows 98. I think that this is an interesting question because many statistical packages give the asymptotic significance level in statistical tests, and with these procedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities of non-parametric distributions. I have used Mathematica to make these calculations because one can use symbolic language to solve recursion relations. It is very easy to generate the format of the tables, and it is possible to obtain any table of the mentioned non-parametric distributions with any precision, not only for the standard parameters most used in statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, Chi-Square, Student's t, Snedecor's F, Geometric, Gamma and Beta.
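The recursion-based approach described in this abstract can be illustrated for the Wilcoxon signed-rank statistic W+. The following is a minimal Python sketch, not the author's Mathematica code: it assumes n untied observations, so that under the null hypothesis each rank 1..n enters W+ independently with probability 1/2. The function names are hypothetical.

```python
from fractions import Fraction


def signed_rank_counts(n):
    """counts[w] = number of subsets of the ranks 1..n whose sum is w.

    Under the null hypothesis every subset is equally likely, so
    P(W+ = w) = counts[w] / 2**n.
    """
    max_w = n * (n + 1) // 2
    counts = [0] * (max_w + 1)
    counts[0] = 1  # the empty subset has sum 0
    for rank in range(1, n + 1):
        # Go downward so that each rank is used at most once per subset.
        for w in range(max_w, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def signed_rank_cdf(n, w):
    """Exact left-tail probability P(W+ <= w) as a Fraction."""
    if w < 0:
        return Fraction(0)
    counts = signed_rank_counts(n)
    return Fraction(sum(counts[: w + 1]), 2 ** n)
```

For example, `signed_rank_cdf(3, 2)` returns `Fraction(3, 8)`, because three of the eight subsets of {1, 2, 3} have a sum of at most 2. Using exact fractions avoids the rounding and transcription problems the abstract mentions.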
1st Conference of the International Society for Nonparametric Statistics
Lahiri, S; Politis, Dimitris
2014-01-01
This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R. Beran, P. Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. Subba Rao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world. The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...
Non-parametric Morphologies of Mergers in the Illustris Simulation
Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G
2016-01-01
We study non-parametric morphologies of merger events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available Illustris simulation idealized mock images at $z=0$. We then measure non-parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* > 10^{10}$ M$_\odot$. We correlate these automatic statistics with the recent merger history of galaxies and with the presence of close companions. Our main contribution is to assess, in a cosmological framework, the empirically derived non-parametric demarcation line and average time-scales used to determine the merger rate observationally. We find that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the time elapsed since the last merger event. We also find that the a...
Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes
Zhou, Wei-Xing; Sornette, Didier
We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.
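The two quoted central values can be checked against each other. The abstract does not state how they are related. The worked equation below assumes the convention commonly used in the log-periodicity literature, in which the log-frequency f and the scaling ratio λ satisfy f = 1/ln λ.

```latex
f = \frac{1}{\ln\lambda}
\quad\Longleftrightarrow\quad
\lambda = e^{1/f},
\qquad
f = 1.02 \;\Rightarrow\; \lambda = e^{1/1.02} \approx e^{0.980} \approx 2.67
```

Under that assumption the reported log-frequency reproduces the reported scaling ratio to the two decimals given.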
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
Lee, Kyungbook; Song, Seok Goo
2016-10-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events (M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb-Douglas a...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...... to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
Computing Economies of Scope Using Robust Partial Frontier Nonparametric Methods
Directory of Open Access Journals (Sweden)
Pedro Carvalho
2016-03-01
Full Text Available This paper proposes a methodology to examine economies of scope using the recent order-α nonparametric method. It allows us to investigate economies of scope by comparing the efficient order-α frontiers of firms that produce two or more goods with the efficient order-α frontiers of firms that produce only one good. To accomplish this, and because the order-α frontiers are irregular, we suggest linearizing them with the DEA estimator. The proposed methodology uses partial frontier nonparametric methods that are more robust than the traditional full frontier methods. Using a sample of 67 Portuguese water utilities for the period 2002–2008 and a simulated sample, we demonstrate the usefulness of the approach adopted and show that using only the full frontier methods would lead to different results. We found evidence of economies of scope in the simultaneous provision of water supply and wastewater services by water utilities in Portugal.
Autistic epileptiform regression.
Canitano, Roberto; Zappella, Michele
2006-01-01
Autistic regression is a well-known condition that occurs in one third of children with pervasive developmental disorders, who, after normal development in the first year of life, undergo a global regression during the second year that encompasses language, social skills and play. In a portion of these subjects, epileptiform abnormalities are present with or without seizures, resembling, in some respects, other epileptiform regressions of language and behaviour such as Landau-Kleffner syndrome. In these cases, for a more accurate definition of the clinical entity, the term autistic epileptiform regression has been suggested. As in other epileptic syndromes with regression, the relationships between EEG abnormalities, language and behaviour in autism are still unclear. We describe two cases of autistic epileptiform regression selected from a larger group of children with autistic spectrum disorders, with the aim of discussing the clinical features of the condition, the therapeutic approach and the outcome.
Scaled Sparse Linear Regression
Sun, Tingni
2011-01-01
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual squares and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs nearly nothing beyond the computation of a path of the sparse regression estimator for penalty levels above a threshold. For the scaled Lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the method yields simultaneously an estimator for the noise level and an estimated coefficient vector in the Lasso path satisfying certain oracle inequalities for the estimation of the noise level, prediction, and the estimation of regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic...
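The alternating scheme described in this abstract (estimate the noise level from the mean squared residual, then rescale the Lasso penalty in proportion to it) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. It assumes the Lasso objective (1/2n)||y - Xb||² + λ||b||₁ solved by coordinate descent. The base penalty `lam0`, the sweep counts and all function names are hypothetical choices made for the example.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t (the Lasso proximal step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, lam, beta=None, sweeps=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta, resid


def scaled_lasso(X, y, lam0, outer=50, tol=1e-8):
    """Alternate between a Lasso fit and a noise-level update."""
    n = len(y)
    sigma = math.sqrt(sum(v * v for v in y) / n)  # noise level at beta = 0
    beta = None
    for _ in range(outer):
        # Penalty is proportional to the current noise estimate.
        beta, resid = lasso_cd(X, y, sigma * lam0, beta)
        new_sigma = math.sqrt(sum(r * r for r in resid) / n)
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return beta, sigma
```

The sketch refits the Lasso from a warm start at each step for clarity. The abstract notes that the actual algorithm costs little beyond computing a single Lasso path, which this simplified version does not exploit.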
DEFF Research Database (Denmark)
Effraimidis, Georgios; Dahl, Christian Møller
In this paper, we develop a fully nonparametric approach for the estimation of the cumulative incidence function with Missing At Random right-censored competing risks data. We obtain results on the pointwise asymptotic normality as well as the uniform convergence rate of the proposed nonparametric...... estimator. A simulation study that serves two purposes is provided. First, it illustrates in detail how to implement our proposed nonparametric estimator. Second, it facilitates a comparison of the nonparametric estimator to a parametric counterpart based on the estimator of Lu and Liang (2008...
Rolling Regressions with Stata
Kit Baum
2004-01-01
This talk will describe some work underway to add a "rolling regression" capability to Stata's suite of time series features. Although commands such as "statsby" permit analysis of non-overlapping subsamples in the time domain, they are not suited to the analysis of overlapping (e.g. "moving window") samples. Both moving-window and widening-window techniques are often used to judge the stability of time series regression relationships. We will present an implementation of a rolling regression...
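The distinction between moving-window and widening-window estimation can be illustrated with a short sketch. The code below is Python and is not the Stata command described in the talk. It assumes a simple regression of y on a single regressor x, and the function names are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rolling_regression(x, y, window):
    """Moving window: refit on every block of `window` consecutive observations."""
    return [ols_slope_intercept(x[s:s + window], y[s:s + window])
            for s in range(len(x) - window + 1)]


def expanding_regression(x, y, min_obs):
    """Widening window: always start at the first observation."""
    return [ols_slope_intercept(x[:e], y[:e])
            for e in range(min_obs, len(x) + 1)]
```

Plotting the sequence of slopes from either function against the window end date is one way to judge the stability of the estimated relationship over time.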
Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A
2015-05-01
Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories.
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Using nonparametrics to specify a model to measure the value of travel time
DEFF Research Database (Denmark)
Fosgerau, Mogens
2007-01-01
Using a range of nonparametric methods, the paper examines the specification of a model to evaluate the willingness-to-pay (WTP) for travel time changes from binomial choice data from a simple time-cost trading experiment. The analysis favours a model with random WTP as the only source...... of randomness over a model with fixed WTP which is linear in time and cost and has an additive random error term. Results further indicate that the distribution of log WTP can be described as a sum of a linear index fixing the location of the log WTP distribution and an independent random variable representing...... unobserved heterogeneity. This formulation is useful for parametric modelling. The index indicates that the WTP varies systematically with income and other individual characteristics. The WTP varies also with the time difference presented in the experiment which is in contradiction of standard utility theory....
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.
Non-parametric PSF estimation from celestial transit solar images using blind deconvolution
González, Adriana; Delouille, Véronique; Jacques, Laurent
2016-01-01
Context: Characterization of instrumental effects in astronomical imaging is important in order to extract accurate physical information from the observations. The measured image in a real optical instrument is usually represented by the convolution of an ideal image with a Point Spread Function (PSF). Additionally, the image acquisition process is also contaminated by other sources of noise (read-out, photon-counting). The problem of estimating both the PSF and a denoised image is called blind deconvolution and is ill-posed. Aims: We propose a blind deconvolution scheme that relies on image regularization. Contrary to most methods presented in the literature, our method does not assume a parametric model of the PSF and can thus be applied to any telescope. Methods: Our scheme uses a wavelet analysis prior model on the image and weak assumptions on the PSF. We use observations from a celestial transit, where the occulting body can be assumed to be a black disk. These constraints allow us to retain meaningful solutions for the filter and the image, eliminating trivial, translated, and interchanged solutions. Under an additive Gaussian noise assumption, they also enforce noise canceling and avoid reconstruction artifacts by promoting the whiteness of the residual between the blurred observations and the cleaned data. Results: Our method is applied to synthetic and experimental data. The PSF is estimated for the SECCHI/EUVI instrument using the 2007 Lunar transit, and for SDO/AIA using the 2012 Venus transit. Results show that the proposed non-parametric blind deconvolution method is able to estimate the core of the PSF with a similar quality to parametric methods proposed in the literature. We also show that, if these parametric estimations are incorporated in the acquisition model, the resulting PSF outperforms both the parametric and non-parametric methods.
Non-parametric PSF estimation from celestial transit solar images using blind deconvolution
Directory of Open Access Journals (Sweden)
González Adriana
2016-01-01
Full Text Available Context: Characterization of instrumental effects in astronomical imaging is important in order to extract accurate physical information from the observations. The measured image in a real optical instrument is usually represented by the convolution of an ideal image with a Point Spread Function (PSF). Additionally, the image acquisition process is also contaminated by other sources of noise (read-out, photon-counting). The problem of estimating both the PSF and a denoised image is called blind deconvolution and is ill-posed. Aims: We propose a blind deconvolution scheme that relies on image regularization. Contrary to most methods presented in the literature, our method does not assume a parametric model of the PSF and can thus be applied to any telescope. Methods: Our scheme uses a wavelet analysis prior model on the image and weak assumptions on the PSF. We use observations from a celestial transit, where the occulting body can be assumed to be a black disk. These constraints allow us to retain meaningful solutions for the filter and the image, eliminating trivial, translated, and interchanged solutions. Under an additive Gaussian noise assumption, they also enforce noise canceling and avoid reconstruction artifacts by promoting the whiteness of the residual between the blurred observations and the cleaned data. Results: Our method is applied to synthetic and experimental data. The PSF is estimated for the SECCHI/EUVI instrument using the 2007 Lunar transit, and for SDO/AIA using the 2012 Venus transit. Results show that the proposed non-parametric blind deconvolution method is able to estimate the core of the PSF with a similar quality to parametric methods proposed in the literature. We also show that, if these parametric estimations are incorporated in the acquisition model, the resulting PSF outperforms both the parametric and non-parametric methods.
Bayesian Nonparametric Mixture Estimation for Time-Indexed Functional Data in R
Directory of Open Access Journals (Sweden)
Terrance D. Savitsky
2016-08-01
Full Text Available We present growfunctions for R, which offers Bayesian nonparametric estimation models for analysis of dependent, noisy time series data indexed by a collection of domains. This data structure arises from combining periodically published government survey statistics, such as are reported in the Current Population Survey (CPS). The CPS publishes monthly, by-state estimates of employment levels, where each state expresses a noisy time series. Published state-level estimates from the CPS are composed from household survey responses in a model-free manner and express high levels of volatility due to insufficient sample sizes. Existing software solutions borrow information over a modeled time-based dependence to extract a de-noised time series for each domain. These solutions, however, ignore the dependence among the domains that may be additionally leveraged to improve estimation efficiency. The growfunctions package offers two fully nonparametric mixture models that simultaneously estimate both a time and domain-indexed dependence structure for a collection of time series: (1) A Gaussian process (GP) construction, which is parameterized through the covariance matrix, estimates a latent function for each domain. The covariance parameters of the latent functions are indexed by domain under a Dirichlet process prior that permits estimation of the dependence among functions across the domains. (2) An intrinsic Gaussian Markov random field prior construction provides an alternative to the GP that expresses different computation and estimation properties. In addition to performing denoised estimation of latent functions from published domain estimates, growfunctions allows estimation of collections of functions for observation units (e.g., households), rather than aggregated domains, by accounting for an informative sampling design under which the probabilities for inclusion of observation units are related to the response variable. growfunctions includes plot
DEFF Research Database (Denmark)
Bordacconi, Mats Joe; Larsen, Martin Vinæs
2014-01-01
Humans are fundamentally primed for making causal attributions based on correlations. This implies that researchers must be careful to present their results in a manner that inhibits unwarranted causal attribution. In this paper, we present the results of an experiment that suggests regression...... models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results...... of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression...
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P
2016-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal s...
A Non-Parametric Spatial Independence Test Using Symbolic Entropy
Directory of Open Access Journals (Sweden)
López Hernández, Fernando
2008-01-01
Full Text Available In the present paper, we construct a new, simple, consistent and powerful test for spatial independence, called the SG test, by using symbolic dynamics and symbolic entropy as a measure of spatial dependence. We also give a standard asymptotic distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial process. The test statistic and its standard limit distribution, with the proposed symbolization, are invariant to any monotonic transformation of the data. The test applies to discrete or continuous distributions. Given that the test is based on entropy measures, it avoids smoothed nonparametric estimation. We include a Monte Carlo study of our test, together with the well-known Moran's I, the SBDS (de Graaff et al., 2001) and (Brett and Pinkse, 1997) non-parametric tests, in order to illustrate our approach.
Analyzing single-molecule time series via nonparametric Bayesian inference.
Hines, Keegan E; Bankston, John R; Aldrich, Richard W
2015-02-03
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Analyzing multiple spike trains with nonparametric Granger causality.
Nedungadi, Aatira G; Rangarajan, Govindan; Jain, Neeraj; Ding, Mingzhou
2009-08-01
Simultaneous recordings of spike trains from multiple single neurons are becoming commonplace. Understanding the interaction patterns among these spike trains remains a key research area. A question of interest is the evaluation of information flow between neurons through the analysis of whether one spike train exerts causal influence on another. For continuous-valued time series data, Granger causality has proven an effective method for this purpose. However, the basis for Granger causality estimation is autoregressive data modeling, which is not directly applicable to spike trains. Various filtering options distort the properties of spike trains as point processes. Here we propose a new nonparametric approach to estimate Granger causality directly from the Fourier transforms of spike train data. We validate the method on synthetic spike trains generated by model networks of neurons with known connectivity patterns and then apply it to neurons simultaneously recorded from the thalamus and the primary somatosensory cortex of a squirrel monkey undergoing tactile stimulation.
Prior processes and their applications: nonparametric Bayesian estimation
Phadia, Eswar G
2016-01-01
This book presents a systematic and comprehensive treatment of various prior processes that have been developed over the past four decades for dealing with the Bayesian approach to solving selected nonparametric inference problems. This revised edition has been substantially expanded to reflect the current interest in this area. After an overview of different prior processes, it examines the now pre-eminent Dirichlet process and its variants including hierarchical processes, then addresses new processes such as dependent Dirichlet, local Dirichlet, time-varying and spatial processes, all of which exploit the countable mixture representation of the Dirichlet process. It subsequently discusses various neutral to right type processes, including gamma and extended gamma, beta and beta-Stacy processes, and then describes the Chinese Restaurant, Indian Buffet and infinite gamma-Poisson processes, which prove to be very useful in areas such as machine learning, information retrieval and featural modeling. Tailfree and P...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the “true” relationship between the inputs and the output. This misspecification might result in biased estimation results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Nonparametric Estimation of Distributions in Random Effects Models
Hart, Jeffrey D.
2011-01-01
We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online. © 2011 American Statistical Association.
Curve registration by nonparametric goodness-of-fit testing
Dalalyan, Arnak
2011-01-01
The problem of curve registration appears in many different areas of applications ranging from neuroscience to road traffic modeling. In the present work, we propose a nonparametric testing framework in which we develop a generalized likelihood ratio test to perform curve registration. We first prove that, under the null hypothesis, the resulting test statistic is asymptotically distributed as a chi-squared random variable. This result, often referred to as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic significance level and a natural measure of lack-of-fit in terms of the p-value of the chi squared test. We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to 1. Some numerical experiments on synthetic datasets are reported as well.
Nonparametric forecasting of low-dimensional dynamical systems.
Berry, Tyrus; Giannakis, Dimitrios; Harlim, John
2015-03-01
This paper presents a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. The key idea is to represent the discrete shift maps on a smooth basis which can be obtained by the diffusion maps algorithm. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution to the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to quantify uncertainties (in fact, evolve the probability distribution) for nontrivial dynamical systems with equation-free modeling. We verify our approach on various examples, ranging from an inhomogeneous anisotropic stochastic differential equation on a torus, the chaotic Lorenz three-dimensional model, and the Niño-3.4 data set which is used as a proxy of the El Niño Southern Oscillation.
Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.
Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z
2017-03-01
A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes
García, Constantino A.; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G.
2017-08-01
The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that the use of Gaussian processes entails, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economics and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
Revealing components of the galaxy population through nonparametric techniques
Bamford, Steven P; Nichol, Robert C; Miller, Christopher J; Wasserman, Larry; Genovese, Christopher R; Freeman, Peter E
2008-01-01
The distributions of galaxy properties vary with environment, and are often multimodal, suggesting that the galaxy population may be a combination of multiple components. The behaviour of these components versus environment holds details about the processes of galaxy development. To release this information we apply a novel, nonparametric statistical technique, identifying four components present in the distribution of galaxy H$\alpha$ emission-line equivalent-widths. We interpret these components as passive, star-forming, and two varieties of active galactic nuclei. Independent of this interpretation, the properties of each component are remarkably constant as a function of environment. Only their relative proportions display substantial variation. The galaxy population thus appears to comprise distinct components which are individually independent of environment, with galaxies rapidly transitioning between components as they move into denser environments.
Multi-Directional Non-Parametric Analysis of Agricultural Efficiency
DEFF Research Database (Denmark)
Balezentis, Tomas
This thesis seeks to develop methodologies for the assessment of agricultural efficiency and apply them to Lithuanian family farms. In particular, we focus on three objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...... of stochasticity associated with Lithuanian family farm performance. The former technique showed that the farms differed in terms of the mean values and variance of the efficiency scores over time with some clear patterns prevailing throughout the whole research period. The fuzzy Free Disposal Hull showed...
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.
Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
Nonparametric reconstruction of the Om diagnostic to test LCDM
Escamilla-Rivera, Celia
2015-01-01
Cosmic acceleration is usually related to the unknown dark energy, whose equation of state, w(z), is constrained and numerically confronted with independent astrophysical data. To diagnose w(z), a null test of dark energy can be introduced using a diagnostic function of redshift, Om. In this work we present a nonparametric reconstruction of this diagnostic using the so-called Loess-Simex factory to test the concordance model, with the advantage that this approach offers an alternative way to relax the use of priors and find a possible 'w' that reliably describes the data with no previous knowledge of a cosmological model. Our results demonstrate that the method applied to the dynamical Om diagnostic finds a preference for a dark energy model with equation of state w = -2/3, which corresponds to a static domain wall network.
Evaluation of Nonparametric Probabilistic Forecasts of Wind Power
DEFF Research Database (Denmark)
Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg
Predictions of wind power production for horizons up to 48-72 hours ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind...
Applied Logistic Regression
Hosmer, David W; Sturdivant, Rodney X
2013-01-01
A new edition of the definitive guide to logistic regression modeling for health science and other applications. This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
Applied Linear Regression
Weisberg, Sanford
2013-01-01
Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute). The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Equity and efficiency in private and public education: a nonparametric comparison
L. Cherchye; K. de Witte; E. Ooghe; I. Nicaise
2007-01-01
We present a nonparametric approach for the equity and efficiency evaluation of (private and public) primary schools in Flanders. First, we use a nonparametric (Data Envelopment Analysis) model that is specially tailored to assess educational efficiency at the pupil level. The model accounts for the
Non-parametric tests of productive efficiency with errors-in-variables
Kuosmanen, T.K.; Post, T.; Scholtes, S.
2007-01-01
We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans
Transductive Ordinal Regression
Seah, Chun-Wei; Ong, Yew-Soon
2011-01-01
Ordinal regression is commonly formulated as a multi-class problem with ordinal constraints. The challenge of designing accurate classifiers for ordinal regression generally increases with the number of classes involved, due to the large number of labeled patterns that are needed. Ordinal class labels, however, are often costly to calibrate or difficult to obtain. Unlabeled patterns, on the other hand, often exist in much greater abundance and are freely available. To benefit from the abundance of unlabeled patterns, we present a novel transductive learning paradigm for ordinal regression in this paper, namely Transductive Ordinal Regression (TOR). The key challenge of the present study lies in the precise estimation of both the ordinal class label of the unlabeled data and the decision functions of the ordinal classes, simultaneously. The core elements of the proposed TOR include an objective function that caters to several commonly used loss functions cast in transductive setting...
A Nonparametric Bayesian Approach to Seismic Hazard Modeling Using the ETAS Framework
Ross, G.
2015-12-01
The epidemic-type aftershock sequence (ETAS) model is one of the most popular tools for modeling seismicity and quantifying risk in earthquake-prone regions. Under the ETAS model, the occurrence times of earthquakes are treated as a self-exciting Poisson process where each earthquake briefly increases the probability of subsequent earthquakes occurring soon afterwards, which captures the fact that large mainshocks tend to produce long sequences of aftershocks. A triggering kernel controls the amount by which the probability increases based on the magnitude of each earthquake, and the rate at which it then decays over time. This triggering kernel is usually chosen heuristically, to match the parametric form of the modified Omori law for aftershock decay. However, recent work has questioned whether this is an appropriate choice. Since the choice of kernel has a large impact on the predictions made by the ETAS model, avoiding misspecification is crucially important. We present a novel nonparametric version of ETAS which avoids making parametric assumptions, and instead learns the correct specification from the data itself. Our approach is based on the Dirichlet process, a modern class of Bayesian prior distributions that allows for efficient inference over an infinite dimensional space of functions. We show how our nonparametric ETAS model can be fit to data, and present results demonstrating that the fit is greatly improved compared to the standard parametric specification. Additionally, we explain how our model can be used to perform probabilistic declustering of earthquake catalogs, to classify earthquakes as being either aftershocks or mainshocks, and to learn the causal relations between pairs of earthquakes.
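As a point of reference for the abstract above, the standard parametric ETAS conditional intensity that the nonparametric model relaxes can be sketched as follows. This is a minimal illustration, not the authors' method: the exponential productivity law, the normalized modified Omori kernel, every parameter value (`mu`, `K`, `alpha`, `m0`, `c`, `p`), and the two-event catalog are assumptions chosen for illustration.

```python
import math


def omori_kernel(dt, c=0.01, p=1.2):
    """Normalized modified Omori density for an elapsed time dt > 0 (assumed form)."""
    return (p - 1) * c ** (p - 1) / (dt + c) ** p


def productivity(mag, K=0.1, alpha=1.0, m0=3.0):
    """Expected number of direct aftershocks of a magnitude-mag event (assumed form)."""
    return K * math.exp(alpha * (mag - m0))


def etas_intensity(t, events, mu=0.2):
    """Background rate mu plus the triggered contribution of every earlier event."""
    triggered = sum(
        productivity(mag) * omori_kernel(t - t_i)
        for t_i, mag in events
        if t_i < t
    )
    return mu + triggered


# Hypothetical catalog of (time in days, magnitude) pairs.
catalog = [(0.0, 6.0), (0.5, 4.0)]
rate_soon = etas_intensity(0.6, catalog)   # shortly after both events
rate_late = etas_intensity(30.0, catalog)  # a month later
```

With these illustrative values the intensity is elevated just after the two events and decays back toward the background rate `mu`, which is the self-exciting behaviour the abstract describes.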
Forecasting turbulent modes with nonparametric diffusion models: Learning from noisy data
Berry, Tyrus; Harlim, John
2016-04-01
In this paper, we apply a recently developed nonparametric modeling approach, the "diffusion forecast", to predict the time-evolution of Fourier modes of turbulent dynamical systems. While the diffusion forecasting method assumes the availability of a noise-free training data set observing the full state space of the dynamics, in real applications we often have only partial observations which are corrupted by noise. To alleviate these practical issues, following the theory of embedology, the diffusion model is built using the delay-embedding coordinates of the data. We show that this delay embedding biases the geometry of the data in a way which extracts the most stable component of the dynamics and reduces the influence of independent additive observation noise. The resulting diffusion forecast model approximates the semigroup solutions of the generator of the underlying dynamics in the limit of large data and when the observation noise vanishes. As in any standard forecasting problem, the forecasting skill depends crucially on the accuracy of the initial conditions. We introduce a novel Bayesian method for filtering the discrete-time noisy observations which works with the diffusion forecast to determine the forecast initial densities. Numerically, we compare this nonparametric approach with standard stochastic parametric models on a wide range of well-studied turbulent modes, including the Lorenz-96 model in weakly chaotic to fully turbulent regimes and the barotropic modes of a quasi-geostrophic model with baroclinic instabilities. We show that when the only available data is the low-dimensional set of noisy modes that are being modeled, the diffusion forecast is indeed competitive with the perfect model.
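The delay-embedding step mentioned above can be illustrated with a minimal sketch. The function name `delay_embed`, the number of lags, and the sample observations are hypothetical; the diffusion forecast itself is not implemented here.

```python
def delay_embed(series, lags):
    """Return delay-coordinate vectors (x[t], x[t-1], ..., x[t-lags])."""
    if lags < 0:
        raise ValueError("lags must be non-negative")
    return [
        tuple(series[t - k] for k in range(lags + 1))
        for t in range(lags, len(series))
    ]


# Hypothetical scalar observations of a single mode.
observations = [0.1, 0.4, 0.9, 0.3, 0.7]
embedded = delay_embed(observations, lags=2)
```

Each embedded point stacks the current observation with its two predecessors, so a partially observed scalar series becomes a sequence of vectors on which a model can be built.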
Trend Analysis of Golestan's Rivers Discharges Using Parametric and Non-parametric Methods
Mosaedi, Abolfazl; Kouhestani, Nasrin
2010-05-01
Climate change and its consequences are among the major problems facing humanity, and climate change alters river discharges. The aim of this research is to analyze trends in the seasonal and annual river discharges of Golestan province (Iran). Four trend analysis methods (conjunction point, linear regression, Wald-Wolfowitz and Mann-Kendall) were applied to seasonal and annual river discharges at significance levels of 95% and 99%. First, daily discharge data from 12 hydrometric stations covering 42 years (1965-2007) were selected; after some common statistical tests, such as homogeneity tests (the G-B and M-W tests), the four trend analysis tests were applied. Results show that at all stations the summer time series have decreasing trends at the 99% significance level according to the Mann-Kendall (M-K) test. For the autumn time series, all four methods give similar results. For other periods, the results of the four tests were broadly similar, although for some stations they differed. Keywords: Trend Analysis, Discharge, Non-parametric methods, Wald-Wolfowitz, The Mann-Kendall test, Golestan Province.
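For reference, the Mann-Kendall test named above can be sketched in a few lines. This is a minimal version that assumes no tied values (the variance formula omits the tie correction) and uses a normal approximation for the two-sided p-value; the discharge values are made up and are not the Golestan data.

```python
import math


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test, ignoring ties."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value


# Hypothetical summer mean discharges (m^3/s) for consecutive years.
discharges = [12.1, 11.8, 11.9, 10.7, 10.2, 9.8, 9.9, 9.1]
s_stat, z_stat, p = mann_kendall(discharges)
```

A negative `S` with a small p-value indicates a decreasing trend; comparing `p` with 0.05 or 0.01 corresponds to the 95% and 99% levels used in the study.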
A Non-parametric Approach to the Overall Estimate of Cognitive Load Using NIRS Time Series.
Keshmiri, Soheil; Sumioka, Hidenobu; Yamazaki, Ryuji; Ishiguro, Hiroshi
2017-01-01
We present a non-parametric approach to prediction of the n-back task (n ∈ {1, 2}) as a proxy measure of mental workload using Near Infrared Spectroscopy (NIRS) data. In particular, we focus on measuring the mental workload through hemodynamic responses in the brain induced by these tasks, thereby realizing the potential that they can offer for their detection in real world scenarios (e.g., difficulty of a conversation). Our approach takes advantage of intrinsic linearity that is inherent in the components of the NIRS time series to adopt a one-step regression strategy. We demonstrate the correctness of our approach through its mathematical analysis. Furthermore, we study the performance of our model in an inter-subject setting in contrast with state-of-the-art techniques in the literature to show a significant improvement on prediction of these tasks (82.50 and 86.40% for female and male participants, respectively). Moreover, our empirical analysis suggests a gender difference effect on the performance of the classifiers (with male data exhibiting a higher non-linearity) along with the left-lateralized activation in both genders with higher specificity in females.
Directory of Open Access Journals (Sweden)
Weiß, Verena
2015-10-01
Introduction: For survival data the coefficient of determination cannot be used to describe how well a model fits the data. Therefore, several measures of explained variation for survival data have been proposed in recent years. Methods: We analyse an existing measure of explained variation with regard to minimisation aspects and demonstrate that these are not fulfilled for the measure. Results: In analogy to the least squares method from linear regression analysis we develop a novel measure for categorical covariates which is based only on the Kaplan-Meier estimator. Hence, the novel measure is a completely nonparametric measure with an easy graphical interpretation. For the novel measure different weighting possibilities are available and a statistical test of significance can be performed. Eventually, we apply the novel measure and further measures of explained variation to a dataset comprising persons with a histopathological papillary thyroid carcinoma. Conclusion: We propose a novel measure of explained variation with a comprehensible derivation as well as a graphical interpretation, which may be used in further analyses with survival data.
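The Kaplan-Meier estimator on which the measure is built can be sketched as below. This shows only the product-limit survival estimate, not the proposed measure of explained variation; the function name and the sample follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is True for an observed event and False for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (years) with event indicators.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

The estimate only drops at observed event times; censored observations leave the curve unchanged but shrink the risk set for later times.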
Institute of Scientific and Technical Information of China (English)
2009-01-01
In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of the marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give an explicit expression for the asymptotic bias of the regression spline estimator for the nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.
Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:
Stochastic search, optimization and regression with energy applications
Hannah, Lauren A.
Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values, which depend on the selected portfolio, to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression
DEFF Research Database (Denmark)
Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf
We consider the semiparametric generalised linear regression model which has mainstream empirical models such as the (partially) linear mean regression, logistic and multinomial regression as special cases. As an extension to related literature we allow a misclassified covariate to be interacted with a nonparametric function of a continuous covariate. This model is tailor-made to address known data quality issues of administrative labour market data. Using a sample of 20 million observations from Germany we estimate the determinants of labour market transitions and illustrate the role of considerable...
[Understanding logistic regression].
El Sanharawi, M; Naudet, F
2013-10-01
Logistic regression is one of the most common multivariate analysis models used in epidemiology. It allows the measurement of the association between the occurrence of an event (qualitative dependent variable) and factors likely to influence it (explanatory variables). The choice of explanatory variables that should be included in the logistic regression model is based on prior knowledge of the disease physiopathology and the statistical association between the variable and the event, as measured by the odds ratio. The main steps of the procedure, the conditions of application, and the essential tools for its interpretation are discussed concisely. We also discuss the importance of the choice of variables that must be included and retained in the regression model in order to avoid the omission of important confounding factors. Finally, by way of illustration, we provide an example from the literature, which should help the reader test his or her knowledge.
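The odds ratio mentioned above can be computed directly from a 2x2 table; for a logistic regression with a single binary explanatory variable, the fitted coefficient equals the log of this odds ratio. The sketch below assumes hypothetical counts and uses the usual Wald (Woolf) 95% interval on the log scale; it is an illustration, not the example from the article.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Return (odds ratio, lower, upper) with a 95% Wald interval on the log scale."""
    ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
    std_err = math.sqrt(
        1 / exposed_cases
        + 1 / exposed_noncases
        + 1 / unexposed_cases
        + 1 / unexposed_noncases
    )
    log_ratio = math.log(ratio)
    lower = math.exp(log_ratio - 1.96 * std_err)
    upper = math.exp(log_ratio + 1.96 * std_err)
    return ratio, lower, upper


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed subjects had the event.
ratio, lower, upper = odds_ratio(20, 80, 10, 90)
```

An interval that contains 1 would indicate that the association is not statistically significant at the 5% level for these counts.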
Constrained Sparse Galerkin Regression
Loiseau, Jean-Christophe
2016-01-01
In this work, we demonstrate the use of sparse regression techniques from machine learning to identify nonlinear low-order models of a fluid system purely from measurement data. In particular, we extend the sparse identification of nonlinear dynamics (SINDy) algorithm to enforce physical constraints in the regression, leading to energy conservation. The resulting models are closely related to Galerkin projection models, but the present method does not require the use of a full-order or high-fidelity Navier-Stokes solver to project onto basis modes. Instead, the most parsimonious nonlinear model is determined that is consistent with observed measurement data and satisfies necessary constraints. The constrained Galerkin regression algorithm is implemented on the fluid flow past a circular cylinder, demonstrating the ability to accurately construct models from data.
Hussey, Michael A; Koch, Gary G; Preisser, John S; Saville, Benjamin R
2016-01-01
Time-to-event or dichotomous outcomes in randomized clinical trials often have analyses using the Cox proportional hazards model or conditional logistic regression, respectively, to obtain covariate-adjusted log hazard (or odds) ratios. Nonparametric Randomization-Based Analysis of Covariance (NPANCOVA) can be applied to unadjusted log hazard (or odds) ratios estimated from a model containing treatment as the only explanatory variable. These adjusted estimates are stratified population-averaged treatment effects and only require a valid randomization to the two treatment groups and avoid key modeling assumptions (e.g., proportional hazards in the case of a Cox model) for the adjustment variables. The methodology has application in the regulatory environment where such assumptions cannot be verified a priori. Application of the methodology is illustrated through three examples on real data from two randomized trials.
Practical Session: Logistic Regression
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. One investigates the different risk factors in the occurrence of coronary heart disease. It was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
DEFF Research Database (Denmark)
Bache, Stefan Holst
A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is, however, a different and therefore new estimator. It allows for both linear and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....
Verification of helical tomotherapy delivery using autoassociative kernel regression.
Seibert, Rebecca M; Ramsey, Chester R; Garvey, Dustin R; Hines, J Wesley; Robison, Ben H; Outten, Samuel S
2007-08-01
Quality assurance (QA) is a topic of major concern in the field of intensity modulated radiation therapy (IMRT). The standard of practice for IMRT is to perform QA testing for individual patients to verify that the dose distribution will be delivered to the patient. The purpose of this study was to develop a new technique that could eventually be used to automatically evaluate helical tomotherapy treatments during delivery using exit detector data. This technique uses an autoassociative kernel regression (AAKR) model to detect errors in tomotherapy delivery. AAKR is a novel nonparametric model that is known to predict a group of correct sensor values when supplied a group of sensor values that is usually corrupted or contains faults such as machine failure. This modeling scheme is especially suited for the problem of monitoring the fluence values found in the exit detector data because it is able to learn the complex detector data relationships. This scheme still applies when detector data are summed over many frames with a low temporal resolution and a variable beam attenuation resulting from patient movement. Delivery sequences from three archived patients (prostate, lung, and head and neck) were used in this study. Each delivery sequence was modified by reducing the opening time for random individual multileaf collimator (MLC) leaves by random amounts. The error and error-free treatments were delivered with different phantoms in the path of the beam. Multiple AAKR models were developed and tested by the investigators using combinations of the stored exit detector data sets from each delivery. The models proved robust and were able to predict the correct or error-free values for a projection, which had a single MLC leaf decrease its opening time by less than 10 msec. The model also was able to determine machine output errors. The average uncertainty value for the unfaulted projections ranged from 0.4% to 1.8% of the detector
Local Linear Regression on Manifolds and its Geometric Interpretation
Cheng, Ming-Yen
2012-01-01
We study nonparametric regression with high-dimensional data, when the predictors lie on an unknown, lower-dimensional manifold. In this context, recently \cite{aswani_bickel:2011} suggested performing the conventional local linear regression (LLR) in the ambient space and regularizing the estimation problem using information obtained from learning the manifold locally. By contrast, our approach is to reduce the dimensionality first and then construct the LLR directly on a tangent plane approximation to the manifold. Under mild conditions, asymptotic expressions for the conditional mean squared error of the proposed estimator are derived for both the interior and the boundary cases. One implication of these results is that the optimal convergence rate depends only on the intrinsic dimension $d$ of the manifold, but not on the ambient space dimension $p$. Another implication is that the estimator is design adaptive and automatically adapts to the boundary of the unknown manifold. The bias and variance expressi...
Inverse probability weighted Cox regression for doubly truncated data.
Mandel, Micha; de Uña-Álvarez, Jacobo; Simon, David K; Betensky, Rebecca A
2017-09-08
Doubly truncated data arise when event times are observed only if they fall within subject-specific, possibly random, intervals. While non-parametric methods for survivor function estimation using doubly truncated data have been intensively studied, only a few methods for fitting regression models have been suggested, and only for a limited number of covariates. In this article, we present a method to fit the Cox regression model to doubly truncated data with multiple discrete and continuous covariates, and describe how to implement it using existing software. The approach is used to study the association between candidate single nucleotide polymorphisms and age of onset of Parkinson's disease. © 2017, The International Biometric Society.
Ritz, Christian; Parmigiani, Giovanni
2009-01-01
R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Adaptive metric kernel regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
2000-01-01
regression by minimising a cross-validation estimate of the generalisation error. This allows the importance of different dimensions to be adjusted automatically. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
Software Regression Verification
2013-12-11
of recursive procedures. Acta Informatica, 45(6):403–439, 2008. [GS11] Benny Godlin and Ofer Strichman. Regression verification. Technical Report...functions. Therefore, we need to redefine m-term. – Mutual termination. If either function f or function f′ (or both) is non-deterministic, then their
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.
Radial basis function regression methods for predicting quantitative traits using SNP markers.
Long, Nanye; Gianola, Daniel; Rosa, Guilherme J M; Weigel, Kent A; Kranis, Andreas; González-Recio, Oscar
2010-06-01
A challenge when predicting total genetic values for complex quantitative traits is that an unknown number of quantitative trait loci may affect phenotypes via cryptic interactions. If markers are available, assuming that their effects on phenotypes are additive may lead to poor predictive ability. Non-parametric radial basis function (RBF) regression, which does not assume a particular form of the genotype-phenotype relationship, was investigated here by simulation and analysis of body weight and food conversion rate data in broilers. The simulation included a toy example in which an arbitrary non-linear genotype-phenotype relationship was assumed, and five different scenarios representing different broad sense heritability levels (0.1, 0.25, 0.5, 0.75 and 0.9) were created. In addition, a whole genome simulation was carried out, in which three different gene action modes (pure additive, additive+dominance and pure epistasis) were considered. In all analyses, a training set was used to fit the model and a testing set was used to evaluate predictive performance. The latter was measured by correlation and predictive mean-squared error (PMSE) on the testing data. For comparison, a linear additive model known as Bayes A was used as benchmark. Two RBF models with single nucleotide polymorphism (SNP)-specific (RBF I) and common (RBF II) weights were examined. Results indicated that, in the presence of complex genotype-phenotype relationships (i.e. non-linearity and non-additivity), RBF outperformed Bayes A in predicting total genetic values using SNP markers. Extension of Bayes A to include all additive, dominance and epistatic effects could improve its prediction accuracy. RBF I was generally better than RBF II, and was able to identify relevant SNPs in the toy example.
Nonparametric predictive inference for combining diagnostic tests with parametric copula
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results while accounting for their dependence structure using a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. A copula, in turn, is a well-known statistical concept for modelling dependence of random variables: it is a joint distribution function whose marginals are all uniformly distributed, and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely maximum likelihood estimation (MLE). We investigate the performance of the proposed method using data sets from the literature and discuss the results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
The Utility of Nonparametric Transformations for Imputation of Survey Data
Directory of Open Access Journals (Sweden)
Robbins Michael W.
2014-12-01
Missing values present a prevalent problem in the analysis of establishment survey data. Multivariate imputation algorithms (which are used to fill in missing observations) tend to have the common limitation that imputations for continuous variables are sampled from Gaussian distributions. This limitation is addressed here through the use of robust marginal transformations. Specifically, kernel-density and empirical distribution-type transformations are discussed and are shown to have favorable properties when used for imputation of complex survey data. Although such techniques have wide applicability (i.e., they may be easily applied in conjunction with a wide array of imputation techniques), the proposed methodology is applied here with an algorithm for imputation in the USDA’s Agricultural Resource Management Survey. Data analysis and simulation results are used to illustrate the specific advantages of the robust methods when compared to the fully parametric techniques and to other relevant techniques such as predictive mean matching. To summarize, transformations based upon parametric densities are shown to distort several data characteristics in circumstances where the parametric model is ill fit; however, no circumstances are found in which the transformations based upon parametric models outperform the nonparametric transformations. As a result, the transformation based upon the empirical distribution (which is the most computationally efficient) is recommended over the other transformation procedures in practice.
Nonparametric identification of structural modifications in Laplace domain
Suwała, G.; Jankowski, Ł.
2017-02-01
This paper proposes and experimentally verifies a Laplace-domain method for identification of structural modifications, which (1) unlike time-domain formulations, allows the identification to be focused on these parts of the frequency spectrum that have a high signal-to-noise ratio, and (2) unlike frequency-domain formulations, decreases the influence of numerical artifacts related to the particular choice of the FFT exponential window decay. In comparison to the time-domain approach proposed earlier, advantages of the proposed method are smaller computational cost and higher accuracy, which leads to reliable performance in more difficult identification cases. Analytical formulas for the first- and second-order sensitivity analysis are derived. The approach is based on a reduced nonparametric model, which has the form of a set of selected structural impulse responses. Such a model can be collected purely experimentally, which obviates the need for design and laborious updating of a parametric model, such as a finite element model. The approach is verified experimentally using a 26-node lab 3D truss structure and 30 identification cases of a single mass modification or two concurrent mass modifications.
A New Non-Parametric Approach to Galaxy Morphological Classification
Lotz, Jennifer M.; Primack, Joel; Madau, Piero
2003-01-01
We present two new non-parametric methods for quantifying galaxy morphology: the relative distribution of the galaxy pixel flux values (the Gini coefficient or G) and the second-order moment of the brightest 20% of the galaxy's flux (M20). We test the robustness of G and M20 to decreasing signal-to-noise and spatial resolution, and find that both measures are reliable to within 10% at average signal-to-noise per pixel greater than 3 and resolutions better than 1000 pc and 500 pc, respectively. We have measured G and M20, as well as concentration (C), asymmetry (A), and clumpiness (S) in the rest-frame near-ultraviolet/optical wavelengths for 150 bright local "normal" Hubble type (E-Sd) galaxies and 104 0.05 < z < 0.25 ultra-luminous infrared galaxies (ULIRGs). We find that most local galaxies follow a tight sequence in G-M20-C, where early-types have high G and C and low M20 and late-type spirals have lower G and C and higher M20. The majority of ULIRGs lie above the normal galaxy G-M20 sequence...
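To make the Gini statistic in the abstract above concrete, here is a small sketch that computes a Gini coefficient from an array of pixel fluxes. The abstract does not give the formula, so the sorted-value estimator with n(n-1) normalization used here is an assumption, as are the function name and the toy images; segmentation of galaxy pixels from the background is not handled.

```python
import numpy as np

def gini(flux):
    """Gini coefficient of pixel fluxes (assumed estimator):
    G = sum_i (2i - n - 1) |x_i| / (mean(|x|) * n * (n - 1)), with x sorted ascending.
    Requires at least two pixels and a nonzero mean flux."""
    x = np.sort(np.abs(np.asarray(flux, dtype=float).ravel()))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (x.mean() * n * (n - 1))

# Toy images: perfectly uniform flux vs. all flux in a single pixel.
uniform = np.ones((10, 10))
concentrated = np.zeros((10, 10))
concentrated[4, 4] = 5.0
g_uniform = gini(uniform)
g_concentrated = gini(concentrated)
print(g_uniform, g_concentrated)
```

With this normalization, G is 0 when every pixel carries the same flux and 1 when a single pixel carries all of it, which matches the reading of G as a measure of how unevenly the flux is distributed.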
Adaptive Neural Network Nonparametric Identifier With Normalized Learning Laws.
Chairez, Isaac
2016-04-05
This paper addresses the design of a normalized convergent learning law for neural networks (NNs) with continuous dynamics. The NN is used here to obtain a nonparametric model for uncertain systems described by a set of ordinary differential equations. The source of uncertainties is the presence of some external perturbations and poor knowledge of the nonlinear function describing the system dynamics. A new adaptive algorithm based on normalized algorithms was used to adjust the weights of the NN. The adaptive algorithm was derived by means of a nonstandard logarithmic Lyapunov function (LLF). Two identifiers were designed using two variations of LLFs, leading to a normalized learning law for the first identifier and a variable gain normalized learning law for the second identifier. The inclusion of normalized learning laws reduces the size of the convergence region obtained as the solution of the practical stability analysis. On the other hand, the velocity of convergence for the learning laws depends inversely on the norm of the errors. This fact avoids the peaking transient behavior in the time evolution of weights that accelerates the convergence of identification error. A numerical example demonstrates the improvements achieved by the algorithm introduced in this paper compared with classical schemes with non-normalized continuous learning methods. A comparison of the identification performance achieved by the non-normalized identifier and the ones developed in this paper shows the benefits of the proposed learning law.
Nonparametric estimation of quantum states, processes and measurements
Lougovski, Pavel; Bennink, Ryan
Quantum state, process, and measurement estimation methods traditionally use parametric models, in which the number and role of relevant parameters is assumed to be known. When such an assumption cannot be justified, a common approach in many disciplines is to fit the experimental data to multiple models with different sets of parameters and utilize an information criterion to select the best fitting model. However, it is not always possible to assume a model with a finite (countable) number of parameters. This typically happens when there are unobserved variables that stem from hidden correlations that can only be unveiled after collecting experimental data. How does one perform quantum characterization in this situation? We present a novel nonparametric method of experimental quantum system characterization based on the Dirichlet Process (DP) that addresses this problem. Using DP as a prior in conjunction with Bayesian estimation methods allows us to increase model complexity (number of parameters) adaptively as the number of experimental observations grows. We illustrate our approach for the one-qubit case and show how a probability density function for an unknown quantum process can be estimated.
Bayesian nonparametric meta-analysis using Polya tree mixture models.
Branscum, Adam J; Hanson, Timothy E
2008-09-01
Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
Non-parametric and least squares Langley plot methods
Directory of Open Access Journals (Sweden)
P. W. Kiedron
2015-04-01
Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs) incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law V = V0·e^(−τ·m), where a plot of ln(V) (voltage) vs. m (air mass) yields a straight line with intercept ln(V0). This ln(V0) subsequently can be used to solve for τ for any measurement of V and calculation of m. This calibration works well on some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of ln(V0)'s are smoothed and interpolated with median and mean moving window filters.
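The least-squares variant of the Langley calibration described above can be sketched as a straight-line fit of ln(V) against air mass m, since taking logarithms of the Bouguer–Lambert–Beer law gives ln(V) = ln(V0) − τ·m. The function name and the synthetic, noise-free data below are illustrative assumptions; the paper's specific non-parametric techniques and filtering steps are not reproduced here.

```python
import numpy as np

def langley_fit(air_mass, voltage):
    """Ordinary least-squares Langley fit of ln(V) = ln(V0) - tau * m.
    Returns (ln_v0, tau): the intercept and the negated slope."""
    m = np.asarray(air_mass, dtype=float)
    ln_v = np.log(np.asarray(voltage, dtype=float))
    slope, intercept = np.polyfit(m, ln_v, 1)  # degree-1 polynomial: slope first
    return intercept, -slope

# Synthetic, noise-free observations with assumed V0 = 1.5 and tau = 0.12.
m_obs = np.linspace(2.0, 6.0, 9)
v_obs = 1.5 * np.exp(-0.12 * m_obs)
ln_v0, tau = langley_fit(m_obs, v_obs)
print(np.exp(ln_v0), tau)
```

Once ln(V0) is known, τ for any later measurement follows as (ln(V0) − ln(V)) / m. On real data the fit is sensitive to cloud-contaminated points, which is presumably where the robust alternatives compared in the paper come in.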
Logistic regression: a brief primer.
Stoltzfus, Jill C
2011-10-01
Regression techniques are versatile in their application to medical research because they can measure associations, predict outcomes, and control for confounding variable effects. As one such technique, logistic regression is an efficient and powerful way to analyze the effect of a group of independent variables on a binary outcome by quantifying each independent variable's unique contribution. Using components of linear regression reflected in the logit scale, logistic regression iteratively identifies the strongest linear combination of variables with the greatest probability of detecting the observed outcome. Important considerations when conducting logistic regression include selecting independent variables, ensuring that relevant assumptions are met, and choosing an appropriate model building strategy. For independent variable selection, one should be guided by such factors as accepted theory, previous empirical investigations, clinical considerations, and univariate statistical analyses, with acknowledgement of potential confounding variables that should be accounted for. Basic assumptions that must be met for logistic regression include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable to avoid an overfit model, with commonly recommended minimum "rules of thumb" ranging from 10 to 20 events per covariate. Regarding model building strategies, the three general types are direct/standard, sequential/hierarchical, and stepwise/statistical, with each having a different emphasis and purpose. Before reaching definitive conclusions from the results of any of these methods, one should formally quantify the model's internal validity (i.e., replicability within the same data set) and external validity (i.e., generalizability beyond the current sample). The resulting logistic regression model
Monotone Regression and Correction for Order Relation Deviations in Indicator Kriging
Institute of Scientific and Technical Information of China (English)
Han Yan; Yang Yiheng
2008-01-01
Indicator kriging (IK) is one of the most efficient nonparametric methods in geostatistics. The order relation problem in the conditional cumulative distribution values obtained by IK is its most severe drawback. The correction of order relation deviations is an essential part of the IK approach. A monotone regression is proposed as a new correction method that minimizes the deviation from the original quantile values while ensuring all order relations.
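One common way to realize a monotone regression of this kind is the pool-adjacent-violators algorithm (PAVA), which finds the non-decreasing sequence closest in least squares to the input. The sketch below is an assumption about how such a correction could look, not the authors' exact procedure; the function names, the clipping to [0, 1] and the example values are hypothetical.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block holds [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w_sum = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w_sum, w_sum, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def correct_order_relations(cdf_values):
    """Clip estimated CDF values to [0, 1], then enforce a non-decreasing order."""
    clipped = [min(1.0, max(0.0, p)) for p in cdf_values]
    return pava(clipped)

# Hypothetical IK estimates at increasing thresholds, with two order violations.
raw = [0.10, 0.35, 0.30, 0.62, 0.58, 0.90]
corrected = correct_order_relations(raw)
print(corrected)
```

Each violating pair is replaced by its average, so values that already respect the order relations are left unchanged.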
Regression in children with autism spectrum disorders.
Malhi, Prahbhjot; Singhi, Pratibha
2012-10-01
To understand the characteristics of autistic regression and to compare the clinical and developmental profile of children with autism spectrum disorders (ASD) in whom parents report developmental regression with age matched ASD children in whom no regression is reported. Participants were 35 (Mean age = 3.57 y, SD = 1.09) children with ASD in whom parents reported developmental regression before age 3 y and a group of age and IQ matched 35 ASD children in whom parents did not report regression. All children were recruited from the outpatient Child Psychology Clinic of the Department of Pediatrics of a tertiary care teaching hospital in North India. Multi-disciplinary evaluations including neurological, diagnostic, cognitive, and behavioral assessments were done. Parents were asked in detail about the age at onset of regression, type of regression, milestones lost, and event, if any, related to the regression. In addition, the Childhood Autism Rating Scale (CARS) was administered to assess symptom severity. The mean age at regression was 22.43 mo (SD = 6.57) and a large majority (66.7%) of the parents reported regression between 12 and 24 mo. Most (75%) of the parents of the regression-autistic group reported regression in the language domain, particularly in the expressive language sector, usually between 18 and 24 mo of age. Regression of language was not an isolated phenomenon and regression in other domains was also reported, including social skills (75%) and cognition (31.25%). In the majority of the cases (75%) the regression reported was slow and subtle. There were no significant differences in the motor, social, self help, and communication functioning between the two groups as measured by the DP II. There were also no significant differences between the two groups on the total CARS score and total number of DSM IV symptoms endorsed. However, the regressed children had significantly (t = 2.36, P = .021) more social deficits as per the DSM IV as
Directory of Open Access Journals (Sweden)
Ismet DOGAN
2015-10-01
Objective: Choosing the most efficient statistical test is one of the essential problems of statistics. Asymptotic relative efficiency is a notion that enables the quantitative comparison, in large samples, of two different tests of the same statistical hypothesis. The notion of the asymptotic efficiency of tests is more complicated than that of asymptotic efficiency of estimates. This paper discusses the effect of sample size on expected values and variances of non-parametric tests for two independent samples and determines the most effective test for different sample sizes using the Fraser efficiency value. Material and Methods: Since calculating the power value when comparing tests is not practical most of the time, using the asymptotic relative efficiency value is favorable. Asymptotic relative efficiency is an indispensable technique for comparing and ordering statistical tests in large samples. It is especially useful in nonparametric statistics, where there exist numerous heuristic tests such as the linear rank tests. In this study, the sample size is determined as 2 ≤ n ≤ 50. Results: In both balanced and unbalanced cases, it is found that, as the sample size increases, expected values and variances of all the tests discussed in this paper increase as well. Additionally, considering the Fraser efficiency, the Mann-Whitney U test is found to be the most efficient among the non-parametric tests used to compare two independent samples, regardless of their sizes. Conclusion: According to Fraser efficiency, the Mann-Whitney U test is found to be the most efficient test.
Low rank Multivariate regression
Giraud, Christophe
2010-01-01
We consider in this paper the multivariate regression problem, when the target regression matrix $A$ is close to a low rank matrix. Our primary interest is in the practical case where the variance of the noise is unknown. Our main contribution is to propose in this setting a criterion to select among a family of low rank estimators and prove a non-asymptotic oracle inequality for the resulting estimator. We also investigate the easier case where the variance of the noise is known and outline that the penalties appearing in our criteria are minimal (in some sense). These penalties involve the expected value of the Ky-Fan quasi-norm of some random matrices. These quantities can be evaluated easily in practice and upper bounds can be derived from recent results in random matrix theory.
Subset selection in regression
Miller, Alan
2002-01-01
Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...
Classification and regression trees
Breiman, Leo; Olshen, Richard A; Stone, Charles J
1984-01-01
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
DEFF Research Database (Denmark)
Hansen, Henrik; Tarp, Finn
2001-01-01
. There are, however, decreasing returns to aid, and the estimated effectiveness of aid is highly sensitive to the choice of estimator and the set of control variables. When investment and human capital are controlled for, no positive effect of aid is found. Yet, aid continues to impact on growth via...... investment. We conclude by stressing the need for more theoretical work before this kind of cross-country regressions are used for policy purposes....
Robust Nonstationary Regression
1993-01-01
This paper provides a robust statistical approach to nonstationary time series regression and inference. Fully modified extensions of traditional robust statistical procedures are developed which allow for endogeneities in the nonstationary regressors and serial dependence in the shocks that drive the regressors and the errors that appear in the equation being estimated. The suggested estimators involve semiparametric corrections to accommodate these possibilities and they belong to the same ...
Li, Xiaofan; Zhao, Yubin; Zhang, Sha; Fan, Xiaopeng
2016-05-30
Particle filters (PFs) are widely used for nonlinear signal processing in wireless sensor networks (WSNs). However, measurement uncertainty makes WSN observations unreliable and degrades the estimation accuracy of PFs. Beyond algorithm design, few works focus on improving the likelihood calculation method, since it can be pre-assumed by a given distribution model. In this paper, we propose a novel PF method, which is based on a new likelihood fusion method for WSNs and can further improve the estimation performance. We first use a dynamic Gaussian model to describe the nonparametric features of the measurement uncertainty. Then, we propose a likelihood adaptation method that employs the prior information and a belief factor to reduce the measurement noise. The optimal belief factor is attained by deriving the minimum Kullback-Leibler divergence. The likelihood adaptation method can be integrated into any PF, and we use our method to develop three versions of adaptive PFs for a target tracking system using a wireless sensor network. The simulation and experimental results demonstrate that our likelihood adaptation method greatly improves the estimation performance of PFs in a high noise environment. In addition, the adaptive PFs are highly adaptable to the environment without imposing computational complexity.
Kernel bandwidth estimation for non-parametric density estimation: a comparative study
CSIR Research Space (South Africa)
Van der Walt, CM
2013-12-01
We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...
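To make the role of the bandwidth concrete, the following is a minimal Python sketch of a one-dimensional Gaussian kernel density estimate. The abstract does not say which bandwidth estimators were compared, so Silverman's rule of thumb is used here purely as an assumed example of a conventional estimator; the function names and the toy data are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(samples, h):
    """Return a function x -> estimated density using a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Toy data (illustrative only, not from the study).
data = [1.2, 1.9, 2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.4, 5.1]
h = silverman_bandwidth(data)
pdf = gaussian_kde(data, h)
```

A smaller `h` gives a spikier estimate and a larger `h` a smoother one, which is why the choice of bandwidth estimator matters for downstream tasks.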
Nonparametric Monitoring for Geotechnical Structures Subject to Long-Term Environmental Change
Directory of Open Access Journals (Sweden)
Hae-Bum Yun
2011-01-01
A nonparametric, data-driven methodology of monitoring for geotechnical structures subject to long-term environmental change is discussed. Avoiding physical assumptions or excessive simplification of the monitored structures, the nonparametric monitoring methodology presented in this paper provides reliable performance-related information, particularly when the collection of sensor data is limited. For the validation of the nonparametric methodology, a field case study was performed using a full-scale retaining wall, which had been monitored for three years using three tilt gauges. Using the very limited sensor data, it is demonstrated that important performance-related information, such as drainage performance and sensor damage, could be disentangled from significant daily, seasonal and multiyear environmental variations. An extensive literature review on recent developments of parametric and nonparametric data processing techniques for geotechnical applications is also presented.
Nonparametric TOA estimators for low-resolution IR-UWB digital receiver
Institute of Scientific and Technical Information of China (English)
Yanlong Zhang; Weidong Chen
2015-01-01
Nonparametric time-of-arrival (TOA) estimators for impulse radio ultra-wideband (IR-UWB) signals are proposed. Nonparametric detection is obviously useful in situations where detailed information about the statistics of the noise is unavailable or not accurate. Such TOA estimators are obtained based on conditional statistical tests with only a symmetry distribution assumption on the noise probability density function. The nonparametric estimators are attractive choices for low-resolution IR-UWB digital receivers which can be implemented by fast comparators or high sampling rate low resolution analog-to-digital converters (ADCs), in place of high sampling rate high resolution ADCs which may not be available in practice. Simulation results demonstrate that nonparametric TOA estimators provide more effective and robust performance than typical energy detection (ED) based estimators.
Nonparametric statistical tests for the continuous data: the basic concept and the practical use.
Nahm, Francis Sahngun
2016-02-01
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption, the assumption of normality, which means that the distribution of sample means is normal. However, a parametric test can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
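As an illustration of what "based on ranks" means in practice, here is a minimal Python sketch of the Mann-Whitney U test for two independent samples. The article does not provide code; the function names are hypothetical, and the p-value uses the large-sample normal approximation without a tie correction, which is a simplifying assumption.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of sample x, two-sided approximate p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p
```

Only the ordering of the pooled observations enters the statistic, so no normality assumption is needed; for small samples an exact null distribution would be preferable to the approximation used here.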
TWO REGRESSION CREDIBILITY MODELS
Directory of Open Access Journals (Sweden)
Constanţa-Nicoleta BODEA
2010-03-01
In this communication we will discuss two regression credibility models from Non-Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices, a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state) and the collective estimate (based on aggregate USA data). To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.
Examples of the Application of Nonparametric Information Geometry to Statistical Physics
Directory of Open Access Journals (Sweden)
Giovanni Pistone
2013-09-01
We review a nonparametric version of Amari’s information geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Akhtar, Naveed; Mian, Ajmal
2017-10-03
We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.
Non-parametric combination and related permutation tests for neuroimaging.
Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E
2016-04-01
In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction.
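For orientation, the classic two-phase form of non-parametric combination with Tippett's combining function can be sketched in Python as follows. This is not the single-phase modification proposed in the abstract; it is the baseline idea under simplifying assumptions: two groups, the absolute difference in means as the partial test statistic, and random rather than exhaustive permutations. All names are hypothetical.

```python
import random


def abs_mean_diff(values, labels):
    """Absolute difference in group means for labels in {0, 1}."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def npc_tippett(variables, labels, n_perm=499, seed=0):
    """Combined p-value across several variables measured on the same subjects."""
    rng = random.Random(seed)
    # Synchronized permutations: the same relabelling is applied to every variable.
    perms = [list(labels)]  # index 0 is the observed labelling
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        perms.append(shuffled)
    total = len(perms)
    # Phase 1: partial statistics and their permutation p-values.
    stats = [[abs_mean_diff(v, lab) for lab in perms] for v in variables]
    pvals = [[sum(t >= tb for t in row) / total for tb in row] for row in stats]
    # Phase 2: Tippett combines by taking the smallest partial p-value.
    combined = [min(p[b] for p in pvals) for b in range(total)]
    return sum(c <= combined[0] for c in combined) / total
```

Storing every partial statistic for every permutation is what makes this form memory-hungry on imaging data, which is the cost the single-phase variant described above is meant to avoid.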
Institute of Scientific and Technical Information of China (English)
LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi
2010-01-01
Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed both crossover and noncrossover GEI, and genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated that the nonparametric statistics grouped into four distinct classes that corresponded to different agronomic and biological concepts of stability. Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only the rank-sum and TOP methods would be useful for simultaneous selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.
A novel nonparametric confidence interval for differences of proportions for correlated binary data.
Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan
2016-11-16
Various confidence interval estimators have been developed for differences in proportions resulting from correlated binary data. However, the most widely recommended interval, Tango's score confidence interval, tends to be wide, and the computing burden of exact methods recommended for small-sample data is intensive. The recently proposed rank-based nonparametric method, which treats proportions as special areas under the receiver operating characteristic curve, provided a new way to construct the confidence interval for the proportion difference on paired data, but its complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristics. The new confidence interval has a simple analytic form with a new estimate of the degrees of freedom of n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than Tango's interval. This new confidence interval with the new estimate of degrees of freedom also leads to coverage probabilities that are an improvement on the rank-based nonparametric confidence interval. Compared with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and yet it is very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance. © The Author(s) 2016.
Parametric and Non-Parametric Vibration-Based Structural Identification Under Earthquake Excitation
Pentaris, Fragkiskos P.; Fouskitakis, George N.
2014-05-01
The problem of modal identification in civil structures is of crucial importance, and thus has been receiving increasing attention in recent years. Vibration-based methods are quite promising as they are capable of identifying the structure's global characteristics, they are relatively easy to implement and they tend to be time effective and less expensive than most alternatives [1]. This paper focuses on the off-line structural/modal identification of civil (concrete) structures subjected to low-level earthquake excitations, under which, they remain within their linear operating regime. Earthquakes and their details are recorded and provided by the seismological network of Crete [2], which 'monitors' the broad region of south Hellenic arc, an active seismic region which functions as a natural laboratory for earthquake engineering of this kind. A sufficient number of seismic events are analyzed in order to reveal the modal characteristics of the structures under study, that consist of the two concrete buildings of the School of Applied Sciences, Technological Education Institute of Crete, located in Chania, Crete, Hellas. Both buildings are equipped with high-sensitivity and accuracy seismographs - providing acceleration measurements - established at the basement (structure's foundation) presently considered as the ground's acceleration (excitation) and at all levels (ground floor, 1st floor, 2nd floor and terrace). Further details regarding the instrumentation setup and data acquisition may be found in [3]. The present study invokes stochastic, both non-parametric (frequency-based) and parametric methods for structural/modal identification (natural frequencies and/or damping ratios). Non-parametric methods include Welch-based spectrum and Frequency response Function (FrF) estimation, while parametric methods, include AutoRegressive (AR), AutoRegressive with eXogeneous input (ARX) and Autoregressive Moving-Average with eXogeneous input (ARMAX) models[4, 5
Effects of dating errors on nonparametric trend analyses of speleothem time series
Directory of Open Access Journals (Sweden)
M. Mudelsee
2012-10-01
A fundamental problem in paleoclimatology is to take fully into account the various error sources when examining proxy records with quantitative methods of statistical time series analysis. Records from dated climate archives such as speleothems add extra uncertainty from the age determination to the other sources that consist in measurement and proxy errors. This paper examines three stalagmite time series of oxygen isotopic composition (δ^{18}O) from two caves in western Germany, the series AH-1 from the Atta Cave and the series Bu1 and Bu4 from the Bunker Cave. These records carry regional information about past changes in winter precipitation and temperature. U/Th and radiocarbon dating reveals that they cover the later part of the Holocene, the past 8.6 thousand years (ka). We analyse centennial- to millennial-scale climate trends by means of nonparametric Gasser–Müller kernel regression. Error bands around fitted trend curves are determined by combining (1) block bootstrap resampling to preserve noise properties (shape, autocorrelation) of the δ^{18}O residuals and (2) timescale simulations (models StalAge and iscam). The timescale error influences on centennial- to millennial-scale trend estimation are not excessively large. We find a "mid-Holocene climate double-swing", from warm to cold to warm winter conditions (6.5 ka to 6.0 ka to 5.1 ka), with warm–cold amplitudes of around 0.5‰ δ^{18}O; this finding is documented by all three records with high confidence. We also quantify the Medieval Warm Period (MWP), the Little Ice Age (LIA) and the current warmth. Our analyses cannot unequivocally support the conclusion that current regional winter climate is warmer than that during the MWP.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
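The "repeated partitioning" at the heart of CART can be illustrated with a single split of a regression tree. The Python sketch below assumes one numeric predictor and uses the within-node sum of squared errors as the splitting criterion, which is the usual choice for continuous outcomes; the function names are hypothetical and a full tree would apply `best_split` recursively to each subgroup and then prune.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the best binary split x <= threshold.

    Returns (None, sse(y)) when no split improves on leaving the node whole.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best
```

For a categorical outcome, the same search would minimise an impurity measure such as the Gini index instead of the squared error.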
Regression analysis of the structure function for reliability evaluation of continuous-state system
Energy Technology Data Exchange (ETDEWEB)
Gamiz, M.L., E-mail: mgamiz@ugr.e [Departamento de Estadistica e I.O., Facultad de Ciencias, Universidad de Granada, Granada 18071 (Spain); Martinez Miranda, M.D. [Departamento de Estadistica e I.O., Facultad de Ciencias, Universidad de Granada, Granada 18071 (Spain)
2010-02-15
Technical systems are designed to perform an intended task with an admissible range of efficiency. According to this idea, it is permissible that the system runs among different levels of performance, in addition to complete failure and the perfect functioning one. As a consequence, reliability theory has evolved from binary-state systems to the most general case of continuous-state system, in which the state of the system changes over time through some interval on the real number line. In this context, obtaining an expression for the structure function becomes difficult, compared to the discrete case, with difficulty increasing as the number of components of the system increases. In this work, we propose a method to build a structure function for a continuum system by using multivariate nonparametric regression techniques, in which certain analytical restrictions on the variable of interest must be taken into account. Once the structure function is obtained, some reliability indices of the system are estimated. We illustrate our method via several numerical examples.
Structuring feature space: a non-parametric method for volumetric transfer function generation.
Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S
2009-01-01
The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial
Directory of Open Access Journals (Sweden)
SANGCHAN KANTABUTRA
2009-04-01
This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA). Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have access to and practice different production technologies than rural schools, and rural institutions appear to operate less efficiently than their urban counterparts. In addition, a sensitivity analysis, conducted to ascertain the robustness of the analytical framework, revealed the stability of urban-rural effects on school efficiency. Policy to improve school efficiency should thus take varying geographical area differences into account, viewing rural and urban schools as different from one another. Moreover, policymakers might consider shifting existing resources from urban schools to rural schools, provided that the increase in overall rural efficiency would be greater than the decrease, if any, in the city. Future research directions are discussed.
Harlander, Niklas; Rosenkranz, Tobias; Hohmann, Volker
2012-08-01
Single channel noise reduction has been well investigated and seems to have reached its limits in terms of speech intelligibility improvement; however, the quality of such schemes can still be advanced. This study tests to what extent novel model-based processing schemes might improve performance, in particular for non-stationary noise conditions. Two prototype model-based algorithms, a speech-model-based and an auditory-model-based algorithm, were compared to a state-of-the-art non-parametric minimum statistics algorithm. A speech intelligibility test, preference rating, and listening effort scaling were performed. Additionally, three objective quality measures for the signal, background, and overall distortions were applied. For a better comparison of all algorithms, particular attention was given to the usage of a similar Wiener-based gain rule. The perceptual investigation was performed with fourteen hearing-impaired subjects. The results revealed that the non-parametric algorithm and the auditory model-based algorithm did not affect speech intelligibility, whereas the speech-model-based algorithm slightly decreased intelligibility. In terms of subjective quality, both model-based algorithms perform better than the unprocessed condition and the reference, in particular for highly non-stationary noise environments. Data support the hypothesis that model-based algorithms are promising for improving performance in non-stationary noise conditions.
'nparACT' package for R: A free software tool for the non-parametric analysis of actigraphy data.
Blume, Christine; Santhi, Nayantara; Schabus, Manuel
2016-01-01
For many studies, participants' sleep-wake patterns are monitored and recorded prior to, during and following an experimental or clinical intervention using actigraphy, i.e. the recording of data generated by movements. Often, these data are merely inspected visually without computation of descriptive parameters, in part due to the lack of user-friendly software. To address this deficit, we developed a package for R (R Core Team [6]) that allows computing several non-parametric measures from actigraphy data. Specifically, it computes the interdaily stability (IS), intradaily variability (IV) and relative amplitude (RA) of activity and gives the start times and average activity values of M10 (i.e. the ten hours with maximal activity) and L5 (i.e. the five hours with least activity). Two functions compute these 'classical' parameters and handle either single or multiple files. Two other functions additionally allow computing an L-value (i.e. the least activity value) for a user-defined time span termed 'Lflex' value. A plotting option is included in all functions. The package can be downloaded from the Comprehensive R Archives Network (CRAN). • The package 'nparACT' for R serves the non-parametric analysis of actigraphy data. • Computed parameters include interdaily stability (IS), intradaily variability (IV) and relative amplitude (RA) as well as start times and average activity during the 10 h with maximal and the 5 h with minimal activity (i.e. M10 and L5).
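The measures named above have simple closed forms, sketched here in Python rather than R so that the arithmetic is visible. This is not the nparACT implementation: it assumes the input is a list of hourly activity values covering whole days, uses the commonly cited definitions of IS, IV and RA, and takes M10 and L5 from the average 24-hour profile with wrap-around windows. All function names are hypothetical.

```python
def hourly_profile(x, p=24):
    """Average each hour of the day across days; x holds whole days of hourly values."""
    days = len(x) // p
    return [sum(x[d * p + h] for d in range(days)) / days for h in range(p)]


def interdaily_stability(x, p=24):
    """IS = n * sum_h (mean_h - mean)^2 / (p * sum_i (x_i - mean)^2)."""
    n = len(x)
    mean = sum(x) / n
    prof = hourly_profile(x, p)
    num = n * sum((m - mean) ** 2 for m in prof)
    den = p * sum((v - mean) ** 2 for v in x)
    return num / den


def intradaily_variability(x):
    """IV = n * sum_i (x_i - x_{i-1})^2 / ((n - 1) * sum_i (x_i - mean)^2)."""
    n = len(x)
    mean = sum(x) / n
    num = n * sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    den = (n - 1) * sum((v - mean) ** 2 for v in x)
    return num / den


def extreme_window(profile, width, pick):
    """Mean of the circular window of `width` hours selected by `pick` (max or min)."""
    p = len(profile)
    means = [sum(profile[(s + k) % p] for k in range(width)) / width for s in range(p)]
    return pick(means)


def relative_amplitude(x, p=24):
    """RA = (M10 - L5) / (M10 + L5), computed on the average daily profile."""
    prof = hourly_profile(x, p)
    m10 = extreme_window(prof, 10, max)
    l5 = extreme_window(prof, 5, min)
    return (m10 - l5) / (m10 + l5)
```

A perfectly repeating daily pattern gives IS equal to 1, while frequent hour-to-hour switching between rest and activity pushes IV upward.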
Chang, Ju Yong
2016-08-01
We present a new gesture recognition method that is based on the conditional random field (CRF) model using multiple feature matching. Our approach solves the labeling problem, determining gesture categories and their temporal ranges at the same time. A generative probabilistic model is formalized and probability densities are nonparametrically estimated by matching input features with a training dataset. In addition to the conventional skeletal joint-based features, the appearance information near the active hand in an RGB image is exploited to capture the detailed motion of fingers. The estimated likelihood function is then used as the unary term for our CRF model. The smoothness term is also incorporated to enforce the temporal coherence of our solution. Frame-wise recognition results can then be obtained by applying an efficient dynamic programming technique. To estimate the parameters of the proposed CRF model, we incorporate the structured support vector machine (SSVM) framework that can perform efficient structured learning by using large-scale datasets. Experimental results demonstrate that our method provides effective gesture recognition results for challenging real gesture datasets. By scoring 0.8563 in the mean Jaccard index, our method has obtained the state-of-the-art results for the gesture recognition track of the 2014 ChaLearn Looking at People (LAP) Challenge.
Ramajo, Julián; Cordero, José Manuel; Márquez, Miguel Ángel
2017-10-01
This paper analyses region-level technical efficiency in nine European countries over the 1995-2007 period. We propose the application of a nonparametric conditional frontier approach to account for the presence of heterogeneous conditions in the form of geographical externalities. Such environmental factors are beyond the control of regional authorities, but may affect the production function. Therefore, they need to be considered in the frontier estimation. Specifically, a spatial autoregressive term is included as an external conditioning factor in a robust order- m model. Thus we can test the hypothesis of non-separability (the external factor impacts both the input-output space and the distribution of efficiencies), demonstrating the existence of significant global interregional spillovers into the production process. Our findings show that geographical externalities affect both the frontier level and the probability of being more or less efficient. Specifically, the results support the fact that the spatial lag variable has an inverted U-shaped non-linear impact on the performance of regions. This finding can be interpreted as a differential effect of interregional spillovers depending on the size of the neighboring economies: positive externalities for small values, possibly related to agglomeration economies, and negative externalities for high values, indicating the possibility of production congestion. Additionally, evidence of the existence of a strong geographic pattern of European regional efficiency is reported and the levels of technical efficiency are acknowledged to have converged during the period under analysis.
Non-parametric reconstruction of the galaxy-lens in PG1115+080
Saha, P; Saha, Prasenjit; Williams, Liliya L. R.
1997-01-01
We describe a new, non-parametric method for reconstructing lensing mass distributions in multiple-image systems, and apply it to PG1115, for which time delays have recently been measured. It turns out that the image positions and the ratio of time delays between different pairs of images constrain the mass distribution in a linear fashion. Since observational errors on image positions and time delay ratios are constantly improving, we use these data as a rigid constraint in our modelling. In addition, we require the projected mass distributions to be inversion-symmetric and to have inward-pointing density gradients. With these realistic yet non-restrictive conditions it is very easy to produce mass distributions that fit the data precisely. We then present models, for $H_0=42$, 63 and 84 km s$^{-1}$ Mpc$^{-1}$, that in each case minimize mass-to-light variations while strictly obeying the lensing constraints. (Only a very rough light distribution is available at present.) All three values of $H_0$ are consistent with the ...
Decision making in coal mine planning using a non-parametric technique of indicator kriging
Energy Technology Data Exchange (ETDEWEB)
Mamurekli, D. [Hacettepe University, Ankara (Turkey). Mining Engineering Dept.
1997-03-01
In countries where low calorific value coal reserves are abundant and oil reserves are scarce or absent, energy production relies mainly on coal-fired power stations. Consequently, planning the mining of low calorific value coal deposits becomes very important given the technical and environmental restrictions. One such mine, in Kangal Town of Sivas City, delivers run-of-mine coal directly to the power station built in the region. If the calorific value of the extracted coal falls below the required limit of 1300 kcal/kg, or its ash content rises above the required limit of 21%, the power station may apply penalties to the coal producing company. Because delivery is continuous and relies on in situ determination of pre-estimated values, these assessments, made without any stated confidence levels, are inevitably subject to inaccuracy. Thus, the company should be aware of uncertainties in making decisions and avoid conceivable risks. In this study, valuable information is provided in the form of a conditional distribution to be used during the planning process. The study maps the indicator variograms corresponding to a calorific value of 1300 kcal/kg and an ash content of 21%, and uses the non-parametric technique of indicator kriging to estimate the conditional probabilities that the true ash contents are lower and the true calorific values are higher than the critical limits. In addition, it outlines the areas that are most uncertain for decision making. 4 refs., 8 figs., 3 tabs.
Directory of Open Access Journals (Sweden)
Antonio Canale
2017-06-01
msBP is an R package that implements a new method to perform Bayesian multiscale nonparametric inference introduced by Canale and Dunson (2016). The method, based on mixtures of multiscale beta dictionary densities, overcomes the drawbacks of Pólya trees and inherits many of the advantages of Dirichlet process mixture models. The key idea is that an infinitely-deep binary tree is introduced, with a beta dictionary density assigned to each node of the tree. Using a multiscale stick-breaking characterization, stochastically decreasing weights are assigned to each node. The result is an infinite mixture model. The package msBP implements a series of basic functions to deal with this family of priors such as random densities and numbers generation, creation and manipulation of binary tree objects, and generic functions to plot and print the results. In addition, it implements the Gibbs samplers for posterior computation to perform multiscale density estimation and multiscale testing of group differences described in Canale and Dunson (2016).
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Institute of Scientific and Technical Information of China (English)
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
Quantile estimation methods are proposed for the functional-coefficient partially linear regression (FCPLR) model by combining the nonparametric and functional-coefficient regression (FCR) models. The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model. These resulting estimators are asymptotically normal, but each of them has a large variance. To reduce the variances of these quantile estimators, the one-step backfitting technique is used to obtain efficient quantile estimators of all unknown functions, and their asymptotic normality is derived. Two simulated examples are carried out to illustrate the proposed estimation methodology.
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study focuses on indicators of predictive power for the Generalized Linear Model (GLM), which are widely used but often have restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
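As a rough illustration of the traditional measure described here, the sketch below computes the Pearson correlation between observed counts Y and the fitted Poisson means E(Y|X) under a log link. It assumes the coefficients have already been estimated elsewhere, uses a single covariate, and does not implement the modified coefficient proposed in the study, whose form the abstract does not give. The function names are hypothetical.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def regression_correlation(y, x, b0, b1):
    """corr(Y, E[Y|X]) for a one-covariate Poisson model with log link.

    b0 and b1 are assumed to be coefficients fitted beforehand.
    """
    mu = [math.exp(b0 + b1 * xi) for xi in x]  # fitted means E[Y|X]
    return pearson(y, mu)
```

A value near 1 indicates that the fitted means track the observed counts closely; with several correlated covariates the same computation applies once the fitted means are available.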
Directory of Open Access Journals (Sweden)
Karim Hardani*
2012-05-01
A 10-month-old baby presented with developmental delay. He had flaccid paralysis on physical examination. An MRI of the spine revealed malformation of the ninth and tenth thoracic vertebral bodies with complete agenesis of the rest of the spine below that level. The thoracic spinal cord ends at the level of the fifth thoracic vertebra with agenesis of the posterior arches of the eighth, ninth and tenth thoracic vertebral bodies. The roots of the cauda equina appear tightened down and backward and end in a subdermal fibrous fatty tissue at the level of the ninth and tenth thoracic vertebral bodies (closed meningocele). These findings are consistent with caudal regression syndrome.
Nonparametric methods for drought severity estimation at ungauged sites
Sadri, S.; Burn, D. H.
2012-12-01
The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.
Institute of Scientific and Technical Information of China (English)
Wengang Zhang; Anthony T.C. Goh
2016-01-01
Piles are long, slender structural elements used to transfer the loads from the superstructure through weak strata onto stiffer soils or rocks. For driven piles, the impact of the piling hammer induces compression and tension stresses in the piles. Hence, an important design consideration is to check that the strength of the pile is sufficient to resist the stresses caused by the impact of the pile hammer. Due to its complexity, pile drivability lacks a precise analytical solution with regard to the phenomena involved. In situations where measured data or numerical hypothetical results are available, neural networks stand out in mapping the nonlinear interactions and relationships between the system’s predictors and dependent responses. In addition, unlike most computational tools, no mathematical relationship assumption between the dependent and independent variables has to be made. Nevertheless, neural networks have been criticized for their long trial-and-error training process since the optimal configuration is not known a priori. This paper investigates the use of a fairly simple nonparametric regression algorithm known as multivariate adaptive regression splines (MARS), as an alternative to neural networks, to approximate the relationship between the inputs and dependent response, and to mathematically interpret the relationship between the various parameters. In this paper, the Back propagation neural network (BPNN) and MARS models are developed for assessing pile drivability in relation to the prediction of the Maximum compressive stresses (MCS), Maximum tensile stresses (MTS), and Blow per foot (BPF). A database of more than four thousand piles is utilized for model development and comparative performance between BPNN and MARS predictions.
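The interpretability claimed for MARS comes from its building blocks: piecewise-linear hinge functions of the form max(0, x - t) and max(0, t - x). The sketch below evaluates an already fitted single-input model of that form. It is only an illustration: the knots and coefficients are made up, products of hinges across several inputs are omitted, and the forward and backward term-selection passes of a real MARS fit are not shown.

```python
def hinge(x, knot, direction=1):
    """MARS basis function: max(0, x - knot) if direction is +1,
    max(0, knot - x) if direction is -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate an additive hinge model at x.

    terms is a list of (coefficient, knot, direction) triples;
    all values used here are hypothetical, not fitted to pile data.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: slope 2 above the knot at 3, slope 1 below it.
example_terms = [(2.0, 3.0, 1), (-1.0, 3.0, -1)]
```

Because each term is active on only one side of its knot, the fitted coefficients can be read directly as changes in slope, which is what allows the relationship between inputs and response to be written out explicitly.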
Directory of Open Access Journals (Sweden)
Archer Kellie J
2008-02-01
Background: With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. Results: The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. Conclusion: We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been
Kong, Xiangrong; Mas, Valeria; Archer, Kellie J
2008-02-26
With the popularity of DNA microarray technology, multiple groups of researchers have studied the gene expression of similar biological conditions. Different methods have been developed to integrate the results from various microarray studies, though most of them rely on distributional assumptions, such as the t-statistic based, mixed-effects model, or Bayesian model methods. However, often the sample size for each individual microarray experiment is small. Therefore, in this paper we present a non-parametric meta-analysis approach for combining data from independent microarray studies, and illustrate its application on two independent Affymetrix GeneChip studies that compared the gene expression of biopsies from kidney transplant recipients with chronic allograft nephropathy (CAN) to those with normal functioning allograft. The simulation study comparing the non-parametric meta-analysis approach to a commonly used t-statistic based approach shows that the non-parametric approach has better sensitivity and specificity. For the application on the two CAN studies, we identified 309 distinct genes that expressed differently in CAN. By applying Fisher's exact test to identify enriched KEGG pathways among those genes called differentially expressed, we found 6 KEGG pathways to be over-represented among the identified genes. We used the expression measurements of the identified genes as predictors to predict the class labels for 6 additional biopsy samples, and the predicted results all conformed to their pathologist diagnosed class labels. We present a new approach for combining data from multiple independent microarray studies. This approach is non-parametric and does not rely on any distributional assumptions. The rationale behind the approach is logically intuitive and can be easily understood by researchers not having advanced training in statistics. Some of the identified genes and pathways have been reported to be relevant to renal diseases. Further study on the
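The pathway enrichment step mentioned above can be illustrated with the one-sided Fisher's exact test, which for a 2x2 table reduces to an upper-tail hypergeometric probability. The sketch below is a generic version of that test, not the authors' code; the argument names and any counts passed to it are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, selected_genes, overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= overlap), where X is the number of pathway genes among
    selected_genes genes drawn without replacement from total_genes genes,
    of which pathway_genes belong to the pathway.
    """
    upper = min(pathway_genes, selected_genes)
    denom = comb(total_genes, selected_genes)
    tail = sum(
        comb(pathway_genes, i)
        * comb(total_genes - pathway_genes, selected_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / denom
```

A small p-value indicates that the pathway contains more of the selected genes than chance alone would suggest. When many pathways are screened, a multiple-testing adjustment would normally be applied on top of this.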
Model and Variable Selection Procedures for Semiparametric Time Series Regression
Directory of Open Access Journals (Sweden)
Risa Kato
2009-01-01
Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedure is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
Approximation of conditional densities by smooth mixtures of regressions
Norets, Andriy
2010-01-01
This paper shows that large nonparametric classes of conditional multivariate densities can be approximated in the Kullback--Leibler distance by different specifications of finite mixtures of normal regressions in which normal means and variances and mixing probabilities can depend on variables in the conditioning set (covariates). These models are a special case of models known as "mixtures of experts" in statistics and computer science literature. Flexible specifications include models in which only mixing probabilities, modeled by multinomial logit, depend on the covariates and, in the univariate case, models in which only means of the mixed normals depend flexibly on the covariates. Modeling the variance of the mixed normals by flexible functions of the covariates can weaken restrictions on the class of the approximable densities. Obtained results can be generalized to mixtures of general location scale densities. Rates of convergence and easy to interpret bounds are also obtained for different model spec...
DEFF Research Database (Denmark)
Ramirez, José Rangel; Sørensen, John Dalsgaard
2011-01-01
This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbine. The new information, coming from external and condition monitoring can be used to direct updating of the stochastic variables through a non-parametric Bayesian...... updating approach and be integrated in the reliability analysis by a third-order polynomial chaos expansion approximation. Although Classical Bayesian updating approaches are often used because of its parametric formulation, non-parametric approaches are better alternatives for multi-parametric updating...... with a non-conjugating formulation. The results in this paper show the influence on the time dependent updated reliability when non-parametric and classical Bayesian approaches are used. Further, the influence on the reliability of the number of updated parameters is illustrated....
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the popular means of linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. The quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.
Non-parametric seismic hazard analysis in the presence of incomplete data
Yazdani, Azad; Mirzaei, Sajjad; Dadkhah, Koroush
2017-01-01
The distribution of earthquake magnitudes plays a crucial role in the estimation of seismic hazard parameters. Due to the complexity of earthquake magnitude distribution, non-parametric approaches are recommended over classical parametric methods. The main deficiency of the non-parametric approach is the lack of complete magnitude data in almost all cases. This study aims to introduce an imputation procedure for completing earthquake catalog data that will allow the catalog to be used for non-parametric density estimation. Using a Monte Carlo simulation, the efficiency of introduced approach is investigated. This study indicates that when a magnitude catalog is incomplete, the imputation procedure can provide an appropriate tool for seismic hazard assessment. As an illustration, the imputation procedure was applied to estimate earthquake magnitude distribution in Tehran, the capital city of Iran.
Recursive Algorithm For Linear Regression
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.
Ford, Eric B; Steffen, Jason H; Carter, Joshua A; Fressin, Francois; Holman, Matthew J; Lissauer, Jack J; Moorhead, Althea V; Morehead, Robert C; Ragozzine, Darin; Rowe, Jason F; Welsh, William F; Allen, Christopher; Batalha, Natalie M; Borucki, William J; Bryson, Stephen T; Buchhave, Lars A; Burke, Christopher J; Caldwell, Douglas A; Charbonneau, David; Clarke, Bruce D; Cochran, William D; Désert, Jean-Michel; Endl, Michael; Everett, Mark E; Fischer, Debra A; Gautier, Thomas N; Gilliland, Ron L; Jenkins, Jon M; Haas, Michael R; Horch, Elliott; Howell, Steve B; Ibrahim, Khadeejah A; Isaacson, Howard; Koch, David G; Latham, David W; Li, Jie; Lucas, Philip; MacQueen, Phillip J; Marcy, Geoffrey W; McCauliff, Sean; Mullally, Fergal R; Quinn, Samuel N; Quintana, Elisa; Shporer, Avi; Still, Martin; Tenenbaum, Peter; Thompson, Susan E; Torres, Guillermo; Twicken, Joseph D; Wohler, Bill
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timing variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:...
Cannon, Alex J.
2011-09-01
The qrnn package for R implements the quantile regression neural network, which is an artificial neural network extension of linear quantile regression. The model formulation follows from previous work on the estimation of censored regression quantiles. The result is a nonparametric, nonlinear model suitable for making probabilistic predictions of mixed discrete-continuous variables like precipitation amounts, wind speeds, or pollutant concentrations, as well as continuous variables. A differentiable approximation to the quantile regression error function is adopted so that gradient-based optimization algorithms can be used to estimate model parameters. Weight penalty and bootstrap aggregation methods are used to avoid overfitting. For convenience, functions for quantile-based probability density, cumulative distribution, and inverse cumulative distribution functions are also provided. Package functions are demonstrated on a simple precipitation downscaling task.
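The quantile regression error function (the "pinball" loss) has a kink at zero, which is why a differentiable approximation is needed for gradient-based training. The sketch below shows one common choice, a Huber-type smoothing of the absolute value; whether it matches the package's internal approximation exactly is an assumption, and the function names and the smoothing width `eps` are illustrative.

```python
def pinball_loss(u, tau):
    """Exact quantile regression loss for residual u at quantile tau."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def huber(u, eps):
    """Huber function: quadratic within eps of zero, linear outside."""
    a = abs(u)
    return u * u / (2.0 * eps) if a <= eps else a - eps / 2.0


def smooth_pinball_loss(u, tau, eps=1e-3):
    """Differentiable approximation: asymmetric weights on the Huber function."""
    weight = tau if u >= 0 else 1.0 - tau
    return weight * huber(u, eps)
```

As `eps` shrinks toward zero, the smoothed loss converges to the exact pinball loss while remaining differentiable at the origin, so the trade-off is between fidelity to the quantile criterion and well-behaved gradients.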
A Robbins-Monro procedure for estimation in semiparametric regression models
Bercu, Bernard
2011-01-01
This paper is devoted to the parametric estimation of a shift together with the nonparametric estimation of a regression function in a semiparametric regression model. We implement a Robbins-Monro procedure that is very efficient and easy to handle. On the one hand, we propose a stochastic algorithm similar to that of Robbins-Monro in order to estimate the shift parameter. A preliminary evaluation of the regression function is not necessary for estimating the shift parameter. On the other hand, we make use of a recursive Nadaraya-Watson estimator for the estimation of the regression function. This kernel estimator takes into account the previous estimation of the shift parameter. We establish the almost sure convergence for both Robbins-Monro and Nadaraya-Watson estimators. The asymptotic normality of our estimates is also provided.
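The recursive Nadaraya-Watson idea can be sketched as a pair of running sums that are updated one observation at a time, so no past data need to be stored. The version below is a simplified illustration at a single evaluation point: it leaves out the shift parameter and its Robbins-Monro update, and the Gaussian kernel and the bandwidth sequence h_n = n^(-alpha) are assumptions rather than the paper's choices.

```python
import math


class RecursiveNadarayaWatson:
    """Online kernel regression estimate at one fixed point x0."""

    def __init__(self, x0, alpha=0.2):
        self.x0 = x0
        self.alpha = alpha  # bandwidth decay exponent (assumed value)
        self.n = 0
        self.num = 0.0  # running sum of kernel weights times responses
        self.den = 0.0  # running sum of kernel weights

    def update(self, x, y):
        """Fold in one new observation (x, y) and return the estimate."""
        self.n += 1
        h = self.n ** (-self.alpha)  # bandwidth used for the n-th observation
        w = math.exp(-0.5 * ((self.x0 - x) / h) ** 2) / h
        self.num += w * y
        self.den += w
        return self.estimate()

    def estimate(self):
        return self.num / self.den if self.den > 0 else float("nan")
```

Each observation keeps the bandwidth it was given when it arrived, which is what distinguishes the recursive estimator from the classical one that recomputes every weight with the latest bandwidth.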
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Directory of Open Access Journals (Sweden)
Rabia Ece OMAY
2013-06-01
In this study, the relationship between gross domestic product (GDP) per capita and sulfur dioxide (SO2) and particulate matter (PM10) per capita is modeled for Turkey. Nonparametric fixed effect panel data analysis is used for the modeling. The panel data covers 12 territories, at the first level of the Nomenclature of Territorial Units for Statistics (NUTS), for the period 1990-2001. In modeling the relationship between GDP and SO2 and PM10 for Turkey, the non-parametric models gave good results.
Zhao, Zhibiao
2011-06-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.
Regression in autistic spectrum disorders.
Stefanatos, Gerry A
2008-12-01
A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously acquired skills. This may involve a loss of speech or social responsiveness, but often entails both. This paper critically reviews the phenomenon of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.
Combining Alphas via Bounded Regression
Directory of Open Access Journals (Sweden)
Zura Kakushadze
2015-11-01
We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.
Sieve M-estimation for semiparametric varying-coefficient partially linear regression model
Institute of Scientific and Technical Information of China (English)
(none listed)
2010-01-01
This article considers a semiparametric varying-coefficient partially linear regression model. This model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Our main objective is to estimate the nonparametric component and the unknown parameters simultaneously. The method is easier to compute, and its computational burden is much less than that of the existing two-stage estimation method. Furthermore, the sieve M-estimation is robust in the presence of outliers if we choose an appropriate ρ(·). Under some mild conditions, the estimators are shown to be strongly consistent; the convergence rate of the estimator for the unknown nonparametric component is obtained and the estimator for the unknown parameter is shown to be asymptotically normally distributed. Numerical experiments are carried out to investigate the performance of the proposed method.
Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.
Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S
2016-05-01
A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection.
Linear regression in astronomy. I
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
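The five slope estimators listed in this abstract can all be written in terms of the sample sums of squares and cross-products. The sketch below uses the standard closed forms as they are usually quoted for these methods; the uncertainty formulas discussed in the paper are not reproduced, the function name is hypothetical, and the data are assumed to have a nonzero covariance.

```python
import math


def five_regression_slopes(x, y):
    """Slopes and intercepts of five bivariate fits; requires Sxy != 0."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b1 = sxy / sxx  # OLS of Y on X
    b2 = syy / sxy  # OLS of X on Y, expressed as a slope in the (x, y) plane
    slopes = {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        # Line bisecting the angle between the two OLS lines.
        "bisector": (b1 * b2 - 1.0 + math.sqrt((1 + b1 ** 2) * (1 + b2 ** 2)))
        / (b1 + b2),
        # Minimizes perpendicular distances to the line.
        "orthogonal": 0.5
        * ((b2 - 1.0 / b1) + sign * math.sqrt(4.0 + (b2 - 1.0 / b1) ** 2)),
        # Geometric mean of the two OLS slopes.
        "reduced_major_axis": sign * math.sqrt(b1 * b2),
    }
    # Every fitted line passes through the centroid (mx, my).
    return {name: (b, my - b * mx) for name, b in slopes.items()}
```

For perfectly collinear data all five lines coincide; they diverge as scatter increases, which is why the choice among them matters when the variables need symmetric treatment.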
A non-parametric peak calling algorithm for DamID-Seq.
Directory of Open Access Journals (Sweden)
Renhua Li
Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
A non-parametric peak calling algorithm for DamID-Seq.
Li, Renhua; Hempel, Leonie U; Jiang, Tingbo
2015-01-01
Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
Yang, Hai; Wei, Qiang; Zhong, Xue; Yang, Hushan; Li, Bingshan
2017-02-15
A comprehensive catalogue of genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is still far from complete. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple omics in tumors. In this study, we aim to leverage complementary information encoded in each of the omics data to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA), and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers (e.g., P < 3.40 × 10^-36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding has the important implication that integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. The C++ source code is freely available at https://medschool.vanderbilt.edu/cgg/ . Contact: hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary data are available at Bioinformatics online.
Johnson, H.O.; Gupta, S.C.; Vecchia, A.V.; Zvomuya, F.
2009-01-01
Excessive loading of sediment and nutrients to rivers is a major problem in many parts of the United States. In this study, we tested the non-parametric Seasonal Kendall (SEAKEN) trend model and the parametric USGS Quality of Water trend program (QWTREND) to quantify trends in water quality of the Minnesota River at Fort Snelling from 1976 to 2003. Both methods indicated decreasing trends in flow-adjusted concentrations of total suspended solids (TSS), total phosphorus (TP), and orthophosphorus (OP) and a generally increasing trend in flow-adjusted nitrate plus nitrite-nitrogen (NO3-N) concentration. The SEAKEN results were strongly influenced by the length of the record as well as extreme years (dry or wet) earlier in the record. The QWTREND results, though influenced somewhat by the same factors, were more stable. The magnitudes of trends between the two methods were somewhat different and appeared to be associated with conceptual differences between the flow-adjustment processes used and with data processing methods. The decreasing trends in TSS, TP, and OP concentrations are likely related to conservation measures implemented in the basin. However, dilution effects from wet climate or additional tile drainage cannot be ruled out. The increasing trend in NO3-N concentrations was likely due to increased drainage in the basin. Since the Minnesota River is the main source of sediments to the Mississippi River, this study also addressed the rapid filling of Lake Pepin on the Mississippi River and found the likely cause to be increased flow due to recent wet climate in the region. Copyright © 2009 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
Xu, Huan; Mannor, Shie
2008-01-01
Lasso, or $\ell^1$ regularized least squares, has been explored extensively for its remarkable sparsity properties. It is shown in this paper that the solution to Lasso, in addition to its sparsity, has robustness properties: it is the solution to a robust optimization problem. This has two important consequences. First, robustness provides a connection of the regularizer to a physical property, namely, protection from noise. This allows a principled selection of the regularizer, and in particular, generalizations of Lasso that also yield convex optimization problems are obtained by considering different uncertainty sets. Secondly, robustness can itself be used as an avenue to exploring different properties of the solution. In particular, it is shown that robustness of the solution explains why the solution is sparse. The analysis as well as the specific results obtained differ from standard sparsity results, providing different geometric intuition. Furthermore, it is shown that the robust optimization formul...
Jiang, GJ; Knight, JL
1997-01-01
In this paper, we propose a nonparametric identification and estimation procedure for an Ito diffusion process based on discrete sampling observations. The nonparametric kernel estimator for the diffusion function developed in this paper deals with general Ito diffusion processes and avoids any
Modeling confounding by half-sibling regression
DEFF Research Database (Denmark)
Schölkopf, Bernhard; Hogg, David W; Wang, Dun
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...
Nonparametric estimation of population density for line transect sampling using FOURIER series
Crain, B.R.; Burnham, K.P.; Anderson, D.R.; Lake, J.L.
1979-01-01
A nonparametric, robust density estimation method is explored for the analysis of right-angle distances from a transect line to the objects sighted. The method is based on the FOURIER series expansion of a probability density function over an interval. With only mild assumptions, a general population density estimator of wide applicability is obtained.
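A Fourier series density estimate on an interval can be sketched as follows. This is a minimal illustration assuming a cosine-series form on [0, w]; the truncation width `w`, the number of terms `m`, and the function name `fourier_density` are assumptions for the example and are not taken from the abstract.

```python
import math

def fourier_density(distances, w, m):
    """Return f(x), a cosine-series density estimate on [0, w] with m terms.

    distances: perpendicular distances, each assumed to lie in [0, w].
    """
    n = len(distances)
    # Sample estimates of the cosine coefficients a_k, k = 1..m
    coeffs = [
        2.0 / (n * w) * sum(math.cos(k * math.pi * d / w) for d in distances)
        for k in range(1, m + 1)
    ]

    def f(x):
        # Uniform density 1/w plus the cosine corrections
        return 1.0 / w + sum(
            a * math.cos(k * math.pi * x / w)
            for k, a in enumerate(coeffs, start=1)
        )

    return f
```

Because each cosine term integrates to zero over [0, w], the estimate integrates to one for any choice of `m`; it is not guaranteed to be non-negative everywhere, which is a known trade-off of series estimators.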
A non-parametric peak finder algorithm and its application in searches for new physics
Chekanov, S
2011-01-01
We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.
Nonparametric estimation of the stationary M/G/1 workload distribution function
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted
2005-01-01
In this paper it is demonstrated how a nonparametric estimator of the stationary workload distribution function of the M/G/1-queue can be obtained by systematic sampling the workload process. Weak convergence results and bootstrap methods for empirical distribution functions for stationary associ...
Testing a parametric function against a nonparametric alternative in IV and GMM settings
DEFF Research Database (Denmark)
Gørgens, Tue; Wurtz, Allan
This paper develops a specification test for functional form for models identified by moment restrictions, including IV and GMM settings. The general framework is one where the moment restrictions are specified as functions of data, a finite-dimensional parameter vector, and a nonparametric real...
Non-parametric Bayesian graph models reveal community structure in resting state fMRI
DEFF Research Database (Denmark)
Andersen, Kasper Winther; Madsen, Kristoffer H.; Siebner, Hartwig Roman
2014-01-01
Modeling of resting state functional magnetic resonance imaging (rs-fMRI) data using network models is of increasing interest. It is often desirable to group nodes into clusters to interpret the communication patterns between nodes. In this study we consider three different nonparametric Bayesian...
Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods
DEFF Research Database (Denmark)
Høg, Esben
In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...
Non-parametric system identification from non-linear stochastic response
DEFF Research Database (Denmark)
Rüdinger, Finn; Krenk, Steen
2001-01-01
An estimation method is proposed for identification of non-linear stiffness and damping of single-degree-of-freedom systems under stationary white noise excitation. Non-parametric estimates of the stiffness and damping along with an estimate of the white noise intensity are obtained by suitable p...
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test
DEFF Research Database (Denmark)
Ramirez, José Rangel; Sørensen, John Dalsgaard
2011-01-01
This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbine. The new information, coming from external and condition monitoring can be used to direct updating of the stochastic variables through a non-parametric Bayesian u...
Jang, Eunice Eunhee; Roussos, Louis
2007-01-01
This article reports two studies to illustrate methodologies for conducting a conditional covariance-based nonparametric dimensionality assessment using data from two forms of the Test of English as a Foreign Language (TOEFL). Study 1 illustrates how to assess overall dimensionality of the TOEFL including all three subtests. Study 2 is aimed at…
Comparison of reliability techniques of parametric and non-parametric method
Directory of Open Access Journals (Sweden)
C. Kalaiselvan
2016-06-01
Reliability of a product or system is the probability that the product adequately performs its intended function for the stated period of time under stated operating conditions. It is a function of time. The most widely used nano ceramic capacitor types, C0G and X7R, are used in this reliability study to generate the time-to-failure (TTF) data. The time-to-failure data are obtained by Accelerated Life Testing (ALT) and Highly Accelerated Life Testing (HALT). The tests are conducted at high stress levels to generate more failures within a short interval of time. Parametric and non-parametric reliability methods are used to convert results from accelerated to actual conditions. In this paper, a comparative study of parametric and non-parametric methods for analysing the failure data is carried out. The Weibull distribution is used for the parametric method; the Kaplan–Meier and Simple Actuarial methods are used for the non-parametric method. The time taken to identify the mean time to failure (MTTF) in accelerated conditions is the same for the parametric and non-parametric methods, with relative deviation.
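The Kaplan–Meier estimator mentioned in this abstract can be sketched in a few lines. This is a generic product-limit implementation, not the authors' code; the function name `kaplan_meier` and the input encoding (1 for an observed failure, 0 for a censored unit) are assumptions for the example.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time (product-limit sketch).

    times:  observed time for each unit (failure or censoring time)
    events: 1 if the unit failed at that time, 0 if it was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all units sharing the same observed time
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored units both leave the risk set
        at_risk -= removed
    return curve
```

Censored units reduce the risk set without producing a step in the curve, which is what distinguishes this estimate from a simple empirical survival fraction.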
Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods
DEFF Research Database (Denmark)
Høg, Esben
2003-01-01
In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...
A non-parametric method for correction of global radiation observations
DEFF Research Database (Denmark)
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...
A Comparison of Shewhart Control Charts based on Normality, Nonparametrics, and Extreme-Value Theory
Ion, R.A.; Does, R.J.M.M.; Klaassen, C.A.J.
2000-01-01
Several control charts for individual observations are compared. The traditional ones are the well-known Shewhart control charts with estimators for the spread based on the sample standard deviation and the average of the moving ranges. The alternatives are nonparametric control charts, based on emp
Non-parametric production analysis of pesticides use in the Netherlands
Oude Lansink, A.G.J.M.; Silva, E.
2004-01-01
Many previous empirical studies on the productivity of pesticides suggest that pesticides are under-utilized in agriculture despite the generally held belief that these inputs are substantially over-utilized. This paper uses data envelopment analysis (DEA) to calculate non-parametric measures of the
An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.
2014-01-01
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Agasisti, Tommaso
2011-01-01
The objective of this paper is an efficiency analysis concerning higher education systems in European countries. Data have been extracted from OECD data-sets (Education at a Glance, several years), using a non-parametric technique--data envelopment analysis--to calculate efficiency scores. This paper represents the first attempt to conduct such an…
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models.
Fan, Jianqing; Ma, Yunbei; Dai, Wei
2014-01-01
The varying-coefficient model is an important class of nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in sparse ultra-high dimensional varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance the practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
Low default credit scoring using two-class non-parametric kernel density estimation
CSIR Research Space (South Africa)
Rademeyer, E
2016-12-01
This paper investigates the performance of two-class classification credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and non-parametric Parzen classifiers are extended, using Bayes’ rule, to include either...
Nonparametric Tests of Collectively Rational Consumption Behavior : An Integer Programming Procedure
Cherchye, L.J.H.; de Rock, B.; Sabbe, J.; Vermeulen, F.M.P.
2008-01-01
We present an IP-based nonparametric (revealed preference) testing procedure for rational consumption behavior in terms of general collective models, which include consumption externalities and public consumption. An empirical application to data drawn from the Russia Longitudinal Monitoring
Nonparametric Autoregression Model on Consumer Price Index
Institute of Scientific and Technical Information of China (English)
代洪伟; 凌能祥
2012-01-01
A nonparametric autoregression model was established using Chinese consumer price index data for 2004-2008. Linear least squares (OLS) estimation, orthogonal series estimation, and polynomial spline estimation were each used to estimate the regression function and to produce predictions. The results showed that the nonparametric model is superior to linear models and that, of the three estimation methods, orthogonal series estimation is the best. Finally, the simulated and predicted results were compared with those presented by Liu Chunyan et al. based on an ARIMA model.
Rights, Jason D; Sterba, Sonya K
2016-11-01
Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
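The objective behind the linear optimization formulation is the check (pinball) loss. The sketch below is not the simplex-based or time-adaptive algorithm of the paper; it only illustrates, for the simplest case of a constant model, that minimizing the check loss recovers a sample quantile. The function names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def constant_quantile_fit(y, tau):
    """Return the observation minimizing total check loss for a constant model.

    The loss is piecewise linear in the constant, so a minimizer can always
    be found among the observations themselves; this mirrors the fact that
    a linear program attains its optimum at a vertex of the feasible set.
    """
    return min(y, key=lambda q: sum(pinball_loss(yi - q, tau) for yi in y))
```

With `tau = 0.5` the fit is a sample median and is insensitive to a single extreme value, whereas a least-squares constant (the mean) would be pulled towards it.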
Linear regression in astronomy. II
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Polynomial Regression on Riemannian Manifolds
Hinkle, Jacob; Fletcher, P Thomas; Joshi, Sarang
2012-01-01
In this paper we develop the theory of parametric polynomial regression in Riemannian manifolds and Lie groups. We show application of Riemannian polynomial regression to shape analysis in Kendall shape space. Results are presented, showing the power of polynomial regression on the classic rat skull growth data of Bookstein as well as the analysis of the shape changes associated with aging of the corpus callosum from the OASIS Alzheimer's study.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Quantile regression theory and applications
Davino, Cristina; Vistocco, Domenico
2013-01-01
A guide to the implementation and interpretation of quantile regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues of model validity and diagnostic tools. Each methodological aspect is explored and
Business applications of multiple regression
Richardson, Ronny
2015-01-01
This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in deta
DEFF Research Database (Denmark)
Linnet, Kristian
2005-01-01
Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors
Hsu, Yu-Han H; Ferl, Gregory Z; Ng, Chee M
2013-05-01
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is often used to examine vascular function in malignant tumors and noninvasively monitor drug efficacy of antivascular therapies in clinical studies. However, complex numerical methods used to derive tumor physiological properties from DCE-MRI images can be time-consuming and computationally challenging. Recent advancement of computing technology in graphics processing unit (GPU) makes it possible to build an energy-efficient and high-power parallel computing platform for solving complex numerical problems. This study develops the first reported fast GPU-based method for nonparametric kinetic analysis of DCE-MRI data using clinical scans of glioblastoma patients treated with bevacizumab (Avastin®). In the method, contrast agent concentration-time profiles in arterial blood and tumor tissue are smoothed using a robust kernel-based regression algorithm in order to remove artifacts due to patient motion and then deconvolved to produce the impulse response function (IRF). The area under the curve (AUC) and mean residence time (MRT) of the IRF are calculated using statistical moment analysis, and two tumor physiological properties that relate to vascular permeability, volume transfer constant between blood plasma and extravascular extracellular space (K(trans)) and fractional interstitial volume (ve) are estimated using the approximations AUC/MRT and AUC. The most significant feature in this method is the use of GPU-computing to analyze data from more than 60,000 voxels in each DCE-MRI image in parallel fashion. All analysis steps have been automated in a single program script that requires only blood and tumor data as the sole input. The GPU-accelerated method produces K(trans) and ve estimates that are comparable to results from previous studies but reduces computational time by more than 80-fold compared to a previously reported central processing unit-based nonparametric method. Furthermore, it is at
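The statistical moment step described in this abstract (AUC and MRT of the impulse response function, then K(trans) ≈ AUC/MRT and ve ≈ AUC) can be sketched on a sampled curve. This is a plain CPU illustration with trapezoidal integration, not the GPU method of the paper; the function names and the dictionary keys are hypothetical.

```python
def trapezoid(values, t):
    """Trapezoidal integral of sampled values over the time grid t."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (t[i + 1] - t[i])
        for i in range(len(t) - 1)
    )

def moment_summary(irf, t):
    """AUC, MRT and the derived approximations for one sampled IRF."""
    auc = trapezoid(irf, t)  # zeroth moment
    aumc = trapezoid([ti * fi for ti, fi in zip(t, irf)], t)  # first moment
    mrt = aumc / auc  # mean residence time
    return {"AUC": auc, "MRT": mrt, "Ktrans_approx": auc / mrt, "ve_approx": auc}
```

As a sanity check under an assumed mono-exponential IRF, K·exp(−(K/v)·t), the exact values are AUC = v and MRT = v/K, so the two approximations return v and K respectively.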
Logistic Regression: Concept and Application
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a non-parametric technique, it is unnecessary to know the form of the heteroscedastic function, so the estimation precision can be improved when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights.
Regression Testing Cost Reduction Suite
Directory of Open Access Journals (Sweden)
Mohamed Alaa El-Din
2014-08-01
The estimated cost of software maintenance exceeds 70 percent of total software costs [1], and a large portion of this maintenance expense is devoted to regression testing. Regression testing is an expensive and frequently executed maintenance activity used to revalidate modified software. Any reduction in the cost of regression testing would help to reduce the software maintenance cost. Test suites, once developed, are reused and updated frequently as the software evolves. As a result, some test cases in the test suite may become redundant when the software is modified over time, since the requirements they cover are also covered by other test cases. Due to the resource and time constraints on re-executing large test suites, it is important to develop techniques that minimize available test suites by removing redundant test cases. In general, the test suite minimization problem is NP-complete. This paper proposes an effective approach for reducing the cost of the regression testing process. The proposed approach is applied to a real-time case study. The reduction in the cost of each regression testing cycle was found to be greatest for programs containing a high number of selected statements, which in turn maximizes the benefit of using the approach in regression testing of complex software systems. The reduction in regression test suite size reduces the effort and time required by testing teams to execute the suite. Since regression testing is done frequently in the software maintenance phase, the overall software maintenance cost can be reduced considerably by applying the proposed approach.
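Because exact test suite minimization is NP-complete, practical tools usually rely on heuristics. The sketch below shows the classic greedy set-cover heuristic as one illustration of removing redundant test cases. It is not claimed to be the approach proposed in this paper, and the test names and requirement labels are made up.

```python
def minimize_suite(coverage):
    """Greedy test suite reduction.

    coverage maps each test name to the set of requirements it covers.
    Repeatedly pick the test that covers the most still-uncovered
    requirements until every requirement is covered.
    """
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected

# Hypothetical coverage data: t1 and t2 are redundant given t3.
coverage = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3"},
    "t3": {"r1", "r2", "r3"},
    "t4": {"r4"},
}
reduced = minimize_suite(coverage)
print(reduced)  # ['t3', 't4']
```

The greedy choice does not guarantee the smallest possible suite, but the reduced suite always covers every requirement that the original suite covered.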
M. Ahmadlou; M. R. Delavar; Tayyebi, A.; H. Shafizadeh-Moghadam
2015-01-01
Land use change (LUC) models used for modelling urban growth differ in structure and performance. Local models divide the data into separate subsets and fit a distinct model to each subset. Non-parametric models are data driven and usually have no fixed model structure, or their structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the m...
Empirical likelihood ratio tests for multivariate regression models
Institute of Scientific and Technical Information of China (English)
WU Jianhong; ZHU Lixing
2007-01-01
This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models, including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method is well known as a powerful tool for constructing tests and confidence regions. For model checking, however, the naive empirical likelihood (EL) based tests do not exhibit Wilks' phenomenon. Hence, we use bias correction to construct EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by combining the advantages of the EL and score test methods, the EL-based score tests share many desirable features: they are self-scale invariant and can detect alternatives that converge to the null at rate $n^{-1/2}$, the fastest possible rate for lack-of-fit testing; and they involve weight functions, which provide the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application to a real dataset is analyzed.
Rank regression: an alternative regression approach for data with outliers.
Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin
2014-10-01
Linear regression models are widely used in mental health and related health services research. However, classical linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data include outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad hoc rules used for data trimming are based on subjective criteria, so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that include outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares its results with those generated using classical regression models and semi-parametric regression models.
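To make the idea of rank regression concrete, here is a minimal sketch of one common formulation: the slope minimizes Jaeckel's rank-based dispersion with Wilcoxon scores, and the intercept is taken as the median of the residuals. Whether this matches the exact estimator used in the paper is an assumption. The function names, the ternary search over a fixed slope range, and the toy data are all illustrative.

```python
import math
import statistics

def jaeckel_dispersion(b, x, y):
    """Sum of Wilcoxon-score-weighted residuals for slope b."""
    e = [yi - b * xi for xi, yi in zip(x, y)]
    n = len(e)
    order = sorted(range(n), key=lambda i: e[i])
    # Score for rank r (1-based): sqrt(12) * (r / (n + 1) - 1/2).
    return sum(math.sqrt(12.0) * ((r + 1) / (n + 1) - 0.5) * e[i]
               for r, i in enumerate(order))

def rank_regression(x, y, lo=-100.0, hi=100.0, iters=200):
    """Single-covariate rank regression; returns (intercept, slope).

    The dispersion is convex in the slope, so ternary search on the
    assumed bracket [lo, hi] converges to a minimizer.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if jaeckel_dispersion(m1, x, y) < jaeckel_dispersion(m2, x, y):
            hi = m2
        else:
            lo = m1
    b = (lo + hi) / 2.0
    a = statistics.median(yi - b * xi for xi, yi in zip(x, y))
    return a, b

# Toy data: an exact line y = 1 + 2x with two gross outliers at high x.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi for xi in x]
y[18] += 100.0
y[19] += 100.0
a_hat, b_hat = rank_regression(x, y)
print(a_hat, b_hat)
```

Because only the ranks of the residuals enter the objective, making the two outliers ten times larger would not move the fitted line, whereas an ordinary least squares fit would shift with them.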
Duvenaud, David; Rasmussen, Carl Edward
2011-01-01
We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks.
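The claim that all interaction terms can be evaluated efficiently, even though their number is exponential in the input dimension, can be illustrated with elementary symmetric polynomials. The sketch below assumes an additive kernel of the form "variance for order n times the sum, over all size-n subsets of dimensions, of products of one-dimensional squared-exponential kernels", and evaluates it with a simple O(D^2) recursion. The function names, the shared lengthscale, and the per-order variances are assumptions for illustration, not the authors' parameterization or code.

```python
import math
from itertools import combinations

def se_kernel_1d(a, b, lengthscale=1.0):
    """One-dimensional squared-exponential kernel."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def elementary_symmetric(z):
    """Return [e_0, ..., e_D], the elementary symmetric polynomials of z."""
    e = [1.0] + [0.0] * len(z)
    for j, zj in enumerate(z, start=1):
        # Update from high order to low so each z_j is used at most once.
        for n in range(j, 0, -1):
            e[n] += zj * e[n - 1]
    return e

def additive_kernel(x, y, order_variances, lengthscale=1.0):
    """Additive kernel summing interaction terms of every order 1..D."""
    z = [se_kernel_1d(a, b, lengthscale) for a, b in zip(x, y)]
    e = elementary_symmetric(z)
    return sum(v * e[n] for n, v in enumerate(order_variances, start=1))

def additive_kernel_brute(x, y, order_variances, lengthscale=1.0):
    """Reference implementation that enumerates every subset explicitly."""
    z = [se_kernel_1d(a, b, lengthscale) for a, b in zip(x, y)]
    return sum(v * sum(math.prod(c) for c in combinations(z, n))
               for n, v in enumerate(order_variances, start=1))

x = [0.1, 0.5, -1.2, 2.0]
y = [0.3, 0.4, 0.0, 1.5]
variances = [1.0, 0.5, 0.25, 0.125]  # hypothetical per-order variances
print(additive_kernel(x, y, variances))
```

Setting every variance except the first to zero recovers a first-order additive model, and setting every variance except the last to zero recovers the usual product squared-exponential kernel, which matches the abstract's remark that both are special cases.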
Directory of Open Access Journals (Sweden)
Paul Rozin
2009-10-01
Judgments of the naturalness of foods tend to be influenced more by the process history of a food than by its actual constituents. Two types of processing of a "natural" food are to add something or to remove something. We report in this study, based on a large random sample of individuals from six countries (France, Germany, Italy, Switzerland, UK and USA), that additives are considered defining features of what makes a food not natural, whereas "subtractives" are almost never mentioned. In support of this, skim milk (with major subtraction of fat) is rated as more natural than whole milk with a small amount of natural vitamin D added. It is also noted that "additives" is a common word, with a synonym reported by a native speaker in 17 of 18 languages, whereas "subtractive" is lexicalized in only 1 of the 18 languages. We consider reasons for additivity dominance, relating it to omission bias, feature positive bias, and notions of purity.
Evaluation of world's largest social welfare scheme: An assessment using non-parametric approach.
Singh, Sanjeet
2016-08-01
The Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA) is the world's largest social welfare scheme, operating in India for poverty alleviation through rural employment generation. This paper aims to evaluate and rank the performance of the states in India under the MGNREGA scheme. A non-parametric approach, Data Envelopment Analysis (DEA), is used to calculate the overall technical, pure technical, and scale efficiencies of states in India. The sample data are drawn from the annual official reports published by the Ministry of Rural Development, Government of India. Based on three selected input parameters (expenditure indicators) and five output parameters (employment generation indicators), I apply both input- and output-oriented DEA models to estimate how well the states utilized their resources and generated outputs during the financial year 2013-14. The relative performance evaluation has been made under the assumption of constant returns to scale and also under variable returns to scale to assess the impact of scale on performance. The results indicate that the main sources of inefficiency are both the technical and the managerial practices adopted. 11 states are overall technically efficient and operate at the optimum scale, whereas 18 states are pure technically or managerially efficient. It has been found that for some states it is necessary to alter scheme size to perform on par with the best performing states. For inefficient states, optimal input and output targets along with the resource savings and output gains are calculated. Analysis shows that if all inefficient states operated at optimal input and output levels, on average 17.89% of total expenditure and a total amount of $780 million could have been saved in a single year. Most of the inefficient states perform poorly when it comes to the participation of women and disadvantaged sections (SC&ST) in the scheme. In order to catch up with the performance of best performing states, inefficient states on an average need to enhance