Astronomical Methods for Nonparametric Regression
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
We discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less commonly used techniques such as Multivariate Adaptive Regression Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
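As a rough illustration of the running-median technique named in this abstract, the following Python sketch estimates the function at each sample point as the median of the responses inside a fixed window. The function name `running_median`, the `half_width` parameter, and the toy data are assumptions made for illustration only; they are not taken from the paper.

```python
from statistics import median

def running_median(x, y, half_width):
    """Estimate f(x[i]) as the median of y[j] over all j with |x[j] - x[i]| <= half_width."""
    estimates = []
    for xi in x:
        # Collect the responses whose abscissa falls inside the window around xi.
        window = [yj for xj, yj in zip(x, y) if abs(xj - xi) <= half_width]
        estimates.append(median(window))
    return estimates

# Toy data (hypothetical): the line y = x with one gross outlier at x = 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 100.0, 3.0, 4.0]
smoothed = running_median(x, y, half_width=1.0)
```

On this toy data the outlier is suppressed, but the estimates at the two end points are pulled toward the interior because the window is one-sided there. That is one simple way such estimators can become biased.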
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs...... rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs...... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
Nonparametric Predictive Regression
Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.
2012-01-01
A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)...
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Nonparametric regression with filtered data
Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid (doi: 10.3150/10-BEJ260)
2011-01-01
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
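To make the idea of local polynomial regression concrete, here is a minimal Python sketch of its simplest useful case, a local linear fit with Gaussian weights. It is only an illustration under stated assumptions: the function name `local_linear`, the Gaussian kernel, and the `bandwidth` parameter are hypothetical choices, and the sketch does not reproduce the authors' procedure for locating the cmc.

```python
import math

def local_linear(x, y, x0, bandwidth):
    """Local linear estimate of the regression function at x0.

    Fits a weighted least-squares line centred at x0 and returns its intercept.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight (assumed)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    denom = s0 * s2 - s1 * s1
    if denom == 0.0:
        raise ValueError("not enough effective data near x0 for a local linear fit")
    # Intercept of the weighted least-squares line in the variable (x - x0).
    return (s2 * t0 - s1 * t1) / denom

# Hypothetical check data: an exact straight line, which a local linear fit reproduces.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * xi + 1.0 for xi in x]
estimate = local_linear(x, y, x0=2.5, bandwidth=1.0)
```

A local linear fit reproduces straight lines exactly, so on this data the estimate agrees with 2 * 2.5 + 1 up to rounding. On real measurements the bandwidth controls the trade-off between smoothness and fidelity to abrupt changes.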
Energy Technology Data Exchange (ETDEWEB)
Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found. (orig.)
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Revisiting the Distance Duality Relation using a non-parametric regression method
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of luminosity distance, \(D_L\), and angular diameter distance, \(D_A\), given by the distance duality relation (DDR), is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of Cosmic Microwave Background (CMB) radiation. Any deviation from \(\eta(z) \equiv D_L / [D_A (1+z)^2] = 1\) indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of \(\eta(z)\) data based on a phenomenological model \(\eta(z) = (1+z)^{\epsilon}\). The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct \(\eta(z)\). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift z = 2.418. In this analysis, we find no evidence of deviation from \(\eta = 1\) within a 1σ region in the entire redshift range used (0 < z <= 2.418).
Nonparametric Regression with Common Shocks
Directory of Open Access Journals (Sweden)
Eduardo A. Souza-Rodrigues
2016-09-01
This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
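For readers unfamiliar with the Nadaraya-Watson estimator mentioned in this abstract, the following Python sketch shows its standard form: a kernel-weighted average of the observed responses. The Gaussian kernel, the function name `nadaraya_watson`, and the `bandwidth` parameter are assumptions chosen for illustration; the paper's setting with common shocks is not modelled here.

```python
import math

def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted average of y_train, with weights K((x0 - x_i) / bandwidth)."""
    # Gaussian kernel (assumed); the normalising constant cancels in the ratio.
    weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; x0 is too far from the data")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total

# Hypothetical toy data: a symmetric "V" shape evaluated at its centre.
x_train = [-1.0, 0.0, 1.0]
y_train = [1.0, 0.0, 1.0]
fit_at_zero = nadaraya_watson(x_train, y_train, x0=0.0, bandwidth=1.0)
```

Because the estimate is a convex combination of the responses, it always lies between the smallest and largest observed value of `y_train`.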
Panel data specifications in nonparametric kernel regression
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition: "This book should be an essential part of the personal library of every practicing statistician." -Technometrics. Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
施沛德; 王海燕; 张利华
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtain the asymptotic optimality of AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
Right-Censored Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Dursun Aydın
2016-11-01
This paper examines how selection criteria operate for right-censored nonparametric regression using smoothing splines. In order to transform the response variable into a variable that accounts for the right-censoring, we used the Kaplan-Meier weights proposed by [1] and [2]. The major problem in the smoothing spline method is to determine a smoothing parameter to obtain nonparametric estimates of the regression function. In this study, this parameter is chosen based on censored data by means of criteria such as the improved Akaike information criterion (AICc), the Bayesian (or Schwarz) information criterion (BIC), and generalized cross-validation (GCV). For this purpose, a Monte Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c
Nonparametric Regression Estimation for Multivariate Null Recurrent Processes
Directory of Open Access Journals (Sweden)
Biqing Cai
2015-04-01
This paper discusses nonparametric kernel regression with the regressor being a \(d\)-dimensional \(\beta\)-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \(\sqrt{n(T)h^{d}}\), where \(n(T)\) is the number of regenerations for a \(\beta\)-null recurrent process, and the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.
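The leave-one-out cross-validation step mentioned in this abstract can be sketched in Python as follows. This is a generic illustration for a one-dimensional kernel regression with i.i.d.-style toy data, not the paper's procedure for null recurrent processes; the Gaussian kernel, the function names `loo_cv_score` and `select_bandwidth`, and the candidate grid are all assumptions.

```python
import math

def loo_cv_score(x, y, bandwidth):
    """Mean squared leave-one-out prediction error of a Gaussian kernel regression."""
    n = len(x)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave observation i out when predicting y[i]
            w = math.exp(-0.5 * ((x[i] - x[j]) / bandwidth) ** 2)
            num += w * y[j]
            den += w
        if den == 0.0:
            return math.inf  # bandwidth too small: no usable neighbours
        total += (y[i] - num / den) ** 2
    return total / n

def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(x, y, h))

# Hypothetical toy data: a smooth nonlinear curve on an equally spaced grid.
x = [i / 10 for i in range(21)]
y = [math.sin(2 * xi) for xi in x]
best = select_bandwidth(x, y, [0.01, 0.1, 1.0, 10.0])
```

Very large bandwidths flatten the curve toward its overall mean, so their leave-one-out error is large; the criterion therefore favours one of the smaller candidates on this data.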
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Institute of Scientific and Technical Information of China (English)
Lu LIN
2005-01-01
In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, and attains a very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Yerlikaya-Özkurt, Fatma; Askan, Aysegul; Weber, Gerhard-Wilhelm
2014-12-01
Ground Motion Prediction Equations (GMPEs) are empirical relationships which are used for determining the peak ground response at a particular distance from an earthquake source. They relate the peak ground responses as a function of earthquake source type, distance from the source, local site conditions where the data are recorded and finally the depth and magnitude of the earthquake. In this article, a new prediction algorithm, called Conic Multivariate Adaptive Regression Splines (CMARS), is employed on an available dataset for deriving a new GMPE. CMARS is based on a special continuous optimization technique, conic quadratic programming. These convex optimization problems are very well-structured, resembling linear programs and, hence, permitting the use of interior point methods. The CMARS method is performed on the strong ground motion database of Turkey. Results are compared with three other GMPEs. CMARS is found to be effective for ground motion prediction purposes.
Nonparametric additive regression for repeatedly measured data
Carroll, R. J.
2009-05-20
We develop an easily computed smooth backfitting algorithm for additive model fitting in repeated measures problems. Our methodology easily copes with various settings, such as when some covariates are the same over repeated response measurements. We allow for a working covariance matrix for the regression errors, showing that our method is most efficient when the correct covariance matrix is used. The component functions achieve the known asymptotic variance lower bound for the scalar argument case. Smooth backfitting also leads directly to design-independent biases in the local linear case. Simulations show our estimator has smaller variance than the usual kernel estimator. This is also illustrated by an example from nutritional epidemiology. © 2009 Biometrika Trust.
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
(no author listed)
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in the literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtain the asymptotic optimality of AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
Nonparametric regression with martingale increment errors
Delattre, Sylvain
2010-01-01
We consider the problem of adaptive estimation of the regression function in a framework where we replace ergodicity assumptions (such as independence or mixing) by another structural assumption on the model. Namely, we propose adaptive upper bounds for kernel estimators with data-driven bandwidth (Lepski's selection rule) in a regression model where the noise is an increment of a martingale. It includes, as very particular cases, the usual i.i.d. regression and auto-regressive models. The cornerstone tool for this study is a new result for self-normalized martingales, called "stability", which is of independent interest. In a first part, we only use the martingale increment structure of the noise. We give an adaptive upper bound using a random rate that involves the occupation time near the estimation point. Thanks to this approach, the theoretical study of the statistical procedure is disconnected from usual ergodicity properties like mixing. Then, in a second part, we make a link with the usual minimax th...
Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Anestis Antoniadis
2001-06-01
Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electrical consumption is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
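The basic idea behind wavelet thresholding can be sketched in a few lines of Python. The example below applies a single level of the Haar transform, soft-thresholds the detail coefficients, and inverts the transform. It is a deliberately minimal illustration under stated assumptions: the Haar filter, the single decomposition level, the hand-picked `threshold`, and all function names are hypothetical choices, and this is unrelated to the Matlab codes distributed with the paper.

```python
import math

_S = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """One level of the orthonormal Haar transform (signal length must be even)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail

def inverse_haar_step(approx, detail):
    """Invert haar_step."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out

def soft_threshold(coefficient, threshold):
    """Shrink a coefficient toward zero by `threshold`, setting small ones to zero."""
    return math.copysign(max(abs(coefficient) - threshold, 0.0), coefficient)

def haar_denoise(signal, threshold):
    """Soft-threshold the Haar detail coefficients and reconstruct the signal."""
    approx, detail = haar_step(signal)
    detail = [soft_threshold(d, threshold) for d in detail]
    return inverse_haar_step(approx, detail)

# Hypothetical toy signal; a large threshold replaces each pair by its average.
denoised = haar_denoise([1.0, 3.0, 5.0, 5.0], threshold=10.0)
```

With a threshold of zero the signal is reconstructed unchanged (up to rounding), which is a quick sanity check that the transform pair is consistent. Practical estimators use several decomposition levels and a data-driven threshold.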
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. A determination of predictor-specific degree of smoothing increased the accuracy.
Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Fahmy, Ossama T; Ragab, Marwa A A
2010-11-15
This manuscript discusses the application of chemometrics to the handling of HPLC response data using the internal standard method (ISM). This was performed on a model mixture containing terbutaline sulphate, guaiphenesin, bromhexine HCl, sodium benzoate and propylparaben as an internal standard. Derivative treatment of chromatographic response data of analyte and internal standard was followed by convolution of the resulting derivative curves using 8-points sin x(i) polynomials (discrete Fourier functions). The response of each analyte signal, its corresponding derivative and convoluted derivative data were divided by that of the internal standard to obtain the corresponding ratio data. This was found beneficial in eliminating different types of interferences. It was successfully applied to handle some of the most common chromatographic problems and non-ideal conditions, namely: overlapping chromatographic peaks and very low analyte concentrations. For example, a significant change in the correlation coefficient of sodium benzoate, in case of overlapping peaks, went from 0.9975 to 0.9998 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. Also a significant improvement in the precision and accuracy for the determination of synthetic mixtures and dosage forms in non-ideal cases was achieved. For example, in the case of overlapping peaks guaiphenesin mean recovery% and RSD% went from 91.57, 9.83 to 100.04, 0.78 on applying normal conventional peak area and first derivative under Fourier functions methods, respectively. This work also compares the application of Theil's method, a non-parametric regression method, in handling the response ratio data, with the least squares parametric regression method, which is considered the de facto standard method used for regression. Theil's method was found to be superior to the method of least squares as it assumes that errors could occur in both x- and y-directions and
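For orientation, Theil's method can be sketched in Python as a line whose slope is the median of all pairwise slopes. This is the "complete" pairwise variant, chosen here as an assumption; the abstract does not say which variant the authors used, and the intercept rule, the function name `theil_line`, and the toy data are likewise illustrative only.

```python
from itertools import combinations
from statistics import median

def theil_line(x, y):
    """Fit y = slope * x + intercept using the median of all pairwise slopes."""
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
        if xj != xi  # skip pairs with identical x, whose slope is undefined
    ]
    slope = median(slopes)
    # One common choice of intercept: the median residual offset.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Hypothetical calibration data on the line y = 2x with one gross outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]
slope, intercept = theil_line(x, y)
```

On this toy data the outlier affects only four of the ten pairwise slopes, so the median slope and intercept still recover the underlying line, whereas an ordinary least-squares fit would be pulled strongly toward the outlier.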
Coverage Accuracy of Confidence Intervals in Nonparametric Regression
Institute of Scientific and Technical Information of China (English)
Song-xi Chen; Yong-song Qin
2003-01-01
Point-wise confidence intervals for a nonparametric regression function with random design points are considered. The confidence intervals are those based on the traditional normal approximation and the empirical likelihood. Their coverage accuracy is assessed by developing the Edgeworth expansions for the coverage probabilities. It is shown that the empirical likelihood confidence intervals are Bartlett correctable.
Nonparametric instrumental regression with non-convex constraints
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Multivariate nonparametric regression and visualization with R and applications to finance
Klemelä, Jussi
2014-01-01
A modern approach to statistical learning and its applications through visualization methods. With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
Testing for a constant coefficient of variation in nonparametric regression
Dette, Holger; Marchlewski, Mareen; Wagener, Jens
2010-01-01
In the common nonparametric regression model $Y_i = m(X_i) + \sigma(X_i)\epsilon_i$ we consider the problem of testing the hypothesis that the coefficient of the scale and location function is constant. The test is based on a comparison of the observations $Y_i/\hat{\sigma}(X_i)$ with their mean by a smoothed empirical process, where $\hat{\sigma}$ denotes the local linear estimate of the scale function. We show weak convergence of a centered version of this process to a Gaussian process under the null ...
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
Institute of Scientific and Technical Information of China (English)
Lin, Lu
2006-01-01
This paper reports a robust kernel estimation method for fixed design nonparametric regression models. A Stahel-Donoho kernel estimator is introduced, in which the weight functions depend on both the depths of the data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the incomputable depths of the errors. As a result, the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has good asymptotic properties, such as asymptotic normality and convergence in mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure, so its efficiency can be compared with that of the original estimator. Simulations show that the new method can smooth the regression estimate and achieve a desirable balance between robustness and efficiency.
Dai, Wenlin
2017-09-01
Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that systematically combines the linear regression method with higher-order difference estimators. The unified framework greatly enriches the existing literature on variance estimation and includes most existing estimators as special cases. More importantly, it also provides a way to solve the challenging difference sequence selection problem, which has remained a controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend using the ordinary difference sequence in the unified framework, regardless of whether the sample size is small or the signal-to-noise ratio is large. Finally, to meet the demands of applications, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression, and have made it freely available for the R statistical program at http://cran.r-project.org/web/packages/.
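For orientation, the simplest member of the difference-based family can be sketched in a few lines of Python. This is not the unified framework or the VarED package described in the abstract; it is the classical first-order difference estimator, and the function name `first_order_difference_variance`, the test signal, and the noise level are all assumptions made for illustration. It assumes the responses are ordered by an (approximately) equally spaced design.

```python
import numpy as np

def first_order_difference_variance(y):
    """Classical first-order difference-based estimate of the noise variance.

    Assumes y is ordered by the design points. Successive differences
    cancel a smooth mean function, so no mean estimate is needed.
    """
    y = np.asarray(y, dtype=float)
    d = np.diff(y)                       # y[i+1] - y[i]
    return float(np.sum(d ** 2) / (2.0 * (len(y) - 1)))

# Illustrative data (hypothetical): smooth signal plus noise with variance 0.25.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=x.size)
sigma2_hat = first_order_difference_variance(y)
```

The factor 2 in the denominator reflects that the difference of two independent errors has twice the error variance; higher-order difference sequences of the kind studied in the paper generalize this idea.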
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators has shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise for two main reasons: the instability of parametric inference procedures in chaotic systems, which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It is argued here that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the parametric model fitted to the data. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches until methods have been developed to assess the reliability of nonparametric forecasting.
Institute of Scientific and Technical Information of China (English)
LING Neng-xiang; DU Xue-qiao
2005-01-01
In this paper, we study the strong consistency of the partitioning estimate of a regression function under samples that are φ-mixing sequences with identical distribution. Key words: nonparametric regression function; partitioning estimation; strong convergence; φ-mixing sequences.
Wei, Jiawei; Carroll, Raymond J; Maity, Arnab
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES
Institute of Scientific and Technical Information of China (English)
林路; 张润楚
2004-01-01
This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
Portfolio optimization based on nonparametric estimation methods
Directory of Open Access Journals (Sweden)
Mahsa Ghandehari
2017-03-01
One of the major issues investors face in capital markets is deciding on an appropriate stock exchange for investing and selecting an optimal portfolio. This process is carried out through the assessment of risk and expected return. In the portfolio selection problem, if the assets' expected returns are normally distributed, variance and standard deviation are used as risk measures. However, the expected returns on assets are not necessarily normal and sometimes differ dramatically from the normal distribution. This paper introduces conditional value at risk (CVaR) as a measure of risk in a nonparametric framework, offers the optimal portfolio for a given expected return, and compares this method with the linear programming method. The data used in this study consist of monthly returns of 15 companies selected from the top 50 companies in Tehran Stock Exchange during the winter of 1392, considered from April of 1388 to June of 1393. The results of this study show the superiority of the nonparametric method over the linear programming method, and the nonparametric method is much faster than the linear programming method.
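As a point of reference for the risk measure named in this abstract, the sketch below computes a plain sample (historical) estimate of value at risk and conditional value at risk from a return series. It is not the paper's portfolio optimization procedure; the function name `empirical_var_cvar`, the confidence level, and the toy returns are assumptions made for illustration only.

```python
import numpy as np

def empirical_var_cvar(returns, alpha=0.95):
    """Sample VaR and CVaR of the loss distribution at confidence level alpha.

    Losses are defined as negative returns. VaR is the alpha-quantile of
    the losses; CVaR is the mean of the losses at or beyond that quantile.
    No distributional assumption (such as normality) is made.
    """
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)          # linear-interpolation quantile
    tail = losses[losses >= var]              # worst-case tail
    return float(var), float(tail.mean())

# Toy monthly returns (hypothetical numbers, not the Tehran Stock Exchange data).
toy_returns = [-0.10, -0.05, 0.0, 0.02, 0.03]
var_80, cvar_80 = empirical_var_cvar(toy_returns, alpha=0.8)
```

By construction CVaR is never smaller than VaR at the same level, which is why it is often preferred when returns have heavier tails than the normal distribution.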
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting
Directory of Open Access Journals (Sweden)
Jelena Fiosina
2017-01-01
Forecasting in big datasets is a common but complicated task, which cannot be executed using the well-known parametric linear regression. However, nonparametric and semiparametric methods, which enable forecasting by building nonlinear data models, are computationally intensive and lack sufficient scalability to cope with big datasets and extract successful results in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with the linear regression model, which is the only forecasting model currently having a parallel distributed realization within the SPARK framework to address big data problems. The advantages of the parallelization of the algorithm are also provided. We validate our models by conducting various numerical experiments: evaluating the goodness of fit, analyzing how increasing dataset size influences time consumption, and analyzing time consumption by varying the degree of parallelism (number of workers) in the distributed realization.
Zhu, Feng; Feng, Weiyue; Wang, Huajian; Huang, Shaosen; Lv, Yisong; Chen, Yong
2013-01-01
X-ray spectral imaging provides quantitative imaging of trace elements in biological sample with high sensitivity. We propose a novel algorithm to promote the signal-to-noise ratio (SNR) of X-ray spectral images that have low photon counts. Firstly, we estimate the image data area that belongs to the homogeneous parts through confidence interval testing. Then, we apply the Poisson regression through its maximum likelihood estimation on this area to estimate the true photon counts from the Poisson noise corrupted data. Unlike other denoising methods based on regression analysis, we use the bootstrap resampling methods to ensure the accuracy of regression estimation. Finally, we use a robust local nonparametric regression method to estimate the baseline and subsequently subtract it from the X-ray spectral data to further improve the SNR of the data. Experiments on several real samples show that the proposed method performs better than some state-of-the-art approaches to ensure accuracy and precision for quantit...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
-Douglas function nor the Translog function are consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb...... results—including measures that are of interest of applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Forecasting of Households Consumption Expenditure with Nonparametric Regression: The Case of Turkey
Directory of Open Access Journals (Sweden)
Aydin Noyan
2016-11-01
The relationship between household income and expenditure is important for understanding the economic dynamics of households. In this study, the relationship between household consumption expenditure and household disposable income was analyzed by Locally Weighted Scatterplot Smoothing regression, a nonparametric method, using R programming. The study aimed to determine the relationship between the variables directly, without making the assumptions commonly required in conventional parametric regression. According to the findings, expenditure increased rapidly at first as income and household size increased together, and then the rate of increase slowed. This increase can be explained by the relatively greater compulsory consumption expenditure in small households. In addition, expenditure is relatively higher at middle and high income levels than at the low income level. However, when household size changes, the change in expenditure is limited at middle income levels and most limited at high income levels.
Nonparametric Least Squares Estimation of a Multivariate Convex Regression Function
Seijo, Emilio
2010-01-01
This paper deals with the consistency of the least squares estimator of a convex regression function when the predictor is multidimensional. We characterize and discuss the computation of such an estimator via the solution of certain quadratic and linear programs. Mild sufficient conditions for the consistency of this estimator and its subdifferentials in fixed and stochastic design regression settings are provided. We also consider a regression function which is known to be convex and componentwise nonincreasing and discuss the characterization, computation and consistency of its least squares estimator.
Carroll, Raymond J.
2011-03-01
In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlethwaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of an RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
On concurvity in nonlinear and nonparametric regression models
Directory of Open Access Journals (Sweden)
Sonia Amodio
2014-12-01
When data are affected by multicollinearity in the linear regression framework, concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. Just as collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm always converges to a solution, in the case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing a type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper provides a general criterion to detect concurvity in nonlinear and nonparametric regression models.
Efficient robust nonparametric estimation in a semimartingale regression model
Konev, Victor
2010-01-01
The paper considers the problem of robustly estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such noise is a non-Gaussian Ornstein-Uhlenbeck process with a Lévy process subordinator, which is used to model financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least squares estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Measuring the Influence of Networks on Transaction Costs Using a Nonparametric Regression Technique
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H.C.A.
. We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Climate projections simulated by Global Climate Models (GCM) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large scale climatic state and the regional or local features. A transfer function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe the complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.
Nonparametric methods in actigraphy: An update
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-01-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest-activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
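The two variables discussed in this abstract can be illustrated with the conventional definitions of interdaily stability (IS) and intradaily variability (IV) computed on equally spaced bins. The sketch below is not the modified IVm/ISm calculation proposed by the authors; the function name `interdaily_stability_and_variability`, the 24-bins-per-day layout, and the synthetic activity pattern are assumptions made for illustration.

```python
import numpy as np

def interdaily_stability_and_variability(activity, bins_per_day=24):
    """Conventional IS and IV for an activity series of whole days.

    IS compares the variance of the average daily profile with the total
    variance (1.0 means every day is identical). IV compares the mean
    squared successive difference with the total variance (higher values
    mean a more fragmented rhythm).
    """
    x = np.asarray(activity, dtype=float)
    n = x.size
    if n % bins_per_day != 0:
        raise ValueError("series must contain a whole number of days")
    grand_mean = x.mean()
    total_ss = np.sum((x - grand_mean) ** 2)
    # Average profile: mean of each time-of-day bin across days.
    profile = x.reshape(-1, bins_per_day).mean(axis=0)
    is_value = n * np.sum((profile - grand_mean) ** 2) / (bins_per_day * total_ss)
    iv_value = n * np.sum(np.diff(x) ** 2) / ((n - 1) * total_ss)
    return float(is_value), float(iv_value)

# Synthetic example (hypothetical): three identical days of hourly counts.
three_days = np.tile(np.arange(24, dtype=float), 3)
is_val, iv_val = interdaily_stability_and_variability(three_days)
```

With a perfectly repeating pattern the daily profile explains all of the variance, so IS equals 1; changing `bins_per_day` mimics the effect of analysing the same recording at a different time interval.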
Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines
Kuter, S.; Akyürek, Z.; Weber, G.-W.
2016-10-01
Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imageries. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to latest machine learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Directory of Open Access Journals (Sweden)
Xibin Zhang
2016-04-01
This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and to the kernel density estimation of gross domestic product (GDP) growth rates among Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
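For orientation, a minimal sketch of a Nadaraya-Watson estimate with mixed regressors. It assumes a Gaussian product kernel for the continuous regressors and an Aitchison-Aitken-style kernel for unordered discrete regressors; the abstract does not state which kernels the paper uses. Bandwidths `h` and `lam` are fixed by hand here, whereas the paper samples them from a posterior. All names are hypothetical.

```python
import numpy as np

def nadaraya_watson(x0_cont, x0_disc, Xc, Xd, y, h, lam, n_categories):
    """Kernel-weighted average of y at the query point (x0_cont, x0_disc).

    Xc: (n, p) continuous regressors, Xd: (n, q) discrete regressors.
    h: bandwidth(s) for continuous regressors, lam in [0, 1): smoothing
    parameter(s) for discrete regressors (assumed kernel forms).
    """
    Xc = np.asarray(Xc, dtype=float)
    Xd = np.asarray(Xd)
    y = np.asarray(y, dtype=float)

    # Gaussian product kernel over the continuous regressors.
    u = (Xc - np.asarray(x0_cont, dtype=float)) / h
    w_cont = np.exp(-0.5 * u ** 2).prod(axis=1)

    # Discrete kernel: weight 1 - lam on a match, lam / (c - 1) otherwise.
    same = Xd == np.asarray(x0_disc)
    w_disc = np.where(same, 1.0 - lam, lam / (n_categories - 1)).prod(axis=1)

    w = w_cont * w_disc
    return np.sum(w * y) / np.sum(w)
```

Setting `lam = 0` uses only observations whose discrete regressors match the query exactly; larger values borrow strength across categories.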
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Directory of Open Access Journals (Sweden)
Tavio Tavio
2008-01-01
Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
Institute of Scientific and Technical Information of China (English)
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressways. By using the historical traffic data collected from the detectors on Beijing expressways, a specifically designed database was developed via processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
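A minimal sketch of K-NN regression with a weighted Euclidean distance, as described in the abstract above. The feature weights, the choice of an unweighted mean over the K neighbours, and all names are assumptions; the abstract does not give the paper's "relativity based" weights or its averaging rule.

```python
import numpy as np

def knn_predict(query, history, targets, weights, k=3):
    """Predict a target as the mean over the k nearest historical series.

    history: (n, d) past speed series, targets: (n,) speeds that followed them,
    weights: (d,) non-negative per-feature weights (hypothetical values).
    """
    history = np.asarray(history, dtype=float)
    targets = np.asarray(targets, dtype=float)
    query = np.asarray(query, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Weighted Euclidean distance from the query to every historical series.
    dist = np.sqrt(((history - query) ** 2 * weights).sum(axis=1))

    nearest = np.argsort(dist)[:k]          # indices of the k closest series
    return targets[nearest].mean()
```

A usage example: with `history` holding the last few detector readings for past time windows and `targets` holding the speed observed a few minutes later, `knn_predict` returns the forecast for the current window.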
Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model
Directory of Open Access Journals (Sweden)
Yujuan Sun
2014-01-01
Passenger flow is increasing dramatically with the completion of subway network systems in big cities of China. As convergence nodes of subway lines, transfer stations must handle more passengers because of the large transfer demand among different lines. Transfer facilities therefore face great pressure, such as pedestrian congestion or other abnormal situations. To avoid pedestrian congestion, or to warn the management before it occurs, it is necessary to predict the transfer passenger flow. Thus, based on nonparametric regression theory, a transfer passenger flow prediction model was proposed. To test and illustrate the prediction model, one month of transfer passenger flow data from the XIDAN transfer station was used to calibrate and validate the model. Compared with a Kalman filter model and a support vector machine regression model, the nonparametric regression model has the advantages of high accuracy and strong transferability and can predict transfer passenger flow accurately for different intervals.
LSTA, Rawane Samb
2010-01-01
This thesis deals with the nonparametric estimation of the density f of the regression error term E of the model Y=m(X)+E, assuming its independence of the covariate X. The difficulty linked to this study is the fact that the regression error E is not observed. In such a setup, it would be unwise, for estimating f, to use a conditional approach based upon the probability distribution function of Y given X. Indeed, this approach is affected by the curse of dimensionality, so that the resulting estimator of the residual term E would have a considerably slow rate of convergence if the dimension of X is very high. Two approaches are proposed in this thesis to avoid the curse of dimensionality. The first approach uses the estimated residuals, while the second integrates a nonparametric conditional density estimator of Y given X. Although proceeding in this way can circumvent the curse of dimensionality, a challenging issue is to evaluate the impact of the estimated residuals on the final estimator of the density f. We will also at...
Subpixel Snow Cover Mapping from MODIS Data by Nonparametric Regression Splines
Akyurek, Z.; Kuter, S.; Weber, G. W.
2016-12-01
The spatial extent of snow cover is often considered as one of the key parameters in climatological, hydrological and ecological modeling due to its energy storage, high reflectance in the visible and NIR regions of the electromagnetic spectrum, significant heat capacity and insulating properties. A significant challenge in snow mapping by remote sensing (RS) is the trade-off between the temporal and spatial resolution of satellite imagery. In order to tackle this issue, machine learning-based subpixel snow mapping methods, like Artificial Neural Networks (ANNs), from low or moderate resolution images have been proposed. Multivariate Adaptive Regression Splines (MARS) is a nonparametric regression tool that can build flexible models for high dimensional and complex nonlinear data. Although MARS is not often employed in RS, it has various successful implementations such as estimation of vertical total electron content in the ionosphere, atmospheric correction and classification of satellite images. This study is the first attempt in RS to evaluate the applicability of MARS for subpixel snow cover mapping from MODIS data. A total of 16 MODIS-Landsat ETM+ image pairs taken over the European Alps between March 2000 and April 2003 were used in the study. MODIS top-of-atmospheric reflectance, NDSI, NDVI and land cover classes were used as predictor variables. Cloud-covered, cloud shadow, water and bad-quality pixels were excluded from further analysis by a spatial mask. MARS models were trained and validated by using reference fractional snow cover (FSC) maps generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also developed. The mutual comparison of obtained MARS and ANN models was accomplished on independent test areas. The MARS model performed better than the ANN model with an average RMSE of 0.1288 over the independent test areas; whereas the average RMSE of the ANN model
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant.
Fast pixel-based optical proximity correction based on nonparametric kernel regression
Ma, Xu; Wu, Bingliang; Song, Zhiyang; Jiang, Shangliang; Li, Yanqiu
2014-10-01
Optical proximity correction (OPC) is a resolution enhancement technique extensively used in the semiconductor industry to improve the resolution and pattern fidelity of optical lithography. In pixel-based OPC (PBOPC), the layout is divided into small pixels, which are then iteratively modified until the simulated print image on the wafer matches the desired pattern. However, the increasing complexity and size of modern integrated circuits make PBOPC techniques quite computationally intensive. This paper focuses on developing a practical and efficient PBOPC algorithm based on a nonparametric kernel regression, a well-known technique in machine learning. Specifically, we estimate the OPC patterns based on the geometric characteristics of the original layout corresponding to the same region and a series of training examples. Experimental results on metal layers show that our proposed approach significantly improves the speed of a current professional PBOPC software by a factor of 2 to 3, and may further reduce the mask complexity.
Montiel, Ariadna; Sendra, Irene; Escamilla-Rivera, Celia; Salzano, Vincenzo
2014-01-01
In this work we present a nonparametric approach, which works on minimal assumptions, to reconstruct the cosmic expansion of the Universe. We propose to combine a locally weighted scatterplot smoothing method and a simulation-extrapolation method. The first one (Loess) is a nonparametric approach that allows one to obtain smoothed curves with no prior knowledge of the functional relationship between variables nor of the cosmological quantities. The second one (Simex) takes into account the effect of measurement errors on a variable via a simulation process. For the reconstructions we use as raw data the Union2.1 Type Ia Supernovae compilation, as well as recent Hubble parameter measurements. This work aims to illustrate the approach, which turns out to be a self-sufficient technique in the sense that we do not have to choose anything by hand. We examine the details of the method, among them the amount of observational data needed to perform the locally weighted fit which will define the robustness of our reconstructio...
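A minimal sketch of the locally weighted fit behind Loess, evaluated at a single point. It assumes tricube weights and a local linear fit over a fixed fraction of the data, which is the textbook form; the span selection and the Simex step described in the abstract are not reproduced. Function and parameter names are hypothetical.

```python
import numpy as np

def loess_point(x0, x, y, frac=0.5):
    """Locally weighted linear estimate of y at x0 (tricube weights)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(3, int(np.ceil(frac * x.size)))   # number of neighbours in the window

    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                # k nearest observations
    d_max = dist[idx].max()
    if d_max == 0.0:                          # all neighbours sit exactly at x0
        return y[idx].mean()

    w = (1.0 - (dist[idx] / d_max) ** 3) ** 3  # tricube weights in [0, 1]

    # Weighted least squares for a line centred at x0; the intercept is the fit.
    sw = np.sqrt(w)
    A = np.column_stack([np.ones(k), x[idx] - x0])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return coef[0]
```

The `frac` parameter plays the role of the amount of data used in each local fit: small values follow the data closely, large values smooth more.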
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by m...... treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work....
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result. PMID:28133490
Directory of Open Access Journals (Sweden)
Yue Fan
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
Non-parametric versus parametric methods in environmental sciences
Directory of Open Access Journals (Sweden)
Muhammad Riaz
2016-01-01
This report intends to highlight the importance of considering the background assumptions required for the analysis of real datasets in different disciplines. We provide a comparative discussion of parametric methods (which depend on distributional assumptions such as normality) relative to non-parametric methods (which are free from many distributional assumptions). We have chosen a real dataset from environmental sciences (one of the application areas). The findings may be extended to other disciplines following the same spirit.
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used... by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371 specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function is consistent with the "true...
Semi-parametric regression: Efficiency gains from modeling the nonparametric part
Yu, Kyusang; Park, Byeong U; 10.3150/10-BEJ296
2011-01-01
It is widely admitted that structured nonparametric modeling that circumvents the curse of dimensionality is important in nonparametric estimation. In this paper we show that the same holds for semi-parametric estimation. We argue that estimation of the parametric component of a semi-parametric model can be improved essentially when more structure is put into the nonparametric part of the model. We illustrate this for the partially linear model, and investigate efficiency gains when the nonparametric part of the model has an additive structure. We present the semi-parametric Fisher information bound for estimating the parametric part of the partially linear additive model and provide semi-parametric efficient estimators for which we use a smooth backfitting technique to deal with the additive nonparametric part. We also present the finite sample performances of the proposed estimators and analyze Boston housing data as an illustration.
A Bayesian nonparametric method for prediction in EST analysis
Directory of Open Access Journals (Sweden)
Prünster Igor
2007-09-01
Background: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: (a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; (b) the number of new unique genes to be observed in a future sample; (c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
Digital spectral analysis parametric, non-parametric and advanced methods
Castanié, Francis
2013-01-01
Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature. The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models. An entire chapter is devoted to the non-parametric methods most widely used in industry. High resolution methods a
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb-Douglas a... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...
Computing Economies of Scope Using Robust Partial Frontier Nonparametric Methods
Directory of Open Access Journals (Sweden)
Pedro Carvalho
2016-03-01
This paper proposes a methodology to examine economies of scope using the recent order-α nonparametric method. It allows us to investigate economies of scope by comparing the efficient order-α frontiers of firms that produce two or more goods with the efficient order-α frontiers of firms that produce only one good. To accomplish this, and because the order-α frontiers are irregular, we suggest linearizing them with the DEA estimator. The proposed methodology uses partial frontier nonparametric methods that are more robust than the traditional full frontier methods. Using a sample of 67 Portuguese water utilities for the period 2002–2008 and also a simulated sample, we demonstrate the usefulness of the approach adopted and show that if only the full frontier methods were used, they would lead to different results. We found evidence of economies of scope in the simultaneous provision of water supply and wastewater services by water utilities in Portugal.
Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat
Directory of Open Access Journals (Sweden)
Luca Scrucca
2001-06-01
In this paper we describe the Xlisp-Stat version of the sm library, software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus, and it is documented in their book Applied Smoothing Techniques for Data Analysis (1997). This is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve for inferential purposes as, for instance, when a nonparametric estimate is used for checking a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to easily implement their own methods and new research ideas in the built-in prototypes.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the “true” relationship between the inputs and the output. This misspecification might result in biased estimation results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Regression methods for medical research
Tai, Bee Choo
2013-01-01
Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures. The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the
A robust nonparametric method for quantifying undetected extinctions.
Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E
2016-06-01
How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
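As a quick arithmetic check of the Singapore figures quoted above, the sketch below recomputes the two rates. It assumes the adjusted rate adds the projected undetected extinctions to both the extinct count and the species pool, an assumption that reproduces the reported 33.0% but is not spelled out in the abstract. Variable names are hypothetical, and this is not the authors' R code.

```python
# Figures taken from the abstract.
recorded_species = 195
recorded_extinct = 58
undetected_extinct = 9.6          # projected point estimate

# Extinction rate among recorded species only.
naive_rate = recorded_extinct / recorded_species

# Assumed adjustment: undetected extinct species join both numerator and pool.
adjusted_rate = (recorded_extinct + undetected_extinct) / (
    recorded_species + undetected_extinct
)

print(f"recorded-only rate: {naive_rate:.1%}")
print(f"adjusted rate:      {adjusted_rate:.1%}")
```

The recorded-only rate comes to 29.7% and the adjusted rate to 33.0%, matching the values in the abstract.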
A Level Set Analysis and A Nonparametric Regression on S&P 500 Daily Return
Directory of Open Access Journals (Sweden)
Yipeng Yang
2016-02-01
In this paper, a level set analysis is proposed which aims to analyze the S&P 500 return with a certain magnitude. It is found that the process of large jumps/drops of return tends to have negative serial correlation, and the volatility clustering phenomenon can be easily seen. Then, a nonparametric analysis is performed and new patterns are discovered. An ARCH model is constructed based on the patterns we discovered and it is capable of manifesting the volatility skew in option pricing. A comparison of our model with the GARCH(1,1) model is carried out. An explanation of the validity of our model through prospect theory is provided, and, as a novelty, we link the volatility skew phenomenon to prospect theory in behavioral finance.
Non-parametric and least squares Langley plot methods
Directory of Open Access Journals (Sweden)
P. W. Kiedron
2015-04-01
Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs) incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law $V = V_0 e^{-\tau m}$, where a plot of $\ln V$ (voltage) vs. $m$ (air mass) yields a straight line with intercept $\ln V_0$. This $\ln V_0$ subsequently can be used to solve for $\tau$ for any measurement of $V$ and calculation of $m$. This calibration works well on some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of $\ln V_0$ values are smoothed and interpolated with median and mean moving window filters.
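A minimal sketch of a Langley calibration from the Bouguer-Lambert-Beer law stated above: taking logarithms gives ln V = ln V0 - tau * m, so a straight-line fit of ln V against air mass m yields ln V0 as the intercept and -tau as the slope. Two fits are shown, ordinary least squares and a Theil-Sen median-of-slopes fit. Theil-Sen is chosen here only as a familiar non-parametric example; the abstract does not list the paper's eleven techniques, and the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def langley_least_squares(m, V):
    """Least-squares line through (m, ln V); returns (V0, tau)."""
    slope, intercept = np.polyfit(np.asarray(m, dtype=float), np.log(V), 1)
    return float(np.exp(intercept)), float(-slope)

def langley_theil_sen(m, V):
    """Median-of-pairwise-slopes line through (m, ln V); returns (V0, tau)."""
    m = np.asarray(m, dtype=float)
    y = np.log(np.asarray(V, dtype=float))
    slopes = [
        (y[j] - y[i]) / (m[j] - m[i])
        for i, j in combinations(range(m.size), 2)
        if m[j] != m[i]
    ]
    slope = np.median(slopes)
    intercept = np.median(y - slope * m)   # median residual intercept
    return float(np.exp(intercept)), float(-slope)
```

On clean data both fits recover the same V0 and tau; the median-based fit is less sensitive to a few cloud-contaminated readings.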
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.
Zhang, Tingting; Kou, S C
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.
Nonparametric methods for drought severity estimation at ungauged sites
Sadri, S.; Burn, D. H.
2012-12-01
The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.
An adaptive regression method for infrared blind-pixel compensation
Chen, Suting; Meng, Hao; Pei, Tao; Zhang, Yanyan
2017-09-01
Blind pixel compensation is an ill-posed inverse problem of infrared imaging systems and image restoration. The performance of a blind pixel compensation algorithm depends on the accuracy of estimation for the underlying true infrared images. We propose an adaptive regression method (ARM) for blind pixel compensation that integrates the multi-scale framework with a regression model. A blind pixel is restored by exploiting the intra-scale properties through nonparametric regressive estimation and the inter-scale characteristics via parametric regression for continuous learning. Combining the respective strengths of a parametric model and a nonparametric model, ARM establishes a set of multi-scale blind-pixel compensation methods to correct the non-uniformity based on key frame extraction. Therefore, it is essentially different from the traditional frameworks for blind pixel compensation, which are based on filtering and interpolation. Experimental results on some challenging cases of blind compensation show that the proposed algorithm outperforms existing methods by a significant margin in both isolated blind restoration and clustered blind restoration.
Institute of Scientific and Technical Information of China (English)
LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi
2010-01-01
Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed that both crossover and noncrossover GEI were present, and that genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated that the nonparametric statistics grouped into four distinct classes that corresponded to different agronomic and biological concepts of stability. Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only the rank-sum and TOP methods would be useful for simultaneous selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Du, Li; Turner, Jay
2015-10-01
A long-term air quality study is being conducted in Roxana, Illinois, USA, at the fenceline of a petroleum refinery. Measurements include 1-in-6 day 24-hour integrated ambient fine particulate matter (PM2.5) speciation following the Chemical Speciation Network (CSN) sampling and analysis protocols. Lanthanoid elements, some of which are tracers of fluidized-bed catalytic cracker (FCC) emissions, are also measured by inductively coupled plasma-mass spectrometry (ICP-MS) after extraction from PM2.5 using hot block-assisted acid digestion. Lanthanoid recoveries of 80-90% were obtained for two ambient particulate matter standard reference materials (NIST SRM 1648a and 2783). Ambient PM2.5 La patterns could be explained by a two-source model representing resuspended soil and FCC emissions with enhanced La/Ce ratios when impacted by the refinery. Nonparametric wind regression demonstrates that when the monitoring station is upwind of the refinery the mean La/Ce ratio is consistent with soil, and when the monitoring station is downwind of the refinery the mean ratio is more than four times higher for bearings that correspond to maximum impacts. Source apportionment modeling using EPA UNMIX and EPA PMF could not reliably apportion PM2.5 mass to the FCC emissions. However, the weight of evidence is that such contributions are small, with no large episodes observed for the 164 samples analyzed. This study demonstrates the applicability of a hot block-assisted digestion protocol for the extraction of lanthanoid elements as well as insights obtained from long-term monitoring data including wind direction-based analyses.
Synthesizing Regression Results: A Factored Likelihood Method
Wu, Meng-Jia; Becker, Betsy Jane
2013-01-01
Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported…
Trend Analysis of Golestan's Rivers Discharges Using Parametric and Non-parametric Methods
Mosaedi, Abolfazl; Kouhestani, Nasrin
2010-05-01
Climate change and its consequences are among the major problems facing humanity. Climate change will cause changes in river discharges. The aim of this research is to analyze trends in the seasonal and yearly river discharges of Golestan province (Iran). Four trend analysis methods (conjunction point, linear regression, Wald-Wolfowitz and Mann-Kendall) were applied to river discharges over seasonal and annual periods at the 95% and 99% significance levels. First, daily discharge data from 12 hydrometric stations covering 42 years (1965-2007) were selected; after some common statistical tests, such as homogeneity tests (the G-B and M-W tests), the four trend analysis tests were applied. Results show that at all stations the summer time series have decreasing trends at the 99% significance level according to the Mann-Kendall (M-K) test. For the autumn time series, all four methods give similar results. For other periods, the results of the four tests were more or less similar, although for some stations the results of the tests differed. Keywords: Trend Analysis, Discharge, Non-parametric methods, Wald-Wolfowitz, The Mann-Kendall test, Golestan Province.
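For readers unfamiliar with the Mann-Kendall test named above, a minimal sketch follows. It uses the standard no-ties variance with a continuity correction and a normal approximation for the two-sided p-value. These simplifications, and the function name, are assumptions made for illustration; the abstract does not describe the exact variant the authors applied.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test (no correction for tied values).

    Returns (S, Z, p) where S is the sum of signs of all pairwise
    later-minus-earlier differences, Z is the continuity-corrected
    standard normal score, and p is the two-sided p-value.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the difference

    # Variance of S under the null hypothesis, assuming no ties.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p
```

A negative Z with p below 0.01 corresponds to a decreasing trend at the 99% level, which is the kind of result reported for the summer series. Discharge records with many repeated values would need the tie-corrected variance instead.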
Comparison of reliability techniques of parametric and non-parametric method
Directory of Open Access Journals (Sweden)
C. Kalaiselvan
2016-06-01
Reliability of a product or system is the probability that the product adequately performs its intended function for the stated period of time under stated operating conditions. It is a function of time. The most widely used nano ceramic capacitors, C0G and X7R, are used in this reliability study to generate the time-to-failure (TTF) data. The time-to-failure data are obtained by Accelerated Life Testing (ALT) and Highly Accelerated Life Testing (HALT). The test is conducted at a high stress level to generate more failures within a short interval of time. Parametric and non-parametric reliability methods are used to convert results from accelerated to actual conditions. In this paper, a comparative study of parametric and non-parametric methods for analysing the failure data is carried out. The Weibull distribution is used for the parametric method; the Kaplan–Meier and Simple Actuarial methods are used for the non-parametric method. The time taken to identify the mean time to failure (MTTF) under accelerated conditions is the same for the parametric and non-parametric methods, up to a relative deviation.
Parametric and Non-Parametric System Modelling
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg
1999-01-01
considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how....... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural...... the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...
A non-parametric method for correction of global radiation observations
DEFF Research Database (Denmark)
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...
Directory of Open Access Journals (Sweden)
SIAVASH KALBI
2014-05-01
Kalbi S, Fallah A, Hojjati SM. 2014. Using and comparing two nonparametric methods (CART and RF) and SPOT-HRG satellite data to predictive tree diversity distribution. Nusantara Bioscience 6: 57-62. The prediction of spatial distributions of tree species by means of survey data has recently been used for conservation planning. Numerous methods have been developed for building species habitat suitability models. The present study was carried out to find possible relationships between tree species diversity indices and SPOT-HRG reflectance values in the Hyrcanian forests of northern Iran. Two modeling techniques, Classification and Regression Trees (CART) and Random Forest (RF), were fitted to the data in order to find the most successful model. The Simpson index, the Shannon diversity index and the reciprocal of the Simpson index were used for estimating tree diversity. After collecting terrestrial information on trees in the 100 samples, the tree diversity indices were calculated for each plot. RF, with coefficients of determination from 56.3 to 63.9 and RMSE from 0.15 to 0.84, gave better results than CART, with coefficients of determination from 42.3 to 63.3 and RMSE from 0.188 to 0.88. Overall, the results showed that SPOT-HRG satellite data and nonparametric regression could be useful for estimating tree diversity in the Hyrcanian forests of northern Iran.
Institute of Scientific and Technical Information of China (English)
赵文芝; 田铮; 夏志明
2009-01-01
A wavelet method of detection and estimation of change points in nonparametric regression models under random design is proposed. The confidence bound of our test is derived by using the test statistics based on empirical wavelet coefficients as obtained by wavelet transformation of the data, which is observed with noise. Moreover, the consistency of the test is proved while the rate of convergence is given. The method turns out to be effective after being tested on simulated examples and applied to IBM stock market data.
A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.
Mircioiu, Constantin; Atkinson, Jeffrey
2017-05-10
A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
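The slope and intercept rules stated above (median of all pairwise slopes; line through the medians of the data) can be sketched in a few lines. This is an illustrative single-line version only, with a hypothetical function name. It is not the KTRLine program, and it omits the multisegment model, the error component and the bias correction factor.

```python
from statistics import median


def kendall_theil_line(x, y):
    """Single-segment Kendall-Theil robust line.

    slope     = median of slopes between all pairs of points
                (pairs with identical x are skipped)
    intercept = median(y) - slope * median(x), so the line passes
                through the point (median x, median y)
    """
    n = len(x)
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(n)
        for j in range(i + 1, n)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return slope, intercept


if __name__ == "__main__":
    # Points on y = 2x + 1 with one gross outlier in the last value.
    xs = [1, 2, 3, 4, 5, 6, 7]
    ys = [3, 5, 7, 9, 11, 13, 100]
    print(kendall_theil_line(xs, ys))
```

In this example the outlier affects only 6 of the 21 pairwise slopes, so the median slope is unchanged. This is the resistance to outliers that motivated the choice of the method.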
Non-parametric change-point method for differential gene expression detection.
Directory of Open Access Journals (Sweden)
Yao Wang
BACKGROUND: We propose a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS for short), which uses a single equation for detecting differential gene expression (DGE) in microarray data. NPCPS is based on change point theory to provide effective DGE detection ability. METHODOLOGY: NPCPS uses the data distribution of the normal samples as input, and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and an ROC study were applied to examine the detection accuracy of NPCPS, and an experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. CONCLUSIONS: The simulation study indicated that NPCPS was more effective for detecting DGE in a cancer subset than five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experimental results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by using NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detection results of NPCPS and the compared methods were less than 0.05, while between the other methods the values were from 0.20 to 0.84. This indicates that NPCPS works on different features and thus provides DGE identification from a distinct perspective compared with the other mean- or median-based methods.
Heteroscedastic regression analysis method for mixed data
Institute of Scientific and Technical Information of China (English)
FU Hui-min; YUE Xiao-rui
2011-01-01
The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimations of regression coefficients, as well as the confidence limits of the location parameter and scale parameter, were given. Furthermore, the point estimations and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method, which is only suitable for complete data from a normal distribution, can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. So the presented method has a broad range of promising applications.
Wishart, Justin Rory
2011-01-01
In this paper, a lower bound is determined in the minimax sense for change point estimators of the first derivative of a regression function in the fractional white noise model. Similar minimax results presented previously in the area focus on change points in the derivatives of a regression function in the white noise model or consider estimation of the regression function in the presence of correlated errors.
An adaptive nonparametric method in benchmark analysis for bioassay and environmental studies.
Bhattacharya, Rabi; Lin, Lizhen
2010-12-01
We present a novel nonparametric method for bioassay and benchmark analysis in risk assessment, which averages isotonic MLEs based on disjoint subgroups of dosages. The asymptotic theory for the methodology is derived, showing that the MISEs (mean integrated squared errors) of the estimates of both the dose-response curve $F$ and its inverse $F^{-1}$ achieve the optimal rate $O(N^{-4/5})$. Also, we compute the asymptotic distribution of the estimate $\tilde{\zeta}_p$ of the effective dosage $\zeta_p = F^{-1}(p)$, which is shown to have an optimally small asymptotic variance.
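The isotonic MLE mentioned above is the monotone (non-decreasing) fit to the observed response proportions, commonly computed with the pool-adjacent-violators algorithm. The sketch below shows only that building block, under the assumption that responses are already summarized as one proportion per dose, with optional weights such as group sizes. It does not implement the paper's averaging over disjoint dose subgroups, and the function name is hypothetical.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit.

    values  : observed response proportions, ordered by increasing dose
    weights : optional positive weights (e.g. subjects per dose)
    Returns a list of fitted values of the same length.
    """
    if weights is None:
        weights = [1.0] * len(values)

    # Each block stores [pooled mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, observed proportions 0.1, 0.3, 0.2, 0.6 at four increasing doses violate monotonicity at the middle pair, so the fit pools those two values into their mean.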
Nonparametric Econometrics: The np Package
Directory of Open Access Journals (Sweden)
Tristen Hayfield
2008-07-01
We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
Kim, Junmo; Fisher, John W; Yezzi, Anthony; Cetin, Müjdat; Willsky, Alan S
2005-10-01
In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Furthermore, our method, which does not require any training, performs as well as methods based on training.
A web application for evaluating Phase I methods using a non-parametric optimal benchmark.
Wages, Nolan A; Varhegyi, Nikole
2017-06-01
In evaluating the performance of Phase I dose-finding designs, simulation studies are typically conducted to assess how often a method correctly selects the true maximum tolerated dose under a set of assumed dose-toxicity curves. A necessary component of the evaluation process is to have some concept for how well a design can possibly perform. The notion of an upper bound on the accuracy of maximum tolerated dose selection is often omitted from the simulation study, and the aim of this work is to provide researchers with accessible software to quickly evaluate the operating characteristics of Phase I methods using a benchmark. The non-parametric optimal benchmark is a useful theoretical tool for simulations that can serve as an upper limit for the accuracy of maximum tolerated dose identification based on a binary toxicity endpoint. It offers researchers a sense of the plausibility of a Phase I method's operating characteristics in simulation. We have developed an R shiny web application for simulating the benchmark. The web application has the ability to quickly provide simulation results for the benchmark and requires no programming knowledge. The application is free to access and use on any device with an Internet browser. The application provides the percentage of correct selection of the maximum tolerated dose and an accuracy index, operating characteristics typically used in evaluating the accuracy of dose-finding designs. We hope this software will facilitate the use of the non-parametric optimal benchmark as an evaluation tool in dose-finding simulation.
Comparison of Parametric and Nonparametric Methods for Analyzing the Bias of a Numerical Model
Directory of Open Access Journals (Sweden)
Isaac Mugume
2016-01-01
Numerical models are presently applied in many fields for simulation and prediction, operation, or research. The output from these models normally has both systematic and random errors. The study compared January 2015 temperature data for Uganda as simulated using the Weather Research and Forecast model with actual observed station temperature data to analyze the bias using parametric methods (the root mean square error (RMSE), the mean absolute error (MAE), the mean error (ME), skewness, and the bias easy estimate (BES)) and a nonparametric method (the sign test, STM). The RMSE normally overestimates the error compared to the MAE. The RMSE and MAE are not sensitive to the direction of bias. The ME gives both the direction and magnitude of bias but can be distorted by extreme values, while the BES is insensitive to extreme values. The STM is robust for giving the direction of bias; it is not sensitive to extreme values but it does not give the magnitude of bias. Graphical tools (such as time series and cumulative curves) show the performance of the model with time. It is recommended to integrate parametric and nonparametric methods along with graphical methods for a comprehensive analysis of the bias of a numerical model.
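The contrast drawn above between magnitude-only measures (RMSE, MAE), the signed mean error, and the sign test can be made concrete with a short sketch. The function name and the returned dictionary layout are hypothetical. The sign test is implemented here as an exact two-sided binomial test on the counts of positive and negative errors, which is a standard formulation but not necessarily the one used in the study. The BES is not included because the abstract does not define it.

```python
import math


def bias_summary(simulated, observed):
    """Summarize model error (simulated minus observed).

    Returns a dict with ME, MAE, RMSE, the counts of positive and
    negative errors, and the two-sided sign-test p-value.
    """
    errors = [s - o for s, o in zip(simulated, observed)]
    n = len(errors)

    me = sum(errors) / n                           # signed: direction and size
    mae = sum(abs(e) for e in errors) / n          # magnitude only
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large errors

    # Sign test: zero errors are dropped; under no bias the number of
    # positive errors follows Binomial(m, 0.5).
    n_pos = sum(1 for e in errors if e > 0)
    n_neg = sum(1 for e in errors if e < 0)
    m = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m
    p_sign = min(1.0, 2.0 * tail)

    return {"ME": me, "MAE": mae, "RMSE": rmse,
            "n_pos": n_pos, "n_neg": n_neg, "p_sign": p_sign}
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same data, which matches the abstract's remark that RMSE tends to overstate the error relative to MAE. The sign test uses only the signs of the errors, so a single extreme value does not change its result.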
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Comparison of non-parametric methods for ungrouping coarsely aggregated data
DEFF Research Database (Denmark)
Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda
2016-01-01
Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age...... methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate...... composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized...
A NEW DE-NOISING METHOD BASED ON 3-BAND WAVELET AND NONPARAMETRIC ADAPTIVE ESTIMATION
Institute of Scientific and Technical Information of China (English)
Li Li; Peng Yuhua; Yang Mingqiang; Xue Peijun
2007-01-01
Wavelet de-noising is well known as an important method of signal de-noising. Recently, most research efforts on wavelet de-noising have focused on how to select the threshold, where the Donoho method is widely applied. Compared with the traditional 2-band wavelet, the 3-band wavelet has advantages in many aspects. According to this theory, an adaptive signal de-noising method in the 3-band wavelet domain based on nonparametric adaptive estimation is proposed. The experimental results show that in the 3-band wavelet domain, the proposed method performs better than the Donoho method in preserving detail and improving the signal-to-noise ratio of the reconstructed signal.
Akhmadaliev, S Z; Ambrosini, G; Amorim, A; Anderson, K; Andrieux, M L; Aubert, Bernard; Augé, E; Badaud, F; Baisin, L; Barreiro, F; Battistoni, G; Bazan, A; Bazizi, K; Belymam, A; Benchekroun, D; Berglund, S R; Berset, J C; Blanchot, G; Bogush, A A; Bohm, C; Boldea, V; Bonivento, W; Bosman, M; Bouhemaid, N; Breton, D; Brette, P; Bromberg, C; Budagov, Yu A; Burdin, S V; Calôba, L P; Camarena, F; Camin, D V; Canton, B; Caprini, M; Carvalho, J; Casado, M P; Castillo, M V; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Chadelas, R; Chalifour, M; Chekhtman, A; Chevalley, J L; Chirikov-Zorin, I E; Chlachidze, G; Citterio, M; Cleland, W E; Clément, C; Cobal, M; Cogswell, F; Colas, Jacques; Collot, J; Cologna, S; Constantinescu, S; Costa, G; Costanzo, D; Crouau, M; Daudon, F; David, J; David, M; Davidek, T; Dawson, J; De, K; de La Taille, C; Del Peso, J; Del Prete, T; de Saintignon, P; Di Girolamo, B; Dinkespiler, B; Dita, S; Dodd, J; Dolejsi, J; Dolezal, Z; Downing, R; Dugne, J J; Dzahini, D; Efthymiopoulos, I; Errede, D; Errede, S; Evans, H; Eynard, G; Fassi, F; Fassnacht, P; Ferrari, A; Ferrer, A; Flaminio, Vincenzo; Fournier, D; Fumagalli, G; Gallas, E; Gaspar, M; Giakoumopoulou, V; Gianotti, F; Gildemeister, O; Giokaris, N; Glagolev, V; Glebov, V Yu; Gomes, A; González, V; González de la Hoz, S; Grabskii, V; Graugès-Pous, E; Grenier, P; Hakopian, H H; Haney, M; Hébrard, C; Henriques, A; Hervás, L; Higón, E; Holmgren, Sven Olof; Hostachy, J Y; Hoummada, A; Huston, J; Imbault, D; Ivanyushenkov, Yu M; Jézéquel, S; Johansson, E K; Jon-And, K; Jones, R; Juste, A; Kakurin, S; Karyukhin, A N; Khokhlov, Yu A; Khubua, J I; Klioukhine, V I; Kolachev, G M; Kopikov, S V; Kostrikov, M E; Kozlov, V; Krivkova, P; Kukhtin, V V; Kulagin, M; Kulchitskii, Yu A; Kuzmin, M V; Labarga, L; Laborie, G; Lacour, D; Laforge, B; Lami, S; Lapin, V; Le Dortz, O; Lefebvre, M; Le Flour, T; Leitner, R; Leltchouk, M; Li, J; Liablin, M V; Linossier, O; Lissauer, D; Lobkowicz, F; Lokajícek, M; Lomakin, Yu 
F; López-Amengual, J M; Lund-Jensen, B; Maio, A; Makowiecki, D S; Malyukov, S N; Mandelli, L; Mansoulié, B; Mapelli, Livio P; Marin, C P; Marrocchesi, P S; Marroquim, F; Martin, P; Maslennikov, A L; Massol, N; Mataix, L; Mazzanti, M; Mazzoni, E; Merritt, F S; Michel, B; Miller, R; Minashvili, I A; Miralles, L; Mnatzakanian, E A; Monnier, E; Montarou, G; Mornacchi, Giuseppe; Moynot, M; Muanza, G S; Nayman, P; Némécek, S; Nessi, Marzio; Nicoleau, S; Niculescu, M; Noppe, J M; Onofre, A; Pallin, D; Pantea, D; Paoletti, R; Park, I C; Parrour, G; Parsons, J; Pereira, A; Perini, L; Perlas, J A; Perrodo, P; Pilcher, J E; Pinhão, J; Plothow-Besch, Hartmute; Poggioli, Luc; Poirot, S; Price, L; Protopopov, Yu; Proudfoot, J; Puzo, P; Radeka, V; Rahm, David Charles; Reinmuth, G; Renzoni, G; Rescia, S; Resconi, S; Richards, R; Richer, J P; Roda, C; Rodier, S; Roldán, J; Romance, J B; Romanov, V; Romero, P; Rossel, F; Rusakovitch, N A; Sala, P; Sanchis, E; Sanders, H; Santoni, C; Santos, J; Sauvage, D; Sauvage, G; Sawyer, L; Says, L P; Schaffer, A C; Schwemling, P; Schwindling, J; Seguin-Moreau, N; Seidl, W; Seixas, J M; Selldén, B; Seman, M; Semenov, A; Serin, L; Shaldaev, E; Shochet, M J; Sidorov, V; Silva, J; Simaitis, V J; Simion, S; Sissakian, A N; Snopkov, R; Söderqvist, J; Solodkov, A A; Soloviev, A; Soloviev, I V; Sonderegger, P; Soustruznik, K; Spanó, F; Spiwoks, R; Stanek, R; Starchenko, E A; Stavina, P; Stephens, R; Suk, M; Surkov, A; Sykora, I; Takai, H; Tang, F; Tardell, S; Tartarelli, F; Tas, P; Teiger, J; Thaler, J; Thion, J; Tikhonov, Yu A; Tisserant, S; Tokar, S; Topilin, N D; Trka, Z; Turcotte, M; Valkár, S; Varanda, M J; Vartapetian, A H; Vazeille, F; Vichou, I; Vinogradov, V; Vorozhtsov, S B; Vuillemin, V; White, A; Wielers, M; Wingerter-Seez, I; Wolters, H; Yamdagni, N; Yosef, C; Zaitsev, A; Zitoun, R; Zolnierowski, Y
2002-01-01
This paper discusses hadron energy reconstruction for the ATLAS barrel prototype combined calorimeter (consisting of a lead-liquid argon electromagnetic part and an iron-scintillator hadronic part) in the framework of the nonparametrical method. The nonparametrical method utilizes only the known e/h ratios and the electron calibration constants and does not require the determination of any parameters by a minimization technique. Thus, this technique lends itself to easy use in a first level trigger. The reconstructed mean values of the hadron energies are within ±1% of the true values and the fractional energy resolution is $[(58 \pm 3)\%/\sqrt{E} + (2.5 \pm 0.3)\%] \oplus (1.7 \pm 0.2)/E$. The value of the e/h ratio obtained for the electromagnetic compartment of the combined calorimeter is 1.74 ± 0.04 and agrees with the prediction that e/h > 1.66 for this electromagnetic calorimeter. Results of a study of the longitudinal hadronic shower development are also presented. The data have been taken in the H8 beam...
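As a rough illustration of the quoted resolution, the sketch below evaluates it at a few beam energies using the central values only. It rests on assumptions not stated in the abstract: the "(+)" operator is read as the usual quadrature sum, E is taken to be in GeV, and the function name and default arguments are hypothetical.

```python
import math

def fractional_resolution(energy_gev, a=0.58, b=0.025, c=1.7):
    """sigma/E = (a/sqrt(E) + b) (+) c/E, with (+) read as a quadrature sum.

    a, b, c are the central values quoted in the abstract; the units
    (E in GeV, c in GeV) are an assumption of this sketch.
    """
    stochastic_plus_constant = a / math.sqrt(energy_gev) + b
    noise = c / energy_gev
    # math.hypot(u, v) returns sqrt(u**2 + v**2), i.e. the quadrature sum.
    return math.hypot(stochastic_plus_constant, noise)

for energy in (20.0, 100.0, 300.0):
    print(f"E = {energy:5.0f} GeV  sigma/E = {fractional_resolution(energy):.3f}")
```

Under these assumptions the noise term matters mainly at low energy, while the stochastic and constant terms dominate at high energy.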
Macroeconomic Forecasting Using Penalized Regression Methods
Smeekes, Stephan; Wijler, Etiënne
2016-01-01
We study the suitability of lasso-type penalized regression techniques when applied to macroeconomic forecasting with high-dimensional datasets. We consider performance of the lasso-type methods when the true DGP is a factor model, contradicting the sparsity assumption underlying penalized regressio
Quantal Response: Nonparametric Modeling
2017-01-01
spline N−spline Fig. 3 Logistic regression 7 Approved for public release; distribution is unlimited. 5. Nonparametric QR Models Nonparametric linear ...stimulus and probability of response. The Generalized Linear Model approach does not make use of the limit distribution but allows arbitrary functional...7. Conclusions and Recommendations 18 8. References 19 Appendix A. The Linear Model 21 Appendix B. The Generalized Linear Model 33 Appendix C. B
A non-parametric method for correction of global radiation observations
DEFF Research Database (Denmark)
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
This paper presents a method for correction and alignment of global radiation observations based on information obtained from calculated global radiation; in the present study, one-hour forecasts of global radiation from a numerical weather prediction (NWP) model are used. Systematic errors detected...... in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...... University. The method can be useful for optimized use of solar radiation observations for forecasting, monitoring, and modeling of energy production and load which are affected by solar radiation....
Non-parametric method for separating domestic hot water heating spikes and space heating
DEFF Research Database (Denmark)
Bacher, Peder; de Saint-Aubain, Philip Anton; Christiansen, Lasse Engbo;
2016-01-01
In this paper a method for separating spikes from a noisy data series, where the data change and evolve over time, is presented. The method is applied to measurements of the total heat load for a single family house. It relies on the fact that the domestic hot water heating is a process generating...... short-lived spikes in the time series, while the space heating changes in slower patterns during the day dependent on the climate and user behavior. The challenge is to separate the domestic hot water heating spikes from the space heating without affecting the natural noise in the space heating...... measurements. The assumption behind the developed method is that the space heating can be estimated by a non-parametric kernel smoother, such that every value significantly above this kernel smoother estimate is identified as a domestic hot water heating spike. First, it is shown how a basic kernel smoothing...
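The idea of flagging values that sit well above a kernel smoother estimate can be sketched as follows. This is a minimal illustration, not the authors' procedure: the Gaussian kernel, the bandwidth, the fixed threshold, and the synthetic hourly load are all assumptions, and the function names are hypothetical.

```python
import math

def kernel_smooth(times, values, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel at each time point."""
    smoothed = []
    for t0 in times:
        weights = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in times]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed

def find_spikes(times, values, bandwidth=3.0, threshold=2.0):
    """Return indices whose value exceeds the smoother by more than threshold."""
    smoothed = kernel_smooth(times, values, bandwidth)
    return [i for i, (v, s) in enumerate(zip(values, smoothed)) if v - s > threshold]

# Synthetic day: a slow daily pattern plus two short hot-water-like spikes.
times = list(range(24))
load = [2.0 + 0.5 * math.sin(2 * math.pi * t / 24) for t in times]
load[7] += 6.0
load[19] += 5.0

print(find_spikes(times, load))
```

On this synthetic series the two injected spikes (hours 7 and 19) are the only points flagged. A fixed threshold is the simplest choice; a threshold tied to the local noise level would be closer to "significantly above".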
Takamizawa, Hisashi; Itoh, Hiroto; Nishiyama, Yutaka
2016-10-01
In order to understand neutron irradiation embrittlement in high fluence regions, statistical analysis using the Bayesian nonparametric (BNP) method was performed for the Japanese surveillance and material test reactor irradiation database. The BNP method is essentially expressed as an infinite summation of normal distributions, with input data being subdivided into clusters with identical statistical parameters, such as mean and standard deviation, for each cluster to estimate shifts in ductile-to-brittle transition temperature (DBTT). The clusters typically depend on chemical compositions, irradiation conditions, and the irradiation embrittlement. Specific variables contributing to the irradiation embrittlement include the content of Cu, Ni, P, Si, and Mn in the pressure vessel steels, neutron flux, neutron fluence, and irradiation temperatures. It was found that the measured shifts of DBTT correlated well with the calculated ones. Data associated with the same materials were subdivided into the same clusters even if neutron fluences were increased.
Xu, Zhiqiang
2017-02-16
Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.
Xu, Yonghong; Gao, Xiaohuan; Wang, Zhengxi
2014-04-01
Missing data are a general problem in many scientific fields, especially in medical survival analysis. Interpolation is one of the important methods for dealing with censored data. However, most interpolation methods replace the censored data with exact data, which distorts the real distribution of the censored data and reduces the probability of the real data falling into the interpolated data. To solve this problem, we propose a nonparametric method for estimating the survival function of right-censored and interval-censored data and compare its performance with the SC (self-consistent) algorithm. Compared with average interpolation and nearest neighbor interpolation, the proposed method replaces the right-censored data with interval-censored data and greatly improves the probability of the real data falling into the imputation interval. It then uses empirical distribution theory to estimate the survival function of right-censored and interval-censored data. The results of numerical examples and a real breast cancer data set demonstrate that the proposed method has higher accuracy and better robustness for different proportions of censored data. This paper provides a good method for comparing the performance of clinical treatments by estimating patient survival data, and offers some help for medical survival data analysis.
Comparison of non-parametric methods for ungrouping coarsely aggregated data
Directory of Open Access Journals (Sweden)
Silvia Rizzi
2016-05-01
Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods: From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions, can handle unequal interval lengths, and allow stretches of 0 counts. Results: The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion: We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G; Lado, María J
2009-01-30
In many biomedical applications, interest lies in being able to distinguish between two possible states of a given response variable, depending on the values of certain continuous predictors. If the number of predictors, p, is high, or if there is redundancy among them, it then becomes important to decide on the selection of the best subset of predictors that will be able to obtain the models with greatest discrimination capacity. With this aim in mind, logistic generalized additive models were considered and receiver operating characteristic (ROC) curves were applied in order to determine and compare the discriminatory capacity of such models. This study sought to develop bootstrap-based tests that allow for the following to be ascertained: (a) the optimal number q ≤ p of predictors; and (b) the model or models including q predictors, which display the largest AUC (area under the ROC curve). A simulation study was conducted to verify the behaviour of these tests. Finally, the proposed method was applied to a computer-aided diagnostic system dedicated to early detection of breast cancer. Copyright (c) 2008 John Wiley & Sons, Ltd.
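To make the AUC and bootstrap ingredients concrete, the sketch below computes an empirical AUC from two sets of scores and a percentile bootstrap interval for it. It is only an illustration under stated assumptions: it is not the authors' test for the number of predictors, the scores are made up, and the function names are hypothetical.

```python
import random

def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def bootstrap_auc_interval(scores_pos, scores_neg, n_boot=500, seed=0):
    """Percentile 95% interval for the AUC, resampling each class with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        stats.append(auc(pos, neg))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Hypothetical model scores for cases (positives) and controls (negatives).
positives = [0.9, 0.8, 0.7, 0.6, 0.55]
negatives = [0.5, 0.4, 0.65, 0.3, 0.2]

print("AUC:", auc(positives, negatives))
print("95% bootstrap interval:", bootstrap_auc_interval(positives, negatives))
```

Comparing such intervals across candidate models with different predictor subsets is one simple way to see whether an extra predictor changes discrimination by more than resampling noise.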
Neelakantan, S; Veng-Pedersen, P
2005-11-01
A novel numerical deconvolution method is presented that enables the estimation of drug absorption rates under time-variant disposition conditions. The method involves two components. (1) A disposition decomposition-recomposition (DDR) enabling exact changes in the unit impulse response (UIR) to be constructed based on centrally based clearance changes iteratively determined. (2) A non-parametric, end-constrained cubic spline (ECS) input response function estimated by cross-validation. The proposed DDR-ECS method compensates for disposition changes between the test and the reference administrations by using a "beta" clearance correction based on DDR analysis. The representation of the input response by the ECS method takes into consideration the complex absorption process and also ensures physiologically realistic approximations of the response. The stability of the new method to noisy data was evaluated by comprehensive simulations that considered different UIRs, various input functions, clearance changes and a novel scaling of the input function that includes the "flip-flop" absorption phenomena. The simulated input response was also analysed by two other methods and all three methods were compared for their relative performances. The DDR-ECS method provides better estimation of the input profile under significant clearance changes but tends to overestimate the input when there were only small changes in the clearance.
Connolly, Brian; Cohen, K Bretonnel; Santel, Daniel; Bayram, Ulya; Pestian, John
2017-08-07
Probabilistic assessments of clinical care are essential for quality care. Yet machine learning, which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate the ML output with a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. This novel non-parametric Bayesian approach is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients. The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well understood scores. The method produces calibrations that are comparable to the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate classes decreases, our approach yields more granular and dynamic calibrated probabilities compared to the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier. The method allows both clinicians and patients to have a more nuanced view of the output of an ML model, allowing better decision making. The method is demonstrated on simulated data, various biomedical data sets and a clinical data set, to which diverse ML methods are applied. Trivially extending the method to (non-ML) clinical scores is also discussed.
Nonparametric Comparison of Two Dynamic Parameter Setting Methods in a Meta-Heuristic Approach
Directory of Open Access Journals (Sweden)
Seyhun HEPDOGAN
2007-10-01
Meta-heuristics are commonly used to solve combinatorial problems in practice. Many approaches provide very good quality solutions in a short amount of computational time; however, most meta-heuristics use parameters to tune performance for particular problems, and selecting these parameters before solving the problem can require much time. This paper investigates the problem of setting parameters using a typical meta-heuristic called Meta-RaPS (Meta-heuristic for Randomized Priority Search). Meta-RaPS is a promising meta-heuristic optimization method that has been applied to different types of combinatorial optimization problems and achieved very good performance compared to other meta-heuristic techniques. To solve a combinatorial problem, Meta-RaPS uses two well-defined stages at each iteration: construction and local search. After a number of iterations, the best solution is reported. Meta-RaPS performance depends on the fine tuning of two main parameters, priority percentage and restriction percentage, which are used during the construction stage. This paper presents two different dynamic parameter setting methods for Meta-RaPS. These dynamic parameter setting approaches tune the parameters while a solution is being found. To compare these two approaches, nonparametric statistical approaches are utilized since the solutions are not normally distributed. Results from both these dynamic parameter setting methods are reported.
Non-parametric method for measuring gas inhomogeneities from X-ray observations of galaxy clusters
Morandi, Andrea; Cui, Wei
2013-01-01
We present a non-parametric method to measure inhomogeneities in the intracluster medium (ICM) from X-ray observations of galaxy clusters. Analyzing mock Chandra X-ray observations of simulated clusters, we show that our new method enables the accurate recovery of the 3D gas density and gas clumping factor profiles out to large radii of galaxy clusters. We then apply this method to Chandra X-ray observations of Abell 1835 and present the first determination of the gas clumping factor from the X-ray cluster data. We find that the gas clumping factor in Abell 1835 increases with radius and reaches ~2-3 at r=R_{200}. This is in good agreement with the predictions of hydrodynamical simulations, but it is significantly below the values inferred from recent Suzaku observations. We further show that the radially increasing gas clumping factor causes flattening of the derived entropy profile of the ICM and affects physical interpretation of the cluster gas structure, especially at the large cluster-centric radii. Our...
How to use linear regression and correlation in quantitative method comparison studies.
Twomey, P J; Kroll, M H
2008-04-01
Linear regression methods try to determine the best linear relationship between data points, while correlation coefficients assess the association (as opposed to agreement) between the two methods. Linear regression and correlation play an important part in the interpretation of quantitative method comparison studies. Their major strength is that they are widely known and as a result both are employed in the vast majority of method comparison studies. While previously performed by hand, the availability of statistical packages means that regression analysis is usually performed by software packages including MS Excel, with or without the software program Analyze-it, as well as by other software packages. Such techniques need to be employed in a way that compares the agreement between the two methods examined and, more importantly, because we are dealing with individual patients, whether the degree of agreement is clinically acceptable. Despite their use for many years, there is a lot of ignorance about the validity as well as the pros and cons of linear regression and correlation techniques. This review article describes the types of linear regression and regression (parametric and non-parametric methods) and the necessary general and specific requirements. The selection of the type of regression depends on where one has been trained, the tradition of the laboratory and the availability of adequate software.
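The distinction between association and agreement can be shown with a small made-up example: two methods whose results are perfectly correlated yet differ systematically. The data and function names below are hypothetical and serve only to illustrate the point made in the abstract.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_difference(x, y):
    """Average of (y - x): a simple measure of systematic disagreement."""
    return sum(b - a for a, b in zip(x, y)) / len(x)

# Method B reads twice method A plus a constant offset (hypothetical data).
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [2.0 * v + 1.0 for v in method_a]

print("correlation:", pearson_r(method_a, method_b))
print("mean difference (B - A):", mean_difference(method_a, method_b))
```

Here the correlation is 1 while method B exceeds method A by 4 units on average, so a high correlation coefficient alone says nothing about whether the two methods can be used interchangeably.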
Comparison of three nonparametric kriging methods for delineating heavy-metal contaminated soils
Energy Technology Data Exchange (ETDEWEB)
Juang, K.W.; Lee, D.Y
2000-02-01
The probability of pollutant concentrations greater than a cutoff value is useful for delineating hazardous areas in contaminated soils. It is essential for risk assessment and reclamation. In this study, three nonparametric kriging methods [indicator kriging, probability kriging, and kriging with the cumulative distribution function (CDF) of order statistics (CDF kriging)] were used to estimate the probability of heavy-metal concentrations lower than a cutoff value. In terms of methodology, the probability kriging estimator and CDF kriging estimator take into account the information of the order relation, which is not considered in indicator kriging. Since probability kriging has been shown to be better than indicator kriging for delineating contaminated soils, the performance of CDF kriging, which the authors propose, was compared with that of probability kriging in this study. A data set of soil Cd and Pb concentrations obtained from a 10-ha heavy-metal contaminated site in Taoyuan, Taiwan, was used. The results demonstrated that the probability kriging and CDF kriging estimations were more accurate than the indicator kriging estimation. On the other hand, because the probability kriging was based on the cokriging estimator, some unreliable estimates occurred in the probability kriging estimation. This indicated that probability kriging was not as robust as CDF kriging. Therefore, CDF kriging is more suitable than probability kriging for estimating the probability of heavy-metal concentrations lower than a cutoff value.
Non-parametric approach to the study of phenotypic stability.
Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P
2016-02-19
The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods extended different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.
Directory of Open Access Journals (Sweden)
Jinn-Min Yang
2016-11-01
Feature extraction (FE) or dimensionality reduction (DR) plays quite an important role in the field of pattern recognition. Feature extraction aims to reduce the dimensionality of a high-dimensional dataset to enhance classification accuracy and speed, particularly when the training sample size is small, namely the small sample size (SSS) problem. Remotely sensed hyperspectral images (HSIs) often have hundreds of measured features (bands), which potentially provide more accurate and detailed information for classification but generally need more samples to estimate parameters and achieve a satisfactory result. Collecting ground truth for a remotely sensed hyperspectral scene can be considerably difficult and expensive. Therefore, FE techniques have been an important part of hyperspectral image classification. Unlike many feature extraction methods that are based only on the spectral (band) information of the training samples, some feature extraction methods that integrate both spatial and spectral information of training samples have shown more effective results in recent years. Spatial contextual information has been proven useful for improving the HSI data representation and increasing classification accuracy. In this paper, we propose a spatial and spectral nonparametric linear feature extraction method for hyperspectral image classification. The spatial and spectral information is extracted for each training sample and used to design the within-class and between-class scatter matrices for constructing the feature extraction model. The experimental results on one benchmark hyperspectral image demonstrate that the proposed method obtains more stable and satisfactory results than some existing spectral-based feature extraction methods.
Predicting students’ grades using fuzzy non-parametric regression method and ReliefF-based algorithm
Directory of Open Access Journals (Sweden)
Javad Ghasemian
In this paper we introduce two new approaches to predict the grades that university students will acquire in the final exam of a course and improve the obtained result on some features extracted from logged data in an educational web-based system. First w ...
Johnson, H.O.; Gupta, S.C.; Vecchia, A.V.; Zvomuya, F.
2009-01-01
Excessive loading of sediment and nutrients to rivers is a major problem in many parts of the United States. In this study, we tested the non-parametric Seasonal Kendall (SEAKEN) trend model and the parametric USGS Quality of Water trend program (QWTREND) to quantify trends in water quality of the Minnesota River at Fort Snelling from 1976 to 2003. Both methods indicated decreasing trends in flow-adjusted concentrations of total suspended solids (TSS), total phosphorus (TP), and orthophosphorus (OP) and a generally increasing trend in flow-adjusted nitrate plus nitrite-nitrogen (NO3-N) concentration. The SEAKEN results were strongly influenced by the length of the record as well as extreme years (dry or wet) earlier in the record. The QWTREND results, though influenced somewhat by the same factors, were more stable. The magnitudes of trends between the two methods were somewhat different and appeared to be associated with conceptual differences between the flow-adjustment processes used and with data processing methods. The decreasing trends in TSS, TP, and OP concentrations are likely related to conservation measures implemented in the basin. However, dilution effects from wet climate or additional tile drainage cannot be ruled out. The increasing trend in NO3-N concentrations was likely due to increased drainage in the basin. Since the Minnesota River is the main source of sediments to the Mississippi River, this study also addressed the rapid filling of Lake Pepin on the Mississippi River and found the likely cause to be increased flow due to recent wet climate in the region. Copyright © 2009 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.
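The core of a Seasonal Kendall test can be sketched in a few lines: a Mann-Kendall S statistic is computed within each season and the seasonal statistics are summed. The version below is a simplified illustration under stated assumptions (no tied values, seasons treated as independent, no flow adjustment); it is not the SEAKEN program itself, and the data and function names are hypothetical.

```python
import math

def mann_kendall_s(series):
    """Sum of signs of all later-minus-earlier differences in one season."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s

def seasonal_kendall(data_by_season):
    """Return (S, Z) summed over seasons, using the no-ties variance formula."""
    s_total = 0
    var_total = 0.0
    for series in data_by_season.values():
        n = len(series)
        s_total += mann_kendall_s(series)
        var_total += n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction: shrink |S| by one before standardizing.
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    return s_total, z

# Hypothetical concentrations, one value per year, for two seasons.
concentrations = {
    "spring": [60.0, 55.0, 52.0, 47.0, 41.0, 38.0],
    "autumn": [30.0, 29.0, 26.0, 24.0, 21.0, 20.0],
}
s_stat, z_stat = seasonal_kendall(concentrations)
print(f"S = {s_stat}, Z = {z_stat:.2f}")
```

Comparing values only within the same season keeps seasonal cycles from being mistaken for a trend. Real records with ties or serial dependence would need the corresponding variance corrections.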
Non-Parametric Inference in Astrophysics
Wasserman, L H; Nichol, R C; Genovese, C; Jang, W; Connolly, A J; Moore, A W; Schneider, J; Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; group, the PICA
2001-01-01
We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
Directory of Open Access Journals (Sweden)
Andrea Furková
2007-06-01
This paper explores the application of parametric and non-parametric benchmarking methods in measuring the cost efficiency of Slovak and Czech electricity distribution companies. We compare the relative cost efficiency of Slovak and Czech distribution companies using two benchmarking methods: the non-parametric Data Envelopment Analysis (DEA) and the Stochastic Frontier Analysis (SFA) as the parametric approach. The first part of the analysis was based on DEA models. Traditional cross-section CCR and BCC models were modified for cost efficiency estimation. In further analysis we focus on two versions of the stochastic frontier cost function using panel data: the MLE model and the GLS model. These models have been applied to an unbalanced panel of 11 (Slovakia 3 and Czech Republic 8) regional electricity distribution utilities over a period from 2000 to 2004. The differences in estimated scores, parameters and ranking of utilities were analyzed. We observed significant differences between the parametric methods and the DEA approach.
Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods
DEFF Research Database (Denmark)
Høg, Esben
In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...
Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods
DEFF Research Database (Denmark)
Høg, Esben
2003-01-01
In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...
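For readers unfamiliar with the Ornstein-Uhlenbeck process mentioned above, the sketch below simulates one path of dX = theta (mu - X) dt + sigma dW with an Euler-Maruyama step. It only illustrates the mean-reverting model, not the wavelet-based estimation discussed in the record; all parameter values and names are hypothetical.

```python
import math
import random

def simulate_ou(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = theta * (mu - X) dt + sigma dW."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        noise = rng.gauss(0.0, 1.0)
        # Drift pulls x toward mu; diffusion adds a Gaussian shock scaled by sqrt(dt).
        path.append(x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * noise)
    return path

path = simulate_ou(x0=0.0, theta=1.0, mu=1.0, sigma=0.2, dt=0.01, n_steps=1000)
print(f"final value: {path[-1]:.3f}")
```

Setting sigma to zero leaves only the drift, and the path then decays toward mu, which is a quick way to check the mean-reverting behaviour.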
MIRELA SECARĂ
2008-01-01
Tourism represents an important field of economic and social life in our country, and the main sector of the economy of Constanta County is the balneary touristic capitalization of Romanian seaside. In order to statistically analyze hydro tourism on Romanian seaside, we have applied non-parametric methods of measuring and interpretation of existing statistic connections within seaside hydro tourism. Major objective of this research is represented by hydro tourism re-establishment on Romanian ...
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Detection of Change Points in Volatility of Non-Parametric Regression by Wavelets
Institute of Scientific and Technical Information of China (English)
王景乐; 郑明
2012-01-01
This paper studies the detection and estimation of change points in volatility under nonparametric regression models. Wavelet methods are applied to construct test statistics that can be used to detect change points in volatility. The asymptotic distributions of the test statistics are established. We also use the test statistics to construct estimators for the locations and jump sizes of the change points in volatility. The asymptotic properties of these estimators are derived. Simulation studies are conducted to assess the finite sample performance of the proposed procedures.
TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION INDICES OF 35 PYRAZINES
Directory of Open Access Journals (Sweden)
Fatiha Mebarki
2016-02-01
The study treats two closely related alternative methods: a non-parametric method, least absolute deviation (LAD), and a traditional diagnostic method, ordinary least squares (OLS). Both were applied to model, separately, the retention indices of the same set of 35 pyrazines (27 pyrazines plus 8 other pyrazines in the same set) eluted on the OV-101 and Carbowax-20M columns, using theoretical molecular descriptors calculated with the DRAGON software. The detection of influential observations for the non-parametric method (LAD) is a problem that has been extensively studied, and it offers alternative approaches whose main feature is robustness. The LAD method is presented here and compared with standard least squares regression. The comparison between LAD and OLS is based on the equation of the hyperplane, in order to confirm robustness and thus to detect outliers and leverage points. The results are validated by statistical tests (Anderson-Darling, Shapiro-Wilk, D'Agostino, Jarque-Bera), a graphical test (frequency histogram), and confidence intervals, using the concept of robustness to check whether the distribution of the errors is really approximately normal.
Relationship between Multiple Regression and Selected Multivariable Methods.
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Institute of Scientific and Technical Information of China (English)
赵文芝; 夏志明; 贺飞跃
2016-01-01
Two-step estimators for a change point in nonparametric regression are proposed. In the first step, an initial estimator is obtained by the local linear smoothing method. In the second step, the final estimator is obtained by the CUSUM method on a closed neighborhood of the initial estimator. A simulation study shows that the proposed estimator is efficient. An estimator for the jump size is also obtained. Furthermore, experimental results using historical data on Nile river discharges, exchange rate data of USD against RMB, and global temperature data for the northern hemisphere show that the proposed method is also practical in applications.
Comparison of Regularized Regression Methods for ~Omics Data
Acharjee, A.; Finkers, H.J.; Visser, R.G.F.; Maliepaard, C.A.
2013-01-01
Background: In this study, we compare methods that can be used to relate a phenotypic trait of interest to an ~omics data set, where the number of variables outnumbers by far the number of samples. Methods: We apply univariate regression and different regularized multiple regression methods: ridge r
Calculation of Solar Radiation by Using Regression Methods
Kızıltan, Ö.; Şahin, M.
2016-04-01
In this study, solar radiation was estimated at 53 locations over Turkey with varying climatic conditions using the Linear, Ridge, Lasso, Smoother, Partial least squares, KNN and Gaussian process regression methods. The data from 2002 and 2003 were used to obtain the regression coefficients of the relevant methods. The coefficients were obtained based on the input parameters: month, altitude, latitude, longitude and land surface temperature (LST). The values for LST were obtained from the data of the National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR) satellite. Solar radiation for 2004 was then calculated using the coefficients obtained for each regression method. The results were compared statistically. The most successful method was Gaussian process regression, and the least successful was lasso regression. For Gaussian process regression, the mean bias error (MBE) was 0.274 MJ/m2, the root mean square error (RMSE) was 2.260 MJ/m2, and the correlation coefficient was 0.941. The statistical results are consistent with the literature. The Gaussian process regression method is recommended for other studies.
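The three scores quoted in this abstract (MBE, RMSE and the correlation coefficient) can be computed as in the following minimal sketch. The sign convention for MBE (predicted minus observed) and the sample numbers are assumptions made for illustration; the abstract does not state either, and the numbers are not data from the study.

```python
import math


def mean_bias_error(predicted, observed):
    """Average of (predicted - observed); the sign convention is an assumption."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def root_mean_square_error(predicted, observed):
    """Square root of the mean squared difference between prediction and observation."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values in MJ/m2, invented purely to exercise the functions.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]

mbe = mean_bias_error(predicted, observed)
rmse = root_mean_square_error(predicted, observed)
r = pearson_r(predicted, observed)
```

MBE keeps the sign of the errors, so over- and under-predictions can cancel, while RMSE does not; this is why the two are usually reported together.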
Structuring feature space: a non-parametric method for volumetric transfer function generation.
Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S
2009-01-01
The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial
CURRENT STATUS OF NONPARAMETRIC STATISTICS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-02-01
Nonparametric statistics is one of the five points of growth of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into areas, based on the existing practice of scientific activity in nonparametric statistics, and to classify investigations of nonparametric statistical methods. Nonparametric statistics makes it possible to draw statistical inferences, in particular to estimate characteristics of the distribution and to test statistical hypotheses, without the usually weakly justified assumption that the distribution function of the samples belongs to a particular parametric family. An example is the widespread belief that statistical data often follow the normal distribution. Meanwhile, analysis of the results of observations, in particular of measurement errors, always leads to the same conclusion: in most cases the actual distribution differs significantly from normal. Uncritical use of the hypothesis of normality often leads to significant errors, for example in the rejection of outlying observations (outliers), in statistical quality control, and in other cases. Therefore, it is advisable to use nonparametric methods, in which only weak requirements are imposed on the distribution functions of the results of observations; usually only their continuity is assumed. On the basis of a generalization of numerous studies, it can be stated that, to date, nonparametric methods can solve almost the same range of tasks as were previously solved by parametric methods. Certain statements in the literature that nonparametric methods have less power, or require larger sample sizes, than parametric methods are incorrect. Note that in nonparametric statistics, as in mathematical statistics in general, there remain a number of unresolved problems
Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation
Directory of Open Access Journals (Sweden)
Sharad Damodar Gore
2009-10-01
Statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR). This estimator is obtained from unbiased ridge regression (URR) in the same way that ordinary ridge regression (ORR) is obtained from ordinary least squares (OLS). Properties of MUR are derived. Results on its matrix mean squared error (MMSE) are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975).
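The phrase "in the same way that ORR is obtained from OLS" can be made concrete with the standard identity that the ORR estimate (X'X + kI)^{-1} X'y equals the OLS estimate multiplied by the shrinkage operator I - k (X'X + kI)^{-1}. The sketch below checks that identity numerically in plain Python. The helper names, the toy data and the value of k are all hypothetical. Applying `shrink` to a URR estimate instead of an OLS estimate would be one reading of how MUR is built, but that reading is an assumption and is not a formula quoted from the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gram(x):
    """X'X for a design matrix given as a list of rows."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]


def cross(x, y):
    """X'y for a design matrix x and response y."""
    p = len(x[0])
    return [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]


def add_ridge(g, k):
    """Return g + k I."""
    return [[g[i][j] + (k if i == j else 0.0) for j in range(len(g))]
            for i in range(len(g))]


def ols(x, y):
    """Ordinary least squares: (X'X)^{-1} X'y."""
    return solve(gram(x), cross(x, y))


def ridge(x, y, k):
    """Ordinary ridge regression: (X'X + kI)^{-1} X'y."""
    return solve(add_ridge(gram(x), k), cross(x, y))


def shrink(x, beta, k):
    """Apply [I - k (X'X + kI)^{-1}] to any coefficient vector beta."""
    correction = solve(add_ridge(gram(x), k), beta)
    return [b - k * c for b, c in zip(beta, correction)]


# Hypothetical toy data with two correlated predictors.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.0, 2.0, 3.0, 5.0]
k = 0.5

beta_ols = ols(X, y)
beta_orr = ridge(X, y, k)
beta_orr_via_shrinkage = shrink(X, beta_ols, k)  # should match beta_orr
```

Writing ORR as an operator acting on another estimator is what makes the analogy in the abstract mechanical: the same operator can be applied to a different starting estimate.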
Robust regression methods for real-time polymerase chain reaction
Trypsteen, Wim; De Neve, Jan; Bosman, Kobus; Nijhuis, Monique; Thas, Olivier; Vandekerckhove, Linos; De Spiegelaere, Ward
2015-01-01
Current real-time polymerase chain reaction (PCR) data analysis methods implement linear least squares regression methods for primer efficiency estimation based on standard curve dilution series. This method is sensitive to outliers that distort the outcome and are often ignored or removed by the en
On a method of perfect regression using sinusoidal expansion
Sinha, Nilotpal Kanti
2011-01-01
We present a new method of weighted least square regression that gives a curve of fit with any desired degree of accuracy for a given set of data points. By applying this iterative process infinitely, we show that every finite set of coplanar points can be expanded as a sinusoidal series in infinitely many ways. Thus, given any set of finite data points, we can obtain infinitely many perfect regression curves which give a perfect match between the given data points and the values given by the regression.
Influence Analysis of Non-parametric Regression Model with Random Right Censorship
Institute of Scientific and Technical Information of China (English)
王淑玲; 冯予; 刘刚
2012-01-01
In this paper, the primary model is first transformed into a non-parametric regression model. Then local influence is discussed and a concise influence matrix is obtained. Finally, an example is given to illustrate the results.
Cabrieto, Jedelyn; Tuerlinckx, Francis; Kuppens, Peter; Grassmann, Mariel; Ceulemans, Eva
2017-06-01
Change point detection in multivariate time series is a complex task since next to the mean, the correlation structure of the monitored variables may also alter when change occurs. DeCon was recently developed to detect such changes in mean and/or correlation by combining a moving windows approach and robust PCA. However, in the literature, several other methods have been proposed that employ other non-parametric tools: E-divisive, Multirank, and KCP. Since these methods use different statistical approaches, two issues need to be tackled. First, applied researchers may find it hard to appraise the differences between the methods. Second, a direct comparison of the relative performance of all these methods for capturing change points signaling correlation changes is still lacking. Therefore, we present the basic principles behind DeCon, E-divisive, Multirank, and KCP and the corresponding algorithms, to make them more accessible to readers. We further compared their performance through extensive simulations using the settings of Bulteel et al. (Biological Psychology, 98 (1), 29-42, 2014) implying changes in mean and in correlation structure and those of Matteson and James (Journal of the American Statistical Association, 109 (505), 334-345, 2014) implying different numbers of (noise) variables. KCP emerged as the best method in almost all settings. However, in case of more than two noise variables, only DeCon performed adequately in detecting correlation changes.
A class of classification and regression methods by multiobjective programming
Institute of Scientific and Technical Information of China (English)
Dongling ZHANG; Yong SHI; Yingjie TIAN; Meihong ZHU
2009-01-01
An extensive review for the recent developments of multiple criteria linear programming data mining models is provided in this paper. These researches, which include classification and regression methods, are introduced in a systematic way. Some applications of these methods to real-world problems are also involved in this paper. This paper is a summary and reference of multiple criteria linear programming methods that might be helpful for researchers and applications in data mining.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
Kh., S Rezaei; Hanson, R J; Fouesneau, M
2016-01-01
We present a non-parametric model for inferring the three-dimensional (3D) distribution of dust density in the Milky Way. Our approach uses the extinction measured towards stars at different locations in the Galaxy at approximately known distances. Each extinction measurement is proportional to the integrated dust density along its line-of-sight. Making simple assumptions about the spatial correlation of the dust density, we can infer the most probable 3D distribution of dust across the entire observed region, including along sight lines which were not observed. This is possible because our model employs a Gaussian Process to connect all lines-of-sight. We demonstrate the capability of our model to capture detailed dust density variations using mock data as well as simulated data from the Gaia Universe Model Snapshot. We then apply our method to a sample of giant stars observed by APOGEE and Kepler to construct a 3D dust map over a small region of the Galaxy. Due to our smoothness constraint and its isotropy,...
Comparison of non-parametric methods for ungrouping coarsely aggregated data
DEFF Research Database (Denmark)
Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda;
2016-01-01
group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods From an extensive literature search we identify five...
Energy Technology Data Exchange (ETDEWEB)
Pekney, Natalie J.; Cheng, Hanqi; Small, Mitchell J.
2015-11-05
Abstract: The objective of the current work was to develop a statistical method and associated tool to evaluate the impact of oil and natural gas exploration and production activities on local air quality.
DEFF Research Database (Denmark)
Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann;
2012-01-01
Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods for a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From this investigation, we propose the optimal solution for regression estimation in case of noisy and inconsistent optical measurements, which is the case in many practical measurement systems. The principal component regression (PCR), partial least squares (PLS) and least angle regression (LAR) methods are compared...
Diametral creep prediction of pressure tube using statistical regression methods
Energy Technology Data Exchange (ETDEWEB)
Kim, D. [Korea Advanced Inst. of Science and Technology, Daejeon (Korea, Republic of); Lee, J.Y. [Korea Electric Power Research Inst., Daejeon (Korea, Republic of); Na, M.G. [Chosun Univ., Gwangju (Korea, Republic of); Jang, C. [Korea Advanced Inst. of Science and Technology, Daejeon (Korea, Republic of)
2010-07-01
Diametral creep prediction of pressure tube in CANDU reactor is an important factor for ROPT calculation. In this study, pressure tube diametral creep prediction models were developed using statistical regression method such as linear mixed model for longitudinal data analysis. Inspection and operating condition data of Wolsong unit 1 and 2 reactors were used. Serial correlation model and random coefficient model were developed for pressure tube diameter prediction. Random coefficient model provided more accurate results than serial correlation model. (author)
Nonparametric statistics for social and behavioral sciences
Kraska-Miller, M
2013-01-01
Introduction to Research in Social and Behavioral Sciences; Basic Principles of Research; Planning for Research; Types of Research Designs; Sampling Procedures; Validity and Reliability of Measurement Instruments; Steps of the Research Process; Introduction to Nonparametric Statistics; Data Analysis; Overview of Nonparametric Statistics and Parametric Statistics; Overview of Parametric Statistics; Overview of Nonparametric Statistics; Importance of Nonparametric Methods; Measurement Instruments; Analysis of Data to Determine Association and Agreement; Pearson Chi-Square Test of Association and Independence; Contingency
Semi- and Nonparametric ARCH Processes
Directory of Open Access Journals (Sweden)
Oliver B. Linton
2011-01-01
ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametric estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationarity conditions, ergodic conditions and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression. There are various approaches to nonparametric regression, one of which is the spline. Central Java is one of the most densely populated provinces in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric components. Therefore, the purpose of this paper is to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The results show that the factors which influence population density in Central Java are Family Planning (FP) active participants and the district minimum wage.
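A semiparametric spline model of the kind described here can be sketched as y = b0 + b1*x + f(t), where x enters linearly (the parametric part) and f is a spline in t (the nonparametric part). The example below is a minimal illustration under stated assumptions: f is a linear spline built from a truncated power basis, the single knot at 0.5 is chosen by hand, and the data are synthetic and noise-free. None of these choices comes from the paper, whose spline order, knots and variables are not given in the abstract.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_semiparametric(x, t, y, knots):
    """Least-squares fit of y = b0 + b1*x + f(t), with f a linear spline in t.

    The returned coefficients are ordered as
    [intercept, slope on x, slope on t, one coefficient per knot].
    """
    rows = [[1.0, xi, ti] + [max(ti - k, 0.0) for k in knots]
            for xi, ti in zip(x, t)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations


# Synthetic, noise-free data: linear effect of x plus a V-shaped effect of t.
t = [i / 20 for i in range(21)]
x = [float((2 * i) % 5) for i in range(21)]
y = [2.0 + 3.0 * xi + abs(ti - 0.5) for xi, ti in zip(x, t)]

coef = fit_semiparametric(x, t, y, knots=[0.5])
slope_on_x = coef[1]  # parametric effect, recovered alongside the spline part
```

Because the V-shaped effect of t lies exactly in the span of the chosen basis, the parametric slope is recovered without distortion. With real data the knot locations would have to be selected, for example by a criterion such as generalized cross-validation.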
Fast nonlinear regression method for CT brain perfusion analysis.
Bennink, Edwin; Oosterbroek, Jaap; Kudo, Kohsuke; Viergever, Max A; Velthuis, Birgitta K; de Jong, Hugo W A M
2016-04-01
Although computed tomography (CT) perfusion (CTP) imaging enables rapid diagnosis and prognosis of ischemic stroke, current CTP analysis methods have several shortcomings. We propose a fast nonlinear regression method with a box-shaped model (boxNLR) that has important advantages over the current state-of-the-art method, block-circulant singular value decomposition (bSVD). These advantages include improved robustness to attenuation curve truncation, extensibility, and unified estimation of perfusion parameters. The method is compared with bSVD and with a commercial SVD-based method. The three methods were quantitatively evaluated by means of a digital perfusion phantom, described by Kudo et al. and qualitatively with the aid of 50 clinical CTP scans. All three methods yielded high Pearson correlation coefficients ([Formula: see text]) with the ground truth in the phantom. The boxNLR perfusion maps of the clinical scans showed higher correlation with bSVD than the perfusion maps from the commercial method. Furthermore, it was shown that boxNLR estimates are robust to noise, truncation, and tracer delay. The proposed method provides a fast and reliable way of estimating perfusion parameters from CTP scans. This suggests it could be a viable alternative to current commercial and academic methods.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
Fan, Jianqing; Song, Rui
2011-01-01
A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under the nonparametric additive models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data a...
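The idea of marginal nonparametric screening can be sketched as follows: regress the response on each feature separately with a nonparametric smoother, score each feature by how much its marginal fit varies, and keep the top-scoring features. The binned-mean smoother used below is a stand-in chosen for brevity and is not the estimator used in the paper. The function names, the simulated data and the number of features kept are hypothetical.

```python
import math
import random


def marginal_fit_strength(feature, y, n_bins=5):
    """Variance of the binned-mean fit of y on a single feature.

    Observations are sorted by the feature and split into n_bins groups of
    near-equal size; the fitted value in each group is the group mean of y.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: feature[i])
    y_mean = sum(y) / n
    strength = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        bin_mean = sum(y[i] for i in idx) / len(idx)
        strength += len(idx) * (bin_mean - y_mean) ** 2
    return strength / n


def screen(columns, y, keep):
    """Return the sorted indices of the `keep` features with the strongest marginal fit."""
    scores = [marginal_fit_strength(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Simulated additive model: only features 0 and 1 matter, both nonlinearly.
rng = random.Random(0)
n, p = 200, 20
columns = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [4.0 * columns[0][i] ** 2
     + 2.0 * math.sin(3.0 * columns[1][i])
     + rng.gauss(0.0, 0.1)
     for i in range(n)]

selected = screen(columns, y, keep=2)
```

The quadratic effect of feature 0 is symmetric about zero, so a purely linear marginal correlation would tend to miss it. This is the situation the abstract has in mind when it motivates moving from correlation learning to marginal nonparametric learning.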
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference. -Eugenia Stoimenova, Journal of Applied Statistics, June 2012 … one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students. -Biometrics, 67, September 2011 This excellently presente
Parametrically guided estimation in nonparametric varying coefficient models with quasi-likelihood.
Davenport, Clemontina A; Maity, Arnab; Wu, Yichao
2015-04-01
Varying coefficient models allow us to generalize standard linear regression models to incorporate complex covariate effects by modeling the regression coefficients as functions of another covariate. For nonparametric varying coefficients, we can borrow the idea of parametrically guided estimation to improve asymptotic bias. In this paper, we develop a guided estimation procedure for the nonparametric varying coefficient models. Asymptotic properties are established for the guided estimators and a method of bandwidth selection via bias-variance tradeoff is proposed. We compare the performance of the guided estimator with that of the unguided estimator via both simulation and real data examples.
Approximation by randomly weighting method in censored regression model
Institute of Scientific and Technical Information of China (English)
WANG ZhanFeng; WU YaoHua; ZHAO LinCheng
2009-01-01
Censored regression ("Tobit") models have been in common use, and tests of linear hypotheses in these models have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to the null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, the conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by the randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in the censored regression model. Simulation studies illustrate that the performance of our proposed resampling test method is better than that of the central chi-square distribution under the null hypothesis.
Polygraph Test Results Assessment by Regression Analysis Methods
Directory of Open Access Journals (Sweden)
K. A. Leontiev
2014-01-01
The paper considers the problem of determining, by methods of mathematical statistics, the importance of the questions asked to the examinee in a judicial and psychophysiological polygraph examination. It offers a classification algorithm based on logistic regression as an optimal Bayesian classifier, using information weight coefficients for the polygraph-recorded physiological parameters, with no requirement that the measured features be independent. Binary classification is performed on the results of the polygraph examination, with preliminary normalization and standardization of the primary results, a check of the hypothesis that the obtained data are normally distributed, and calculation of the coefficients of linear regression between input values and responses by the method of maximum likelihood. The logistic curve then divides the features into two classes, "significant" and "insignificant". The efficiency of the model is estimated by means of ROC (receiver operating characteristic) analysis. It is shown that the minimum necessary sample has to contain the results of at least 45 measurements. This approach ensures a reliable result provided that the expert polygraph examiner possesses sufficient qualification and follows testing techniques.
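The evaluation step described here, scoring cases with a logistic curve and summarising classifier quality with ROC analysis, can be sketched as below. The linear predictor values and labels are invented for illustration, and the area under the ROC curve is computed with the pairwise-ranking formula. This is a generic sketch, not the paper's algorithm or data.

```python
import math


def logistic(z):
    """Logistic curve mapping a linear predictor to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores, labels):
    """Area under the ROC curve.

    Computed as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half.
    """
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical linear-predictor values; 1 = "significant", 0 = "insignificant".
linear_predictor = [2.0, 1.5, -0.6, 0.4, -1.4, -2.2]
labels = [1, 1, 1, 0, 0, 0]

scores = [logistic(z) for z in linear_predictor]
auc = roc_auc(scores, labels)
predicted_class = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation. Because the logistic curve is monotone, the area is the same whether it is computed on the scores or on the underlying linear predictor.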
Energy Technology Data Exchange (ETDEWEB)
Constantinescu, C C; Yoder, K K; Normandin, M D; Morris, E D [Department of Radiology, Indiana University School of Medicine, Indianapolis, IN (United States)]; Kareken, D A [Department of Neurology, Indiana University School of Medicine, Indianapolis, IN (United States)]; Bouman, C A [Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN (United States)]; O'Connor, S J [Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN (United States)], E-mail: emorris@iupui.edu
2008-03-07
We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest and activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (F^DA(t)) and the change in binding potential (ΔBP). The veracity of the F^DA(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) ΔBP should decline with increasing DA peak time, (2) ΔBP should increase as the strength of the temporal correlation between F^DA(t) and the free raclopride (F^RAC(t)) curve increases, (3) ΔBP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [11C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover F^DA(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the F^DA(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of F^DA(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.
Nonparametric estimation of ultrasound pulses
DEFF Research Database (Denmark)
Jensen, Jørgen Arendt; Leeman, Sidney
1994-01-01
An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...
Dimension Reduction and Discretization in Stochastic Problems by Regression Method
DEFF Research Database (Denmark)
Ditlevsen, Ove Dalager
1996-01-01
The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given. Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ..., Slepian models, Stochastic finite elements.
Stochastic Approximation Methods for Latent Regression Item Response Models
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Directory of Open Access Journals (Sweden)
Andreas H. Melcher
2012-09-01
This study analyses multidimensional spawning habitat suitability of the fish species “Nase” (Latin: Chondrostoma nasus). This is the first time non-parametric methods were used to better understand biotic habitat use in theory and practice. In particular, we tested (1) the Decision Tree technique, Chi-squared Automatic Interaction Detectors (CHAID), to identify specific habitat types and (2) Prediction-Configural Frequency Analysis (P-CFA) to test for statistical significance. The combination of both non-parametric methods, CHAID and P-CFA, enabled the identification, prediction and interpretation of most typical significant spawning habitats, and we were also able to determine non-typical habitat types, e.g., types in contrast to antitypes. The gradual combination of these two methods underlined three significant habitat types: shaded habitat, fine and coarse substrate habitat depending on high flow velocity. The study affirmed the importance for fish species of shading and riparian vegetation along river banks. In addition, this method provides a weighting of interactions between specific habitat characteristics. The results demonstrate that efficient river restoration requires re-establishing riparian vegetation as well as the open river continuum and hydro-morphological improvements to habitats.
Nonparametric Inference for Periodic Sequences
Sun, Ying
2012-02-01
This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
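The leave-out-one-cycle idea can be illustrated with a short Python sketch. This is not the authors' estimator: the function names `cv_score` and `estimate_period` are hypothetical, and the scoring rule (squared prediction error of phase means computed without the held-out cycle) is an assumption about how such a CV criterion might be set up.

```python
import math


def cv_score(x, p):
    """Leave-out-one-cycle CV error for a candidate integer period p.

    Each cycle holds one observation per phase, so leaving a cycle out
    amounts to predicting each point from the mean of the other points
    that share its phase (index modulo p).
    """
    sums = [0.0] * p
    counts = [0] * p
    for i, v in enumerate(x):
        sums[i % p] += v
        counts[i % p] += 1
    total, used = 0.0, 0
    for i, v in enumerate(x):
        phase = i % p
        if counts[phase] < 2:
            continue  # no other cycle available to predict this point
        loo_mean = (sums[phase] - v) / (counts[phase] - 1)
        total += (v - loo_mean) ** 2
        used += 1
    return total / used if used else math.inf


def estimate_period(x, max_period):
    """Return the candidate period with the smallest CV error.

    Ties go to the smaller period because min() keeps the first minimum.
    """
    return min(range(1, max_period + 1), key=lambda p: cv_score(x, p))


series = [1.0, 5.0, 2.0] * 20  # noiseless toy sequence with period 3
print(estimate_period(series, 10))
```

With noisy data, a multiple of the true period averages over fewer cycles per phase, which tends to inflate its leave-out error; this is one way to read the abstract's remark that CV implicitly penalizes multiples of the smallest period.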
Analysis of some methods for reduced rank Gaussian process regression
DEFF Research Database (Denmark)
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Pérez, Hector E; Kettner, Keith
2013-10-01
Time-to-event analysis represents a collection of relatively new, flexible, and robust statistical techniques for investigating the incidence and timing of transitions from one discrete condition to another. Plant biology is replete with examples of such transitions occurring from the cellular to population levels. However, application of these statistical methods has been rare in botanical research. Here, we demonstrate the use of non- and semi-parametric time-to-event and categorical data analyses to address questions regarding seed to seedling transitions of Ipomopsis rubra propagules exposed to various doses of constant or simulated seasonal diel temperatures. Seeds were capable of germinating rapidly to >90 % at 15-25 or 22/11-29/19 °C. Optimum temperatures for germination occurred at 25 or 29/19 °C. Germination was inhibited and seed viability decreased at temperatures ≥30 or 33/24 °C. Kaplan-Meier estimates of survivor functions indicated highly significant differences in temporal germination patterns for seeds exposed to fluctuating or constant temperatures. Extended Cox regression models specified an inverse relationship between temperature and the hazard of germination. Moreover, temperature and the temperature × day interaction had significant effects on germination response. Comparisons to reference temperatures and linear contrasts suggest that summer temperatures (33/24 °C) play a significant role in differential germination responses. Similarly, simple and complex comparisons revealed that the effects of elevated temperatures predominate in terms of components of seed viability. In summary, the application of non- and semi-parametric analyses provides appropriate, powerful data analysis procedures to address various topics in seed biology and more widespread use is encouraged.
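As a minimal sketch of the Kaplan-Meier estimator mentioned above, the following Python function computes the product-limit survivor curve from event times and censoring flags. In a germination setting, "survival" would mean "not yet germinated"; the function name and the toy data are illustrative assumptions, not values from the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times:  observation time for each seed (e.g. day of germination
            or day the seed was last observed)
    events: True if the event (germination) was observed at that time,
            False if the observation was censored
    Returns a list of (time, S(time)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censored observations at time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


# Hypothetical data: five seeds, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Censored observations reduce the number at risk without producing a step in the curve, which is what distinguishes this estimate from a simple cumulative germination percentage.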
Regression Methods for Virtual Metrology of Layer Thickness in Chemical Vapor Deposition
DEFF Research Database (Denmark)
Purwins, Hendrik; Barak, Bernd; Nagi, Ahmed
2014-01-01
The quality of wafer production in semiconductor manufacturing cannot always be monitored by a costly physical measurement. Instead of measuring a quantity directly, it can be predicted by a regression method (Virtual Metrology). In this paper, a survey on regression methods is given to predict... predictive variable alone, the 3 most predictive variables, an expert selection, and full set. The following regression methods are compared: Simple Linear Regression, Multiple Linear Regression, Partial Least Square Regression, and Ridge Linear Regression utilizing the Partial Least Square Estimate algorithm, and Support Vector Regression (SVR). On a test set, SVR outperforms the other methods by a large margin, being more robust towards changes in the production conditions. The method performs better on high-dimensional multivariate input data than on the most predictive variables alone. Process...
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Widely Linear Complex-Valued Kernel Methods for Regression
Boloix-Tortosa, Rafael; Murillo-Fuentes, Juan Jose; Santos, Irene; Perez-Cruz, Fernando
2017-10-01
Usually, complex-valued RKHS are presented as a straightforward application of the real-valued case. In this paper we prove that this procedure yields a limited solution for regression. We show that another kernel, here denoted as pseudo-kernel, is needed to learn any function in complex-valued fields. Accordingly, we derive a novel RKHS to include it, the widely RKHS (WRKHS). When the pseudo-kernel cancels, WRKHS reduces to complex-valued RKHS of previous approaches. We address the kernel and pseudo-kernel design, paying attention to the kernel and the pseudo-kernel being complex-valued. In the experiments included we report remarkable improvements in simple scenarios where real and imaginary parts have different similitude relations for given inputs or cases where real and imaginary parts are correlated. In the context of these novel results we revisit the problem of non-linear channel equalization, to show that the WRKHS helps to design more efficient solutions.
A non-parametric framework for estimating threshold limit values
Directory of Open Access Journals (Sweden)
Ulm Kurt
2005-11-01
Background: To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested in non-parametric flexible alternatives. Methods: We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing the data from a study conducted by the German Research Foundation aiming to set a threshold limit value for exposure to total dust in the workplace, as a causal agent for developing chronic bronchitis. Results: In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion: The estimation of thresholds based on the isotonic framework is conceptually simple and sufficiently powerful. Given that there is no gold standard method for threshold value estimation, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
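The step-function fit described above can be sketched with the pool-adjacent-violators algorithm, shown here in plain Python. The sketch only produces the isotonic fit and the candidate jump locations; the two selection algorithms discussed in the abstract (reduced isotonic regression and the closed family of hypotheses) are not implemented. The names `isotonic_fit` and `candidate_thresholds` and the toy data are hypothetical.

```python
def isotonic_fit(y, w=None):
    """Nondecreasing least-squares fit by pool-adjacent-violators.

    y must already be ordered by increasing exposure; w holds optional
    positive weights. Returns one fitted value per observation.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fit = []
    for mean, _, n in blocks:
        fit.extend([mean] * n)
    return fit


def candidate_thresholds(x, fit):
    """Exposure values at which the fitted step function jumps upward."""
    return [x[i] for i in range(1, len(fit)) if fit[i] > fit[i - 1]]


# Hypothetical data: disease indicator (0/1) ordered by exposure level.
exposure = [1, 2, 3, 4, 5, 6, 7, 8]
disease = [0, 0, 1, 0, 0, 1, 1, 1]
fit = isotonic_fit(disease)
print(fit)
print(candidate_thresholds(exposure, fit))
```

Each upward jump of the fitted step function is one candidate location; a selection rule is still needed to decide which jump, if any, marks the threshold.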
Forecasting Gold Prices Using Multiple Linear Regression Method
Directory of Open Access Journals (Sweden)
Z. Ismail
2009-01-01
Problem statement: Forecasting is a function in management to assist decision making. It is also described as the process of estimation in unknown future situations. In a more general term it is commonly known as prediction, which refers to estimation of time series or longitudinal type data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of US dollars, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts movement of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study on the relationship between a single dependent variable and one or more independent variables, in this case with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as forecast-1 was considered to be a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold, and based on a hunch of experts, several economic factors had been identified to have influence on the gold prices. Variables such as Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to
Nonparametric identification of copula structures
Li, Bo
2013-06-01
We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process. We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.
Combining regression trees and radial basis function networks.
Orr, M; Hallam, J; Takezawa, K; Murra, A; Ninomiya, S; Oide, M; Leonard, T
2000-12-01
We describe a method for non-parametric regression which combines regression trees with radial basis function networks. The method is similar to that of Kubat, who was first to suggest such a combination, but has some significant improvements. We demonstrate the features of the new method, compare its performance with other methods on DELVE data sets and apply it to a real world problem involving the classification of soybean plants from digital images.
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents a semiparametric kernel regression method, which is flexible and computationally simple, especially for estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by fully nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; we then use nonparametric estimation, and the combined approach is called semiparametric. The optimum bandwidth is selected by considering an approximation of the Mean Integrated Squared Error (MISE).
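To make the role of the kernel and the bandwidth concrete, here is a minimal Python sketch of a kernel regression estimate. It uses the Nadaraya-Watson form with a Gaussian kernel, which is an assumption: the abstract does not say which kernel estimator the paper uses. Bandwidth choice is illustrated with a leave-one-out squared error, a common practical stand-in that is not the paper's MISE approximation. All names and data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[y | x = x0] with bandwidth h.

    Assumes at least one observation lies close enough to x0 that the
    kernel weights do not all underflow to zero.
    """
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def loo_cv_error(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    n = len(xs)
    err = 0.0
    for i in range(n):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        err += (ys[i] - nw_estimate(xs[i], rest_x, rest_y, h)) ** 2
    return err / n


# Hypothetical data: a linear trend sampled at five points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]
print(nw_estimate(2.0, xs, ys, h=1.0))
best_h = min([0.5, 1.0, 2.0], key=lambda h: loo_cv_error(xs, ys, h))
print(best_h)
```

A small bandwidth follows the data closely and a large one oversmooths, so any bandwidth criterion is balancing variance against bias.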
Rohée, E.; Coulon, R.; Carrel, F.; Dautremer, T.; Barat, E.; Montagu, T.; Normand, S.; Jammes, C.
2016-11-01
Radionuclide identification and quantification are a serious concern for many applications, such as in situ monitoring at nuclear facilities, laboratory analysis, special nuclear materials detection, environmental monitoring, and waste measurements. High resolution gamma-ray spectrometry based on high purity germanium diode detectors is the best solution available for isotopic identification. Over the last decades, methods have been developed to improve gamma spectra analysis. However, some difficulties remain in the analysis when full energy peaks are folded together with a high ratio between their amplitudes, and when the Compton background is much larger than the signal of a single peak. In this context, this study compares a conventional analysis based on the "iterative peak fitting deconvolution" method with a "nonparametric Bayesian deconvolution" approach developed by the CEA LIST and implemented in the SINBAD code. The iterative peak fit deconvolution is used in this study as a reference method largely validated by industrial standards to unfold complex spectra from HPGe detectors. Complex cases of spectra are studied from IAEA benchmark protocol tests and with measured spectra. The SINBAD code shows promising deconvolution capabilities compared to the conventional method without any expert parameter fine tuning.
Jongjoo, Kim; Davis, Scott K; Taylor, Jeremy F
2002-06-01
Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus x Brahman reciprocal full-sib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one QTL per chromosome model. The CIs from the non-selective bootstrap method were largest (87.7 cM average or 79.2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM average). However, CI sizes from the Selective II procedure were more variable than those produced by the two LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.
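The non-selective bootstrap referred to above resamples the data with replacement and reads a confidence interval off the empirical distribution of the re-estimated quantity. A generic Python sketch of that percentile interval follows. It is not the QTL-mapping procedure from the study: the statistic here is a plain sample mean, the data are made up, and the name `bootstrap_ci` is hypothetical.

```python
import random


def bootstrap_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(data).

    Draws n_boot resamples of the same size as data, with replacement,
    and returns the empirical alpha/2 and 1 - alpha/2 quantiles of the
    resampled statistic.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_index = int((alpha / 2.0) * n_boot)
    hi_index = min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))
    return stats[lo_index], stats[hi_index]


def mean(values):
    return sum(values) / len(values)


# Hypothetical measurements.
sample = [41.2, 39.8, 44.1, 40.5, 42.7, 38.9, 43.3, 41.0]
print(bootstrap_ci(sample, mean))
```

In a QTL setting the statistic would be the estimated locus position from a full re-analysis of each resample, which is why the interval can inherit artifacts such as the tendency of estimates to land on marker positions.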
Recent Advances and Trends in Nonparametric Statistics
Akritas, MG
2003-01-01
The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o
Uh, Hae-Won; Hartgers, Franca C; Yazdanbakhsh, Maria; Houwing-Duistermaat, Jeanine J
2008-10-17
The statistical analysis of immunological data may be complicated because precise quantitative levels cannot always be determined. Values below a given detection limit may not be observed (nondetects), and data with nondetects are called left-censored. Since nondetects cannot be considered as missing at random, a statistician faced with data containing these nondetects must decide how to combine nondetects with detects. Till now, the common practice is to impute each nondetect with a single value such as a half of the detection limit, and to conduct ordinary regression analysis. The first aim of this paper is to give an overview of methods to analyze, and to provide new methods handling censored data other than an (ordinary) linear regression. The second aim is to compare these methods by simulation studies based on real data. We compared six new and existing methods: deletion of nondetects, single substitution, extrapolation by regression on order statistics, multiple imputation using maximum likelihood estimation, tobit regression, and logistic regression. The deletion and extrapolation by regression on order statistics methods gave biased parameter estimates. The single substitution method underestimated variances, and logistic regression suffered loss of power. Based on simulation studies, we found that tobit regression performed well when the proportion of nondetects was less than 30%, and that taken together the multiple imputation method performed best. Based on simulation studies, the newly developed multiple imputation method performed consistently well under different scenarios of various proportion of nondetects, sample sizes and even in the presence of heteroscedastic errors.
Afifah, Rawyanil; Andriyana, Yudhie; Jaya, I. G. N. Mindra
2017-03-01
Geographically Weighted Regression (GWR) is a development of Ordinary Least Squares (OLS) regression which is quite effective in estimating spatially non-stationary data. In GWR models, regression parameters are generated locally, so each observation has a unique regression coefficient. The parameter estimation process in GWR uses Weighted Least Squares (WLS). However, when there are outliers in the data, parameter estimation with WLS produces estimators which are not efficient. Hence, this study uses a robust method called Least Absolute Deviation (LAD) to estimate the parameters of the GWR model in the case of poverty in Java Island. This study concludes that the GWR model with the LAD method has a better performance.
Thirty years of nonparametric item response theory
Molenaar, W.
2001-01-01
Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.
Fan, Jianqing; Feng, Yang; Song, Rui
2011-06-01
A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under general nonparametric models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.
A comparison of various methods for multivariate regression with highly collinear variables
Kiers, Henk A.L.; Smilde, Age K.
2007-01-01
Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The p
Nonparametric tests for censored data
Bagdonavicus, Vilijandas; Nikulin, Mikhail
2013-01-01
This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests applying most statistical software is highlighted and discussed.
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Biometric Authentication using Nonparametric Methods
Sheela, S V; 10.5121/ijcsit.2010.2309
2010-01-01
Physiological and behavioral traits are employed to develop biometric authentication systems. The proposed work deals with the authentication of iris and signature based on minimum variance criteria. The iris patterns are preprocessed based on area of the connected components. The segmented image used for authentication consists of the region with large variations in the gray level values. The image region is split into quadtree components. The components with minimum variance are determined from the training samples. Hu moments are applied on the components. The summation of moment values corresponding to minimum variance components are provided as input vector to k-means and fuzzy k-means classifiers. The best performance was obtained for MMU database consisting of 45 subjects. The number of subjects with zero False Rejection Rate [FRR] was 44 and number of subjects with zero False Acceptance Rate [FAR] was 45. This paper addresses the computational load reduction in off-line signature verification based o...
Nonparametric discriminant model
Institute of Scientific and Technical Information of China (English)
谢斌锋; 梁飞豹
2011-01-01
This paper proposes a new class of discriminant analysis methods. The main idea is to extend the nonparametric regression model to discriminant analysis, forming a corresponding nonparametric discriminant model. A comparison with traditional discriminant methods on an example shows that the nonparametric discriminant method has broader applicability and a higher back-substitution accuracy.
Evaluation of regression methods when immunological measurements are constrained by detection limits
Directory of Open Access Journals (Sweden)
Yazdanbakhsh Maria
2008-10-01
Background: The statistical analysis of immunological data may be complicated because precise quantitative levels cannot always be determined. Values below a given detection limit may not be observed (nondetects), and data with nondetects are called left-censored. Since nondetects cannot be considered as missing at random, a statistician faced with data containing these nondetects must decide how to combine nondetects with detects. Till now, the common practice is to impute each nondetect with a single value such as a half of the detection limit, and to conduct ordinary regression analysis. The first aim of this paper is to give an overview of methods to analyze, and to provide new methods handling censored data other than an (ordinary) linear regression. The second aim is to compare these methods by simulation studies based on real data. Results: We compared six new and existing methods: deletion of nondetects, single substitution, extrapolation by regression on order statistics, multiple imputation using maximum likelihood estimation, tobit regression, and logistic regression. The deletion and extrapolation by regression on order statistics methods gave biased parameter estimates. The single substitution method underestimated variances, and logistic regression suffered loss of power. Based on simulation studies, we found that tobit regression performed well when the proportion of nondetects was less than 30%, and that taken together the multiple imputation method performed best. Conclusion: Based on simulation studies, the newly developed multiple imputation method performed consistently well under different scenarios of various proportion of nondetects, sample sizes and even in the presence of heteroscedastic errors.
Motulsky, Harvey J; Brown, Ronald E
2006-03-09
Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outlier in only about 1-3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives.
Directory of Open Access Journals (Sweden)
Motulsky Harvey J
2006-03-01
Abstract Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives.
Directory of Open Access Journals (Sweden)
Stochl Jan
2012-06-01
Abstract Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12), when binary scored, were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis
Outlier Detection Method in Linear Regression Based on Sum of Arithmetic Progression
Adikaram, K. K. L. B.; Hussein, M. A.; Effenberger, M.; Becker, T.
2014-01-01
We introduce a new nonparametric outlier detection method for linear series, which requires no missing or removed data imputation. For an arithmetic progression (a series without outliers) with n elements, the ratio (R) of the sum of the minimum and the maximum elements to the sum of all elements is always 2/n, which lies in (0,1]. R ≠ 2/n always implies the existence of outliers. Usually, R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers, separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point or (2) transform data into a constant value, which is not affected by missing or removed elements. With a reference element, which was not an outlier, the method detected all outliers from data sets with 6 to 1000 elements containing 50% outliers which deviated by a factor of ±1.0e−2 to ±1.0e+2 from the correct value. PMID:25121139
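The ratio test described in this abstract can be illustrated with a minimal Python sketch. It covers only the basic check of R against 2/n, not the authors' full procedure for significant and nonsignificant outliers or their handling of missing data. The function names `endpoint_ratio` and `diagnose` and the tolerance `rel_tol` are assumptions introduced here for illustration.

```python
def endpoint_ratio(series):
    """Return R = (min + max) / sum for a numeric series."""
    total = sum(series)
    if total == 0:
        raise ValueError("R is undefined when the series sums to zero")
    return (min(series) + max(series)) / total


def diagnose(series, rel_tol=1e-9):
    """Compare R with 2/n, the value it takes for an arithmetic progression.

    rel_tol is a hypothetical floating-point tolerance, not a value
    taken from the abstract.
    """
    expected = 2.0 / len(series)
    r = endpoint_ratio(series)
    if abs(r - expected) <= rel_tol * expected:
        return "consistent"
    # R < 2/n usually points at the minimum, R > 2/n at the maximum.
    return "minimum suspect" if r < expected else "maximum suspect"
```

For example, `diagnose([1, 2, 3, 4, 50])` flags the maximum, because R = 51/60 exceeds 2/5.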
Radial basis function regression methods for predicting quantitative traits using SNP markers.
Long, Nanye; Gianola, Daniel; Rosa, Guilherme J M; Weigel, Kent A; Kranis, Andreas; González-Recio, Oscar
2010-06-01
A challenge when predicting total genetic values for complex quantitative traits is that an unknown number of quantitative trait loci may affect phenotypes via cryptic interactions. If markers are available, assuming that their effects on phenotypes are additive may lead to poor predictive ability. Non-parametric radial basis function (RBF) regression, which does not assume a particular form of the genotype-phenotype relationship, was investigated here by simulation and analysis of body weight and food conversion rate data in broilers. The simulation included a toy example in which an arbitrary non-linear genotype-phenotype relationship was assumed, and five different scenarios representing different broad sense heritability levels (0.1, 0.25, 0.5, 0.75 and 0.9) were created. In addition, a whole genome simulation was carried out, in which three different gene action modes (pure additive, additive+dominance and pure epistasis) were considered. In all analyses, a training set was used to fit the model and a testing set was used to evaluate predictive performance. The latter was measured by correlation and predictive mean-squared error (PMSE) on the testing data. For comparison, a linear additive model known as Bayes A was used as benchmark. Two RBF models with single nucleotide polymorphism (SNP)-specific (RBF I) and common (RBF II) weights were examined. Results indicated that, in the presence of complex genotype-phenotype relationships (i.e. non-linearity and non-additivity), RBF outperformed Bayes A in predicting total genetic values using SNP markers. Extension of Bayes A to include all additive, dominance and epistatic effects could improve its prediction accuracy. RBF I was generally better than RBF II, and was able to identify relevant SNPs in the toy example.
An NCME Instructional Module on Data Mining Methods for Classification and Regression
Sinharay, Sandip
2016-01-01
Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
Directory of Open Access Journals (Sweden)
Mustafa Koroglu
2016-02-01
This paper considers a functional-coefficient spatial Durbin model with nonparametric spatial weights. Applying the series approximation method, we estimate the unknown functional coefficients and spatial weighting functions via a nonparametric two-stage least squares (or 2SLS) estimation method. To further improve estimation accuracy, we also construct a second-step estimator of the unknown functional coefficients by a local linear regression approach. Some Monte Carlo simulation results are reported to assess the finite sample performance of our proposed estimators. We then apply the proposed model to re-examine national economic growth by augmenting the conventional Solow economic growth convergence model with unknown spatial interactive structures of the national economy, as well as country-specific Solow parameters, where the spatial weighting functions and Solow parameters are allowed to be a function of geographical distance and the countries' openness to trade, respectively.
GIS-based logistic regression method for landslide susceptibility mapping in regional scale
Institute of Scientific and Technical Information of China (English)
ZHU Lei; HUANG Jing-feng
2006-01-01
Landslide susceptibility mapping is a field of study that portrays the spatial distribution of future slope failure susceptibility. This paper reviews past methods for producing landslide susceptibility maps and divides these methods into 3 types. The logistic linear regression approach is further elaborated on by the crosstabs method, which is used to analyze the relationship between the categorical or binary response variable and one or more continuous, categorical, or binary explanatory variables derived from samples. It assigns coefficients objectively as weights of the various factors under consideration, whereas expert opinion makes a great difference in heuristic approaches. Unlike the deterministic approach, it is well suited to the regional scale. In this study, double logistic regression is applied in the study area. The entire study area is first analyzed. The logistic regression equation showed that elevation and proximity to roads, rivers, and residential areas are the main factors triggering landslide occurrence in this area. The prediction accuracy of the first landslide susceptibility map was shown to be 80%. Along the roads and residential areas, almost all areas are in the high landslide susceptibility zone. Some non-landslide areas are incorrectly assigned to the high and medium landslide susceptibility zones. To improve on this, a second logistic regression was carried out in the high landslide susceptibility zone using landslide cells and non-landslide sample cells in this area. In the second logistic regression analysis, only engineering and geological conditions are important in these areas and are entered in the new logistic regression equation, indicating that only areas with unstable engineering and geological conditions are prone to landslides during large scale engineering activity. Taking these two logistic regression results into account yields a new landslide susceptibility map. Double logistic regression analysis improved the non
Stochl, Jan; Jones, Peter B; Croudace, Tim J
2012-06-11
Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental
COMPARISON OF DIAGNOSTIC METHODS FOR DETECTING AN INFLUENTIAL OBSERVATION IN REGRESSION
ACARLAR, Irmak
2011-01-01
An influential observation or influential set can cause noticeable changes in the fitted values in regression. Since these changes reduce the explanatory power of the model, detecting influential observations or influential sets in the data is important for the efficiency of regression analysis. In this study, DFFITS, DFBETAS, COVRATIO, Cook's distance, S statistics, and graphical techniques used for detecting an influential observation are examined. These methods are compared with regard ...
Rawles, Christopher; Thurber, Clifford
2015-08-01
We present a simple, fast, and robust method for automatic detection of P- and S-wave arrivals using a nearest neighbours-based approach. The nearest neighbour algorithm is one of the most popular time-series classification methods in the data mining community and has been applied to time-series problems in many different domains. Specifically, our method is based on the non-parametric time-series classification method developed by Nikolov. Instead of building a model by estimating parameters from the data, the method uses the data itself to define the model. Potential phase arrivals are identified based on their similarity to a set of reference data consisting of positive and negative sets, where the positive set contains examples of analyst identified P- or S-wave onsets and the negative set contains examples that do not contain P waves or S waves. Similarity is defined as the square of the Euclidean distance between vectors representing the scaled absolute values of the amplitudes of the observed signal and a given reference example in time windows of the same length. For both P waves and S waves, a single pass is done through the bandpassed data, producing a score function defined as the ratio of the sum of similarity to positive examples over the sum of similarity to negative examples for each window. A phase arrival is chosen as the centre position of the window that maximizes the score function. The method is tested on two local earthquake data sets, consisting of 98 known events from the Parkfield region in central California and 32 known events from the Alpine Fault region on the South Island of New Zealand. For P-wave picks, using a reference set containing two picks from the Parkfield data set, 98 per cent of Parkfield and 94 per cent of Alpine Fault picks are determined within 0.1 s of the analyst pick. For S-wave picks, 94 per cent and 91 per cent of picks are determined within 0.2 s of the analyst picks for the Parkfield and Alpine Fault data set
Martens, Edwin P|info:eu-repo/dai/nl/088859010; de Boer, Anthonius|info:eu-repo/dai/nl/075097346; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H|info:eu-repo/dai/nl/181447649
PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort
A Maximum Likelihood Method for Latent Class Regression Involving a Censored Dependent Variable.
Jedidi, Kamel; And Others
1993-01-01
A method is proposed to simultaneously estimate regression functions and subject membership in "k" latent classes or groups given a censored dependent variable for a cross-section of subjects. Maximum likelihood estimates are obtained using an EM algorithm. The method is illustrated through a consumer psychology application. (SLD)
Directory of Open Access Journals (Sweden)
Ferger Dietmar
2009-09-01
Abstract Background Epidemiological and clinical studies, often including anthropometric measures, have established obesity as a major risk factor for the development of type 2 diabetes. Appropriate cut-off values for anthropometric parameters are necessary for prediction or decision purposes. The cut-off corresponding to the Youden Index is often applied in epidemiology and biomedical literature for dichotomizing a continuous risk indicator. Methods Using data from a large, representative, multistage longitudinal epidemiological study in a primary care setting in Germany, this paper explores a novel approach for estimating optimal cut-offs of anthropometric parameters for predicting type 2 diabetes, based on a discontinuity of a regression function in a nonparametric regression framework. Results The resulting cut-off corresponded to values obtained by the Youden Index (maximum of the sum of sensitivity and specificity, minus one), often considered the optimal cut-off in epidemiological and biomedical research. The nonparametric regression based estimator was compared to results obtained by the established methods of the Receiver Operating Characteristic plot in various simulation scenarios and, based on bias and root mean square error, yielded excellent finite sample properties. Conclusion It is thus recommended that this nonparametric regression approach be considered as a valuable alternative when a continuous indicator has to be dichotomized at the Youden Index for prediction or decision purposes.
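As a point of reference for the Youden Index mentioned in this abstract, the following Python sketch computes the empirical cut-off that maximizes J = sensitivity + specificity - 1. This is the established benchmark only, not the paper's nonparametric regression estimator. The function name `youden_cutoff` and the convention that a case is predicted positive when its value is at or above the cut-off are assumptions made here.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels are 1 (diseased) or 0 (healthy); a case is predicted positive
    when value >= cutoff. Ties in J keep the smallest cutoff.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    best_cutoff, best_j = None, float("-inf")
    for c in sorted(set(values)):  # every observed value is a candidate
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= c)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < c)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j
```

The exhaustive scan over observed values is quadratic in the sample size, which keeps the sketch short; a sorted single pass would be preferable for large cohorts.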
Directory of Open Access Journals (Sweden)
Anwar Fitrianto
2014-01-01
When the independent variables in a multiple linear regression model are highly linearly correlated, an analysis based on the common Ordinary Least Squares (OLS) method can be misleading. In this situation, the ridge regression estimator is recommended. We conduct a simulation study to compare the performance of the ridge regression estimator and OLS. We found that the Hoerl and Kennard ridge regression estimation method performs better than the other approaches.
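The contrast between OLS and ridge estimation under collinearity can be sketched in Python for the special case of two centered predictors, where the system (X'X + kI)b = X'y is 2x2 and can be solved by Cramer's rule. The function name `fit_two_predictors` and the hand-picked ridge constant `k` are assumptions for illustration; the sketch does not implement the Hoerl and Kennard rule for choosing k.

```python
def fit_two_predictors(x1, x2, y, k=0.0):
    """Solve (X'X + k*I) b = X'y for two centered predictors.

    k = 0 gives the OLS slopes; k > 0 gives ridge slopes. k is a
    hypothetical hand-picked constant, not a data-driven choice.
    """
    def centered(v):
        mean = sum(v) / len(v)
        return [a - mean for a in v]

    x1, x2, y = centered(x1), centered(x2), centered(y)
    s11 = sum(a * a for a in x1) + k
    s22 = sum(a * a for a in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    g1 = sum(a * b for a, b in zip(x1, y))
    g2 = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("singular system; try k > 0")
    # Cramer's rule for the 2x2 normal equations.
    return (s22 * g1 - s12 * g2) / det, (s11 * g2 - s12 * g1) / det
```

With nearly collinear inputs, the OLS slopes can be large and of opposite sign, while a positive k shrinks the coefficient vector toward zero at the cost of some bias.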
Institute of Scientific and Technical Information of China (English)
Not stated
2006-01-01
Based on the model structure of the influence coefficient method, analyzed in depth using matrix theory, this paper explains why the least squares (LS) influence coefficient method yields unreasonable and unstable correction masses with larger MSE when there are correlated planes in dynamic balancing. It also presents a new ridge regression method for solving for the correction masses according to Tikhonov regularization theory, and explains why ridge regression can eliminate this disadvantage of the LS method. Applying the new method to the dynamic balancing of a gas turbine shows that it is superior to the LS method when the influence coefficient matrix is ill-conditioned: minimal correction masses and residual vibration are obtained in the dynamic balancing of rotors.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Bayesian Regression and Neuro-Fuzzy Methods Reliability Assessment for Estimating Streamflow
Directory of Open Access Journals (Sweden)
Yaseen A. Hamaamin
2016-07-01
Accurate and efficient estimation of streamflow in a watershed's tributaries is a prerequisite for viable water resources management. This study couples process-driven and data-driven methods of streamflow forecasting as a more efficient and cost-effective approach to water resources planning and management. Two data-driven methods, Bayesian regression and adaptive neuro-fuzzy inference system (ANFIS), were tested separately as a faster alternative to a calibrated and validated Soil and Water Assessment Tool (SWAT) model to predict streamflow in the Saginaw River Watershed of Michigan. For the data-driven modeling process, four structures were assumed and tested: general, temporal, spatial, and spatiotemporal. Results showed that both Bayesian regression and ANFIS can replicate global (watershed) and local (subbasin) results similar to a calibrated SWAT model. At the global level, Bayesian regression and ANFIS model performance were satisfactory based on Nash-Sutcliffe efficiencies of 0.99 and 0.97, respectively. At the subbasin level, Bayesian regression and ANFIS models were satisfactory for 155 and 151 subbasins out of 155 subbasins, respectively. Overall, the most accurate method was a spatiotemporal Bayesian regression model that outperformed other models at global and local scales. However, all ANFIS models performed satisfactorily at both scales.
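The Nash-Sutcliffe efficiency used as the performance measure in this abstract has a standard definition: one minus the ratio of the squared simulation error to the variance of the observations about their mean. A minimal Python sketch follows; the function name `nash_sutcliffe` is an assumption, and the sketch says nothing about how the study preprocessed its streamflow series.

```python
def nash_sutcliffe(observed, simulated):
    """Return NSE = 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - error / spread
```

A value of 1 indicates a perfect match, and a value of 0 means the model predicts no better than the mean of the observations.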
Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series
DEFF Research Database (Denmark)
Gao, Jiti; Kanaya, Shin; Li, Degui
2015-01-01
This paper establishes uniform consistency results for nonparametric kernel density and regression estimators when time series regressors concerned are nonstationary null recurrent Markov chains. Under suitable regularity conditions, we derive uniform convergence rates of the estimators. Our...... results can be viewed as a nonstationary extension of some well-known uniform consistency results for stationary time series....
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely, a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides and the nature of triggers and LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0s (nonoccurrence) and 1s (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1s and 0s to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods has been widely investigated in the literature, the challenge of training set construction has not been adequately investigated for QLSM methods. To tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used to test the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1s and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses, with parameters similar to those of NNS. It is found that the LR-PRS, FLR-PRS and BLR-Whole Data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Nonparametric Bayes analysis of social science data
Kunihama, Tsuyoshi
Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys assign many questions to respondents, which often consist of mixed-scale variables. Each of the variables can follow a complex distribution outside parametric families and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model which can approximate structures well in the social science data. In addition, many social surveys have collected data over time and therefore we need to incorporate dynamic dependence into the models. Also, it is standard to observe massive number of missing values in the social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains backgrounds and motivations of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian modeling of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method in social surveys with complex designs where there is a gap between sample and population. We correct for the bias by adjusting mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables can be induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.
An Efficient Proximal-Gradient Method for Single and Multi-task Regression with Structured Sparsity
Chen, Xi; Kim, Seyoung; Peña, Javier; Carbonell, Jaime G; Xing, Eric P
2010-01-01
We consider the optimization problem of learning regression models with a mixed-norm penalty that is defined over overlapping groups to achieve structured sparsity. It has been previously shown that such a penalty can encode prior knowledge on the input or output structure to learn a structured-sparsity pattern in the regression parameters. However, because of the non-separability of the parameters of the overlapping groups, developing an efficient optimization method has remained a challenge. An existing method casts this problem as a second-order cone program (SOCP) and solves it by interior-point methods. However, this approach is computationally expensive even for problems of moderate size. In this paper, we propose an efficient proximal-gradient method that achieves a faster convergence rate and has a significantly lower time complexity than solving the SOCP formulation. Our method exploits the structure of the non-smooth structured-sparsity-inducing norm, introduces its smooth approximation, and solv...
Estimation of safe doses: critical review of the hockey stick regression method
Energy Technology Data Exchange (ETDEWEB)
Yanagimoto, T.; Yamamoto, E.
1979-10-01
The hockey stick regression method is a convenient way to estimate safe doses; it is a kind of regression method that uses segmented lines. The method seems intuitively useful, but it requires the assumption that a positive threshold value exists, and the validity of this assumption is considered difficult to demonstrate. Alternative methods that are not based on this assumption are given for suitable dose-response curves by introducing a risk level. Here the method using the probit model is compared with the hockey stick regression method. Computational results suggest that the alternative method is preferable. Similar problems in the case where the response is measured as a continuous value are also discussed. The example data concern the relation of SO2 to simple chronic bronchitis, the relation of photochemical oxidants to eye discomfort, and residual antibiotics in the liver of chicks. These data were analyzed by the original authors under the assumption that positive threshold values exist.
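To make the segmented-line idea concrete, here is a minimal Python sketch of one common form of hockey stick regression: the response is flat below a threshold t and rises linearly above it, y = a + b * max(0, x - t). The threshold is chosen by a grid search over candidate values with an ordinary least-squares fit at each. The function name `fit_hockey_stick`, the candidate grid, and the grid-search strategy are all assumptions made here; the abstract does not describe the fitting procedure.

```python
def fit_hockey_stick(x, y, candidates):
    """Fit y = a + b * max(0, x - t) by grid search over thresholds t.

    Returns (t, a, b, sse) for the candidate with the smallest sum of
    squared errors. candidates is a hypothetical user-supplied grid.
    """
    n = len(x)
    best = None
    for t in candidates:
        hinge = [max(0.0, xi - t) for xi in x]
        mean_h, mean_y = sum(hinge) / n, sum(y) / n
        s_hh = sum((h - mean_h) ** 2 for h in hinge)
        if s_hh == 0:
            continue  # no dose exceeds this threshold; slope not identifiable
        slope = sum((h - mean_h) * (yi - mean_y)
                    for h, yi in zip(hinge, y)) / s_hh
        intercept = mean_y - slope * mean_h
        sse = sum((yi - intercept - slope * h) ** 2
                  for h, yi in zip(hinge, y))
        if best is None or sse < best[3]:
            best = (t, intercept, slope, sse)
    if best is None:
        raise ValueError("no candidate threshold lies below the largest dose")
    return best
```

Note that this fit always returns some threshold from the grid, which illustrates the concern raised in the abstract: the method presupposes that a positive threshold exists rather than testing for it.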
Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming
2011-06-01
Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005-2007.
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Energy Technology Data Exchange (ETDEWEB)
Ferragut, Erik M [ORNL; Laska, Jason A [ORNL
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models
Zellner, Arnold
2008-01-01
A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Directory of Open Access Journals (Sweden)
Alejandro Jara
2011-04-01
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Institute of Scientific and Technical Information of China (English)
林宇; 谭斌; 黄登仕; 魏宇
2011-01-01
This paper applies a bandwidth-based nonparametric method and an AR-GARCH model to the conditional mean and conditional volatility of a return series in order to estimate the standardized residuals. L-moments and maximum likelihood estimation (MLE) are then used to estimate the parameters of the generalized Pareto distribution (GPD) fitted to the tails of the standardized residuals, from which dynamic value at risk (VaR) and expected shortfall (ES) are estimated. Finally, back-testing is used to check the accuracy of the VaR and ES measurement models. The results show that the bandwidth-based nonparametric estimation appears superior to the GARCH family of models in risk measurement accuracy, with higher reliability in measuring ES, and that the risk measurement model based on nonparametric estimation and the L-moment method can effectively measure the dynamic VaR and ES of the Shanghai and Shenzhen stock markets.
NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-10-01
The article is devoted to nonparametric point and interval estimation of the characteristics of a probability distribution (the expectation, median, variance, standard deviation, and coefficient of variation) from sample results. Sample values are regarded as realizations of independent and identically distributed random variables with an arbitrary distribution function having the required number of moments. Nonparametric procedures are compared with parametric procedures based on the assumption that the sample values are normally distributed. Point estimators are constructed in the obvious way, using sample analogs of the theoretical characteristics. Interval estimators are based on the asymptotic normality of sample moments and of functions of them. Nonparametric asymptotic confidence intervals are obtained with a special technique for deriving asymptotic relations in applied statistics. In the first step, this technique applies the multidimensional central limit theorem to sums of vectors whose coordinates are powers of the initial random variables. The second step transforms the limiting multivariate normal vector into the vector of interest to the researcher, using linearization and discarding infinitesimal quantities. The third step is a rigorous justification of the results at the standard level of asymptotic mathematical-statistical reasoning, which usually requires the necessary and sufficient conditions for the inheritance of convergence. The article contains 10 numerical examples. The initial data are the operating times to the limit state of 50 cutting tools. Methods developed under the assumption of a normal distribution can lead to noticeably distorted conclusions when the normality hypothesis fails. The practical recommendation is to use nonparametric confidence limits for the analysis of real data.
Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.
Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
Directory of Open Access Journals (Sweden)
Giuliano de Oliveira Freitas
2013-10-01
PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to one of two phacoemulsification groups: one received an AcrySof® Toric intraocular lens (IOL) in both eyes, and the other received an AcrySof Natural IOL combined with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both the Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected, and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: A significant negative correlation was found between the Thibos APVratio and the Alpins percentage of success (%Success) (Spearman's ρ = -0.93); the linear regression is given by the following equation: %Success = (-APVratio + 1.00) × 100. CONCLUSION: The linear regression found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
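The regression equation reported in the abstract, %Success = (-APVratio + 1.00) × 100, can be evaluated directly. The sketch below only restates that published equation; the function name and the sample inputs are assumptions chosen for illustration and are not patient data.

```python
def percent_success(apv_ratio):
    """Evaluate the reported regression: %Success = (-APVratio + 1.00) * 100.

    apv_ratio is the postoperative Thibos APV divided by the preoperative APV.
    Hypothetical helper name; the equation itself is taken from the abstract.
    """
    return (-apv_ratio + 1.00) * 100.0


# Illustrative inputs only: a ratio of 0.25 means the postoperative APV is
# one quarter of the preoperative value.
example_quarter = percent_success(0.25)
example_unchanged = percent_success(1.0)
```

Under this equation a ratio of 1 (no change in APV) maps to 0% success and a ratio of 0 maps to 100%, which matches the negative correlation reported.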
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, a Bayesian neural network, a Gaussian process, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data for the US stock market from the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary costs directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance.
Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times.
Xu, Yanxun; Müller, Peter; Wahed, Abdus S; Thall, Peter F
2016-01-01
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transition times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/.
Non-Parametric Estimation of Correlation Functions
DEFF Research Database (Denmark)
Brincker, Rune; Rytter, Anders; Krenk, Steen
In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise.
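To make the first two estimators concrete, the sketch below computes an auto-correlation estimate by direct lagged products and by the Fourier route (transform, squared magnitude, inverse transform). It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, a plain O(N²) discrete Fourier transform stands in for an FFT so that only the standard library is needed, the signal is zero-padded to twice its length to avoid the circular wrap-around that is one source of bias in the Fourier route, and both estimates use the 1/(N - k) normalization. The Random Decrement technique is not shown.

```python
import cmath


def autocorr_direct(x, max_lag):
    """Direct estimate: R[k] = sum_i x[i] * x[i + k] / (N - k)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / (n - k)
            for k in range(max_lag + 1)]


def _dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                    for m, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def autocorr_fourier(x, max_lag):
    """Fourier estimate: inverse transform of the power spectrum.

    Zero-padding to length 2N turns the circular correlation into a
    linear one for lags 0..N-1, so max_lag must be smaller than N.
    """
    n = len(x)
    padded = list(x) + [0.0] * n
    power = [abs(c) ** 2 for c in _dft(padded)]
    raw = _dft(power, inverse=True)
    return [raw[k].real / (n - k) for k in range(max_lag + 1)]


# Arbitrary test signal, chosen only to compare the two routes.
signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
direct = autocorr_direct(signal, 4)
fourier = autocorr_fourier(signal, 4)
```

With the padding in place the two routes agree to rounding error; dropping the padding would mix in products that wrap around the end of the record.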
Improved ENSO simulation in regional coupled GCM using regressive correction method
Institute of Scientific and Technical Information of China (English)
2007-01-01
A regressive correction method is presented with the primary goal of improving ENSO simulation in regional coupled GCM. It focuses on the correction of ocean-atmosphere exchanged fluxes. On the basis of numerical experiments and analysis, the method can be described as follows: first, driving the ocean model with heat and momentum flux computed from a long-term observation data set; the produced SST is then applied to force the AGCM as its boundary condition; after that the AGCM's simulation and the corresponding observation can be correlated by a linear regressive formula. Thus the regressive correction coefficients for the simulation with spatial and temporal variation could be obtained by linear fitting. Finally the coefficients are applied to redressing the variables used for the calculation of the exchanged air-sea flux in the coupled model when it starts integration. This method together with the anomaly coupling method is tested in a regional coupled model, which is composed of a global grid-point atmospheric general circulation model and a high-resolution tropical Pacific Ocean model. The comparison of the results shows that it is superior to the anomaly coupling both in reducing the coupled model 'climate drift' and in improving the ENSO simulation in the tropical Pacific Ocean.
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Institute of Scientific and Technical Information of China (English)
(no author listed)
2001-01-01
A practical regression method for the saturating exponential in the pre-dose technique is proposed. The method is mainly applied to porcelain dating. To test the method, simulated paleodoses of imitations of ancient porcelain were used. The measured results are in good agreement with the simulated paleodose values: the average ratios of the two values obtained in the two ways are 1.05 and 0.99, with standard deviations (±1σ) of 19% and 15%, respectively. Such errors are acceptable in porcelain dating.
Zheng, Jun; Shao, Xinyu; Gao, Liang; Jiang, Ping; Qiu, Haobo
2015-06-01
Engineering design, especially for complex engineering systems, is usually a time-consuming process involving computation-intensive computer-based simulation and analysis methods. A difference mapping method using least square support vector regression is developed in this work, as a special metamodelling methodology that includes variable-fidelity data, to replace the computationally expensive computer codes. A general difference mapping framework is proposed in which a surrogate base is first created, and the approximation is then obtained by mapping the difference between the base and the real high-fidelity response surface. The least square support vector regression is adopted to accomplish the mapping. Two different sampling strategies, nested and non-nested design of experiments, are conducted to explore their respective effects on modelling accuracy. Different sample sizes and three approximation performance measures of accuracy are considered.
Directory of Open Access Journals (Sweden)
Kühnast, Corinna
2008-04-01
Background: Although non-normal data are widespread in biomedical research, parametric tests unnecessarily predominate in statistical analyses. Methods: We surveyed five biomedical journals and, for all studies which contain at least the unpaired t-test or the non-parametric Wilcoxon-Mann-Whitney test, investigated the relationship between the choice of a statistical test and other variables such as type of journal, sample size, randomization, sponsoring etc. Results: The non-parametric Wilcoxon-Mann-Whitney test was used in 30% of the studies. In a multivariable logistic regression the type of journal, the test object, the scale of measurement and the statistical software were significant. The non-parametric test was more common in case of non-continuous data, in high-impact journals, in studies in humans, and when the statistical software is specified, in particular when SPSS was used.
ASYMPTOTIC EFFICIENT ESTIMATION IN SEMIPARAMETRIC NONLINEAR REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
Zhu Zhongyi; Wei Bocheng
1999-01-01
In this paper, the estimation method based on the "generalized profile likelihood" for conditionally parametric models given by Severini and Wong (1992) is extended to fixed-design semiparametric nonlinear regression models. For these models, the resulting estimator of the parametric component is shown to be asymptotically efficient, and the strong convergence rate of the nonparametric component is investigated. Many results (for example, Chen (1988), Gao & Zhao (1993), and Rice (1986)) are extended to fixed-design semiparametric nonlinear regression models.
Xu, Xiaohong; Chen, Yu; Jia, Haiwei
2009-07-01
This paper studies the relation between the interest rate and the inflation rate, using the stepwise regression method to build a mathematical model of that relation. The model passes the significance test and is used to discuss how adjusting the deposit rate influences the social economy, providing a theoretical basis for government policy-making.
Unification of regression-based methods for the analysis of natural selection.
Morrissey, Michael B; Sakrejda, Krzysztof
2013-07-01
Regression analyses are central to characterization of the form and strength of natural selection in nature. Two common analyses that are currently used to characterize selection are (1) least squares-based approximation of the individual relative fitness surface for the purpose of obtaining quantitatively useful selection gradients, and (2) spline-based estimation of (absolute) fitness functions to obtain flexible inference of the shape of functions by which fitness and phenotype are related. These two sets of methodologies are often implemented in parallel to provide complementary inferences of the form of natural selection. We unify these two analyses, providing a method whereby selection gradients can be obtained for a given observed distribution of phenotype and characterization of a function relating phenotype to fitness. The method allows quantitatively useful selection gradients to be obtained from analyses of selection that adequately model nonnormal distributions of fitness, and provides unification of the two previously separate regression-based fitness analyses. We demonstrate the method by calculating directional and quadratic selection gradients associated with a smooth regression-based generalized additive model of the relationship between neonatal survival and the phenotypic traits of gestation length and birth mass in humans.
Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models
Directory of Open Access Journals (Sweden)
Adam J. Branscum
2013-01-01
The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions) rather than scalar parameters (e.g., means or proportions). For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straightforward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
Impact of regression methods on improved effects of soil structure on soil water retention estimates
Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim
2015-06-01
Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure), and implementing more flexible regression algorithms are among the main strategies of PTFs improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in literature, could be enduringly captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM), and k-Nearest Neighbors (kNN), which have been recently introduced as promising tools for PTF development, were utilized to test if the incorporation of soil structure will improve PTF's accuracy under a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as grouping criterion can improve the accuracy of PTFs derived by SVM approach in the range of matric potential of -6 to -33 kPa (average RMSE decreased up to 0.005 m3 m-3 after grouping, depending on matric potentials). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with kNN technique, at least not in our study in which the data set became limited in size after grouping. Since there is an impact of regression techniques on the improved effect of incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
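The grouping idea described above (calibrating a separate model per soil structure class) can be sketched with a k-nearest-neighbors regressor that only searches among training samples of the same class. Everything specific here is an assumption for illustration: the function name, the use of unweighted Euclidean distance and a plain mean of neighbor targets, and the toy numbers, which are invented and are not measurements from the study.

```python
import math


def grouped_knn_predict(samples, query_features, query_group, k=3):
    """Predict a target as the mean of the k nearest same-group samples.

    samples: list of (group_label, feature_tuple, target) triples.
    Hypothetical helper; restricting the search to one group mimics
    using soil structure as a grouping criterion.
    """
    subset = [(features, target)
              for group, features, target in samples
              if group == query_group]
    if not subset:
        raise ValueError("no training samples in group %r" % (query_group,))
    ranked = sorted((math.dist(features, query_features), target)
                    for features, target in subset)
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Invented toy data: (structure class, (clay content,), water content).
toy_samples = [
    ("massive", (10.0,), 0.30),
    ("massive", (12.0,), 0.32),
    ("massive", (30.0,), 0.20),
    ("structured", (11.0,), 0.40),
    ("structured", (13.0,), 0.42),
]
prediction = grouped_knn_predict(toy_samples, (11.0,), "massive", k=2)
```

The sketch also shows why grouping can hurt kNN on small data sets, as the abstract reports: each group has fewer candidates, so the nearest neighbors may be far from the query.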
Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso
Chen, Xi; Lin, Qihang; Carbonell, Jaime G; Xing, Eric P
2010-01-01
We consider the problem of learning a structured multi-task regression, where the output consists of multiple responses that are related by a graph and the correlated response variables are dependent on the common inputs in a sparse but synergistic manner. Previous methods such as l1/l2-regularized multi-task regression assume that all of the output variables are equally related to the inputs, although in many real-world problems, outputs are related in a complex manner. In this paper, we propose graph-guided fused lasso (GFlasso) for structured multi-task regression that exploits the graph structure over the output variables. We introduce a novel penalty function based on fusion penalty to encourage highly correlated outputs to share a common set of relevant inputs. In addition, we propose a simple yet efficient proximal-gradient method for optimizing GFlasso that can also be applied to any optimization problems with a convex smooth loss and the general class of fusion penalty defined on arbitrary graph stru...
Li, Min; Zhou, Tong; Song, Yanan
2016-07-01
A grain size characterization method based on energy attenuation coefficient spectrum and support vector regression (SVR) is proposed. First, the spectra of the first and second back-wall echoes are cut into several frequency bands to calculate the energy attenuation coefficient spectrum. Second, the frequency band that is sensitive to grain size variation is determined. Finally, a statistical model between the energy attenuation coefficient in the sensitive frequency band and average grain size is established through SVR. Experimental verification is conducted on austenitic stainless steel. The average relative error of the predicted grain size is 5.65%, which is better than that of conventional methods.
Local Linear Regression for Data with AR Errors
Institute of Scientific and Technical Information of China (English)
Runze Li; Yan Li
2009-01-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
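For orientation, the baseline that the abstract calls naive local linear regression with a working-independent error structure can be sketched as a kernel-weighted straight-line fit around each evaluation point. This is only that baseline: the profile least squares step for auto-regressive errors and the SCAD penalty proposed in the paper are not implemented here. The function name, the Gaussian kernel, and the fixed bandwidth are assumptions made for illustration.

```python
import math


def local_linear(xs, ys, x0, bandwidth):
    """Local linear estimate of the regression function at x0.

    Minimizes sum_i w_i * (y_i - a - b * (x_i - x0))**2 with Gaussian
    kernel weights w_i and returns the intercept a, which estimates
    the regression function at x0.
    """
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    denom = s0 * s2 - s1 ** 2
    if denom == 0.0:
        raise ValueError("design is degenerate at this point and bandwidth")
    # Intercept of the weighted least-squares line (Cramer's rule).
    return (s2 * t0 - s1 * t1) / denom


# Noise-free linear toy data: the local linear fit reproduces the line.
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
line = [2.0 * x + 1.0 for x in grid]
estimate = local_linear(grid, line, 2.5, 1.0)
```

A useful sanity check on any such implementation is that exactly linear data are reproduced regardless of bandwidth, since a straight line is fitted locally.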
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions
Abraham, Simon; Raisee, Mehrdad; Ghorbaniasl, Ghader; Contino, Francesco; Lacor, Chris
2017-03-01
Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
Tian, Guo-Liang; Tang, Man-Lai; Fang, Hong-Bin; Tan, Ming
2008-03-15
Fitting logistic regression models is challenging when their parameters are restricted. In this article, we first develop a quadratic lower-bound (QLB) algorithm for optimization with box or linear inequality constraints and derive the fastest QLB algorithm corresponding to the smallest global majorization matrix. The proposed QLB algorithm is particularly suited to problems to which EM-type algorithms are not applicable (e.g., logistic, multinomial logistic, and Cox's proportional hazards models) while it retains the same EM ascent property and thus assures the monotonic convergence. Secondly, we generalize the QLB algorithm to penalized problems in which the penalty functions may not be totally differentiable. The proposed method thus provides an alternative algorithm for estimation in lasso logistic regression, where the convergence of the existing lasso algorithm is not generally ensured. Finally, by relaxing the ascent requirement, convergence speed can be further accelerated. We introduce a pseudo-Newton method that retains the simplicity of the QLB algorithm and the fast convergence of the Newton method. Theoretical justification and numerical examples show that the pseudo-Newton method is up to 71 (in terms of CPU time) or 107 (in terms of number of iterations) times faster than the fastest QLB algorithm and thus makes bootstrap variance estimation feasible. Simulations and comparisons are performed and three real examples (Down syndrome data, kyphosis data, and colon microarray data) are analyzed to illustrate the proposed methods.
Marginal longitudinal semiparametric regression via penalized splines.
Kadiri, M Al; Carroll, R J; Wand, M P
2010-08-01
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate approaches to efficient estimation have been proposed, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
Nonparametric Bayes modeling for case control studies with many predictors.
Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B
2016-03-01
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
Directory of Open Access Journals (Sweden)
Gholam Reza Sheykhzadeh
2017-02-01
Introduction: Penetration resistance is one of the criteria for evaluating soil compaction. It correlates with several soil properties such as vehicle trafficability, resistance to root penetration, seedling emergence, and soil compaction by farm machinery. Direct measurement of penetration resistance is time consuming and difficult because of high temporal and spatial variability. Therefore, many different regression and artificial neural network pedotransfer functions have been proposed to estimate penetration resistance from readily available soil variables such as particle size distribution, bulk density (Db) and gravimetric water content (θm). The lands of Ardabil Province are one of the main potato-producing regions of Iran; thus, determining the soil penetration resistance in these regions helps with the management of potato production. The objective of this research was to derive pedotransfer functions using regression and artificial neural networks to predict penetration resistance from some soil variables in the agricultural soils of the Ardabil plain, and to compare the performance of artificial neural networks with regression models. Materials and methods: Disturbed and undisturbed soil samples (n = 105) were systematically taken from 0-10 cm soil depth at distances of nearly 3000 m in the agricultural lands of the Ardabil plain (lat 38°15' to 38°40' N, long 48°16' to 48°61' E). The contents of sand, silt and clay (hydrometer method), CaCO3 (titration method), bulk density (cylinder method), particle density (Dp) (pycnometer method), organic carbon (wet oxidation method), total porosity (calculated from Db and Dp), and saturated (θs) and field (θf) soil water content (gravimetric method) were measured in the laboratory. Mean geometric diameter (dg) and standard deviation (σg) of soil particles were computed using the percentages of sand, silt and clay. Penetration resistance was measured in situ using a cone penetrometer (analog model at 10
Fresno, Cristóbal; González, Germán Alexis; Merino, Gabriela Alejandra; Flesia, Ana Georgina; Podhajcer, Osvaldo Luis; Llera, Andrea Sabina; Fernández, Elmer Andrés
2017-03-01
The PAM50 classifier is used to assign patients to the most highly correlated breast cancer subtype, irrespective of the correlation value obtained. Nonetheless, all subtype correlations are required to build the risk of recurrence (ROR) score, currently used in therapeutic decisions. Present subtype uncertainty estimations are not accurate, are seldom considered, or require a population-based approach for this context. Here we present a novel single-subject non-parametric uncertainty estimation based on PAM50's gene label permutations. Simulation results (n = 5228) showed that only 61% of subjects can be reliably 'Assigned' to the PAM50 subtype, whereas 33% should be 'Not Assigned' (NA), leaving the rest with tight 'Ambiguous' correlations between subtypes. Excluding the NA subjects from the analysis improved the discrimination of the subtype survival curves, yielding a higher proportion of low and high ROR values. Conversely, all NA subjects showed similar survival behaviour regardless of the original PAM50 assignment. We propose to incorporate our PAM50 uncertainty estimation to support therapeutic decisions. Source code can be found in the 'pbcmc' R package at Bioconductor. Contact: cristobalfresno@gmail.com or efernandez@bdmg.com.ar. Supplementary data are available at Bioinformatics online.
Lottery spending: a non-parametric analysis.
Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody
2015-01-01
We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Nonparametric inferences for kurtosis and conditional kurtosis
Institute of Scientific and Technical Information of China (English)
XIE Xiao-heng; HE You-hua
2009-01-01
Under the assumption of a strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis of risk time series. We apply this method to the daily returns of the S&P500 index and the Shanghai Composite Index, and simulate GARCH data to verify the efficiency of the presented model. Our results indicate that the distribution of the risk series is heavy-tailed, but that historical information can make its future distribution light-tailed. However, the tails of the far-future distribution are little affected by the historical data.
Directory of Open Access Journals (Sweden)
Bangyong Sun
2014-01-01
The polynomial regression method is employed to calculate the relationship between device color space and CIE color space for color characterization, and the performance of different expressions with specific parameters is evaluated. First, the polynomial equation for color conversion is established and the computation of the polynomial coefficients is analysed. Then, different forms of polynomial equations are used to calculate the CIE color values of RGB and CMYK inputs, and the corresponding color errors are compared. Finally, an optimal polynomial expression is obtained by analysing several related parameters of the color conversion, including the number of polynomial terms, the degree of the polynomial terms, the selection of CIE visual spaces, and the linearization.
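As a rough illustration of polynomial regression for color characterization, the sketch below fits one CIE channel as a second-order polynomial of RGB by ordinary least squares. The 10-term expansion, the helper names (`poly_terms`, `fit_channel`, `predict`) and the synthetic patch data are assumptions made for the example; the abstract does not state which polynomial forms were compared.

```python
def poly_terms(r, g, b):
    """Second-order polynomial expansion of one RGB triple (10 terms, assumed form)."""
    return [1.0, r, g, b, r * g, r * b, g * b, r * r, g * g, b * b]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_channel(rgb_samples, targets):
    """Least squares coefficients for one CIE channel via the normal equations."""
    phi = [poly_terms(*rgb) for rgb in rgb_samples]
    k = len(phi[0])
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    aty = [sum(p[i] * t for p, t in zip(phi, targets)) for i in range(k)]
    return solve(ata, aty)

def predict(coeffs, rgb):
    """Evaluate the fitted polynomial for one RGB triple."""
    return sum(c * t for c, t in zip(coeffs, poly_terms(*rgb)))

# Synthetic training patches on a 3 x 3 x 3 RGB grid (not measured data).
levels = [0.0, 0.5, 1.0]
patches = [(r, g, b) for r in levels for g in levels for b in levels]
true_coeffs = [0.1, 0.4, 0.3, 0.2, 0.05, 0.0, 0.0, 0.01, 0.0, 0.0]
measured_x = [predict(true_coeffs, p) for p in patches]
fitted = fit_channel(patches, measured_x)
```

In a real characterization the same fit would be repeated for each CIE channel, and the color error on held-out patches would guide the choice of polynomial terms.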
Structural break detection method based on the Adaptive Regression Splines technique
Kucharczyk, Daniel; Wyłomańska, Agnieszka; Zimroz, Radosław
2017-04-01
For many real data sets, long-term observations consist of different processes that coexist or occur one after the other. Those processes very often exhibit different statistical properties, and thus the observed data should be segmented before further analysis. This problem arises in many applications, and therefore new segmentation techniques have appeared in the literature in recent years. In this paper we propose a new method of time series segmentation, i.e. extraction of homogeneous parts with similar behaviour from the analysed vector of observations. This method is based on the absolute deviation about the median of the signal and is an extension of previously proposed techniques that are also based on simple statistics. We introduce a method of structural break point detection based on the Adaptive Regression Splines technique, a form of regression analysis. Moreover, we propose a statistical test for the hypothesis that the behaviour corresponds to different regimes. First, we apply the methodology to simulated signals with different distributions in order to show the effectiveness of the new technique. Next, in the application part, we analyse a real data set that represents the vibration signal from a heavy-duty crusher used in a mineral processing plant.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
Institute of Scientific and Technical Information of China (English)
张云贵; 赵华; 王丽娜
2012-01-01
To deal with the increasingly serious information security problem of the industrial control system (ICS), this paper presents a non-parametric cumulative sum (CUSUM) intrusion detection method for industrial control networks. Using the input-output dependence of the ICS, a mathematical model of the ICS is established to predict the output of the system. Once the sensors of the control system are under attack, the actual output will change. At every moment, the difference between the predicted output of the industrial control model and the signal measured by the sensors is calculated, forming a time-based statistical sequence. The non-parametric CUSUM algorithm is then applied to detect intrusion attacks online and raise an alarm. The simulated detection experiments show that the proposed method has good real-time performance and a low false alarm rate. By choosing appropriate parameters r and β of the non-parametric CUSUM algorithm, the intrusion detection method can accurately detect attacks before they cause substantial damage to the control system, and it is also helpful for monitoring misoperation.
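The detection loop described above can be sketched with the usual non-parametric CUSUM recursion. This is a hedged illustration: the recursion shown is the textbook form, the function name `cusum_alarm` is hypothetical, `beta` is taken to be the drift term and `tau` an alarm threshold, and how these map to the paper's two parameters is an assumption.

```python
def cusum_alarm(predicted, measured, beta, tau):
    """Return the first index at which the CUSUM statistic exceeds tau, or None.

    predicted: model outputs, measured: sensor readings (same length).
    beta: drift subtracted at each step so the statistic stays near zero
          under normal operation (assumed role).
    tau:  alarm threshold (assumed name).
    """
    s = 0.0
    for k, (p, m) in enumerate(zip(predicted, measured)):
        s = max(0.0, s + abs(m - p) - beta)  # accumulate excess residual only
        if s > tau:
            return k
    return None

# Synthetic example: the sensor reading is biased from sample 10 onwards.
model_output = [1.0] * 20
sensor = [1.0] * 10 + [1.5] * 10
alarm_index = cusum_alarm(model_output, sensor, beta=0.1, tau=1.0)
clean_index = cusum_alarm(model_output, model_output, beta=0.1, tau=1.0)
```

A larger `beta` tolerates more model mismatch before the statistic grows, while a larger `tau` trades detection delay for fewer false alarms.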
Melo, Raquel; Vieira, Gonçalo; Caselli, Alberto; Ramos, Miguel
2010-05-01
Field surveying during the austral summer of 2007/08 and the analysis of a QuickBird satellite image resulted in the production of a detailed geomorphological map of the Irizar and Crater Lake area on Deception Island (South Shetlands, Maritime Antarctic, 1:10 000) and allowed the analysis and spatial modelling of the geomorphological phenomena. The present study focuses on the analysis of the spatial distribution and characteristics of hummocky terrains, lag surfaces and nivation hollows, complemented by GIS spatial modelling intended to identify the relevant controlling geographical factors. Models of the susceptibility of occurrence of these phenomena were created using two statistical methods: logistic regression, as a multivariate method, and the informative value, as a bivariate method. Success and prediction rate curves were used for model validation. The Area Under the Curve (AUC) was used to quantify the level of performance and prediction of the models and to allow comparison between the two methods. For the logistic regression method, the AUC showed a success rate of 71% for the lag surfaces, 81% for the hummocky terrains and 78% for the nivation hollows. The prediction rates were 72%, 68% and 71%, respectively. For the informative value method, the success rate was 69% for the lag surfaces, 84% for the hummocky terrains and 78% for the nivation hollows, with corresponding prediction rates of 71%, 66% and 69%. The results were of very good quality and demonstrate the potential of the models to predict the influence of independent variables on the occurrence of the geomorphological phenomena, as well as the reliability of the data. Key-words: present-day geomorphological dynamics, detailed geomorphological mapping, GIS, spatial modelling, Deception Island, Antarctic.
A fast nonlinear regression method for estimating permeability in CT perfusion imaging.
Bennink, Edwin; Riordan, Alan J; Horsch, Alexander D; Dankbaar, Jan Willem; Velthuis, Birgitta K; de Jong, Hugo W
2013-11-01
Blood-brain barrier damage, which can be quantified by measuring vascular permeability, is a potential predictor for hemorrhagic transformation in acute ischemic stroke. Permeability is commonly estimated by applying Patlak analysis to computed tomography (CT) perfusion data, but this method lacks precision. Applying more elaborate kinetic models by means of nonlinear regression (NLR) may improve precision, but is more time consuming and therefore less appropriate in an acute stroke setting. We propose a simplified NLR method that may be faster and still precise enough for clinical use. The aim of this study is to evaluate the reliability of a total of 12 variations of Patlak analysis and NLR methods, including the simplified NLR method. Confidence intervals for the permeability estimates were evaluated using simulated CT attenuation-time curves with realistic noise, and clinical data from 20 patients. Although fixing the blood volume improved Patlak analysis, the NLR methods yielded significantly more reliable estimates, but took up to 12× longer to calculate. The simplified NLR method was ∼4× faster than other NLR methods, while maintaining the same confidence intervals (CIs). In conclusion, the simplified NLR method is a new, reliable way to estimate permeability in stroke, fast enough for clinical application in an acute stroke setting.
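For readers unfamiliar with the Patlak baseline, the sketch below shows its standard textbook form: the tissue curve is modelled as K times the running integral of the arterial curve plus V times the arterial curve, so dividing by the arterial curve gives a straight line whose slope is the permeability-related constant K and whose intercept is the blood volume V. The function names, the synthetic curves and the parameter values are assumptions; none of the 12 variations evaluated in the paper, nor the simplified NLR method, is reproduced here.

```python
import math

def cumulative_trapezoid(t, c):
    """Running integral of c over t using the trapezoidal rule."""
    total, out = 0.0, [0.0]
    for i in range(1, len(t)):
        total += 0.5 * (c[i] + c[i - 1]) * (t[i] - t[i - 1])
        out.append(total)
    return out

def patlak_fit(t, arterial, tissue):
    """Return (K, V) from an ordinary least squares line through the Patlak plot."""
    integral = cumulative_trapezoid(t, arterial)
    x = [integral[i] / arterial[i] for i in range(len(t))]  # normalised time
    y = [tissue[i] / arterial[i] for i in range(len(t))]    # normalised tissue
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic, noise-free curves generated from the model itself (not patient data).
t = [float(i) for i in range(0, 60, 2)]
arterial = [100.0 + 50.0 * math.exp(-ti / 20.0) for ti in t]
true_k, true_v = 0.02, 0.05
integral = cumulative_trapezoid(t, arterial)
tissue = [true_k * integral[i] + true_v * arterial[i] for i in range(len(t))]
k_est, v_est = patlak_fit(t, arterial, tissue)
```

In practice the line is usually fitted only to later time points, after the tracer in the blood pool has equilibrated, and noise in the arterial curve propagates into both axes, which is one reason the linearised fit can lack precision.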
Yun, Yuqi; Zevin, Michael; Sampson, Laura; Kalogera, Vassiliki
2017-01-01
With more observations from LIGO in the upcoming years, we will be able to construct an observed mass distribution of black holes to compare with binary evolution simulations. This will allow us to investigate the physics of binary evolution, such as the effects of common envelope efficiency and wind strength, or the properties of the population, such as the initial mass function. However, binary evolution codes become computationally expensive when running large populations of binaries over a multi-dimensional grid of input parameters, and may simulate accurately only for a limited combination of input parameter values. Therefore we developed a fast machine-learning method that utilizes a Gaussian Mixture Model (GMM) and Gaussian Process (GP) regression, which together can predict distributions over the entire parameter space based on a limited number of simulated models. Furthermore, Gaussian Process regression naturally provides interpolation errors in addition to interpolation means, which could provide a means of targeting the most uncertain regions of parameter space for running further simulations. We also present a case study on applying this new method to predicting chirp mass distributions for binary black hole systems (BBHs) in Milky Way-like galaxies of different metallicities.
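The point that Gaussian Process regression returns an interpolation error alongside the interpolated mean can be illustrated with a small one-dimensional sketch. The squared-exponential kernel, unit prior variance, length scale, noise level and function names are all assumptions for the example; the GMM step and the actual simulation inputs of the study are not modelled.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gp_predict(x_train, y_train, x_new, length=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at a single point x_new."""
    n = len(x_train)
    k = [[rbf(x_train[i], x_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x_new, xi, length) for xi in x_train]
    alpha = solve(k, y_train)   # K^{-1} y
    v = solve(k, k_star)        # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x_new, x_new, length) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)

# Hypothetical "simulated models" at four parameter values.
grid = [0.0, 1.0, 2.0, 3.0]
outputs = [0.0, 1.0, 4.0, 9.0]
mean_at_grid, var_at_grid = gp_predict(grid, outputs, 1.0)
mean_far, var_far = gp_predict(grid, outputs, 50.0)
```

The variance is close to zero at a simulated point and returns to the prior variance far from all of them, which is the property that would let one pick the most uncertain region for the next simulation.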
Kew, William; Mitchell, John B O
2015-09-01
The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. This investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the 'wisdom of crowds' principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data preprocessing methodology was found to be crucial to performance of each method too. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Directory of Open Access Journals (Sweden)
Geert Verdoolaege
2015-07-01
In regression analysis for deriving scaling laws that occur in various scientific disciplines, standard regression methods have usually been applied, of which ordinary least squares (OLS) is the most popular. In many situations, the assumptions underlying OLS are not fulfilled, and several other approaches have been proposed. However, most techniques address only part of the shortcomings of OLS. We here discuss a new and more general regression method, which we call geodesic least squares regression (GLS). The method is based on minimization of the Rao geodesic distance on a probabilistic manifold. For the case of a power law, we demonstrate the robustness of the method on synthetic data in the presence of significant uncertainty on both the data and the regression model. We then show good performance of the method in an application to a scaling law in magnetic confinement fusion.
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
ZHU; Lixing
2001-01-01
Directory of Open Access Journals (Sweden)
Lihua Yang
2015-04-01
In order to improve the accuracy of grain production forecasting, this study proposes a new combination forecasting model that combines the stepwise regression method with an RBF neural network, assigning weights by the inverse variance method. A comparison across different criteria indicates that the combination forecasting model is superior to the other models. The performance of the models is measured using three types of error measurement: Mean Absolute Percentage Error (MAPE), Theil Inequality Coefficient (Theil IC) and Root Mean Squared Error (RMSE). The model with the smallest values of MAPE, Theil IC and RMSE is the best model for predicting grain production. Based on these evaluation criteria, the combination model reduces the forecasting error and has high prediction accuracy in grain production forecasting, making decisions more scientific and rational.
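The weighting and scoring steps can be sketched as follows. This is an illustration under assumptions: each component model is weighted by the reciprocal of its error variance, normalised so the weights sum to one, and the Theil IC is taken in its common form (RMSE divided by the sum of the root mean squares of the forecasts and the actual values). The function names and numbers are made up; the abstract does not give the exact formulas or data.

```python
import math

def inverse_variance_weights(error_variances):
    """Weights proportional to 1 / error variance, normalised to sum to one."""
    inv = [1.0 / v for v in error_variances]
    total = sum(inv)
    return [w / total for w in inv]

def combine(forecasts, weights):
    """Weighted combination of the component forecasts for one period."""
    return sum(w * f for w, f in zip(weights, forecasts))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def theil_ic(actual, predicted):
    """Theil Inequality Coefficient, between 0 (perfect fit) and 1 (assumed form)."""
    n = len(actual)
    rms_pred = math.sqrt(sum(p * p for p in predicted) / n)
    rms_act = math.sqrt(sum(a * a for a in actual) / n)
    return rmse(actual, predicted) / (rms_pred + rms_act)

# Hypothetical error variances for the two component models.
weights = inverse_variance_weights([1.0, 4.0])
combined = combine([10.0, 20.0], weights)
```

The component with the smaller error variance dominates the combination, so the combined forecast leans toward whichever model has been more accurate on past data.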
A non-linear regression method for CT brain perfusion analysis
Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.
2015-03-01
CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-the-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer delay, and very few tuning parameters, all of which are important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-the-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-the-art methods.
Local Component Analysis for Nonparametric Bayes Classifier
Khademi, Mahmoud; Safayani, Meharn
2010-01-01
The decision boundaries of the Bayes classifier are optimal because they lead to the maximum probability of a correct decision. This means that if we knew the prior probabilities and the class-conditional densities, we could design a classifier which gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as window width. Moreover, these methods suffer from the curse of dimensionality of the feature space and from the small sample size problem, which severely restrict their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
A calibration method of Argo floats based on multiple regression analysis
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drift with time. It is found that sensor drift with time is not purely linear as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used to fit a time-varying or temperature-varying slope in potential conductivity space to correct the drift in these profiling float salinity data. The results show that salinity errors using this method are smaller than those of Wong's method, and that quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.
Feng, Zeny Z; Yang, Xiaojian; Subedi, Sanjeena; McNicholas, Paul D
2012-01-01
Recent work concerning quantitative traits of interest has focused on selecting a small subset of single nucleotide polymorphisms (SNPs) from amongst the SNPs responsible for the phenotypic variation of the trait. When considered as covariates, the large number of variables (SNPs) and their association with those in close proximity pose challenges for variable selection. The features of sparsity and shrinkage of regression coefficients of the least absolute shrinkage and selection operator (LASSO) method appear attractive for SNP selection. Sparse partial least squares (SPLS) is also appealing as it combines the features of sparsity in subset selection and dimension reduction to handle correlations amongst SNPs. In this paper we investigate the application of the LASSO and SPLS methods for selecting SNPs that predict quantitative traits. We evaluate the performance of both methods with different criteria and under different scenarios using simulation studies. Results indicate that these methods can be effective in selecting SNPs that predict quantitative traits but are limited by some conditions. Both methods perform similarly overall, but each exhibits advantages over the other in given situations. Both methods are applied to Canadian Holstein cattle data to compare their performance.
Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs
Directory of Open Access Journals (Sweden)
Faqir Muhammad
2007-01-01
In this study, different sampling designs are compared using the HIES data of North West Frontier Province (NWFP) for 2001-02 and 1998-99, collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators has also been considered using bootstrap and jackknife. A two-stage stratified random sample design is adopted by HIES. In the first stage, enumeration blocks and villages are treated as the first-stage Primary Sampling Units (PSUs). The sample PSUs are selected with probability proportional to size. Secondary Sampling Units (SSUs), i.e., households, are selected by systematic sampling with a random start. They have used a single study variable. We have compared the HIES technique with some other designs, which are: Stratified Simple Random Sampling, Stratified Systematic Sampling, Stratified Ranked Set Sampling, and Stratified Two Phase Sampling. Ratio and regression methods were applied with two study variables, which are income (y) and household size (x). Jackknife and bootstrap are used for variance replication. Simple Random Sampling with sample sizes of 462 to 561 gave moderate variances by both jackknife and bootstrap. By applying Systematic Sampling, we obtained moderate variance with a sample size of 467. In jackknife with Systematic Sampling, the variance of the regression estimator was greater than that of the ratio estimator for sample sizes of 467 to 631. At a sample size of 952, the variance of the ratio estimator becomes greater than that of the regression estimator. The most efficient design turns out to be Ranked Set Sampling. Ranked Set Sampling with jackknife and bootstrap gives minimum variance even with the smallest sample size (467). Two Phase Sampling performed poorly. The multi-stage sampling applied by HIES gave large variances, especially when used with a single study variable.
Zhao, Na; Yue, Tianxiang; Zhou, Xun; Zhao, Mingwei; Liu, Yu; Du, Zhengping; Zhang, Lili
2017-07-01
Downscaling precipitation is required in local scale climate impact studies. In this paper, a statistical downscaling scheme was presented with a combination of geographically weighted regression (GWR) model and a recently developed method, high accuracy surface modeling method (HASM). This proposed method was compared with another downscaling method using the Coupled Model Intercomparison Project Phase 5 (CMIP5) database and ground-based data from 732 stations across China for the period 1976-2005. The residual which was produced by GWR was modified by comparing different interpolators including HASM, Kriging, inverse distance weighted method (IDW), and Spline. The spatial downscaling from 1° to 1-km grids for period 1976-2005 and future scenarios was achieved by using the proposed downscaling method. The prediction accuracy was assessed at two separate validation sites throughout China and Jiangxi Province on both annual and seasonal scales, with the root mean square error (RMSE), mean relative error (MRE), and mean absolute error (MAE). The results indicate that the developed model in this study outperforms the method that builds transfer function using the gauge values. There is a large improvement in the results when using a residual correction with meteorological station observations. In comparison with other three classical interpolators, HASM shows better performance in modifying the residual produced by local regression method. The success of the developed technique lies in the effective use of the datasets and the modification process of the residual by using HASM. The results from the future climate scenarios show that precipitation exhibits overall increasing trend from T1 (2011-2040) to T2 (2041-2070) and T2 to T3 (2071-2100) in RCP2.6, RCP4.5, and RCP8.5 emission scenarios. The most significant increase occurs in RCP8.5 from T2 to T3, while the lowest increase is found in RCP2.6 from T2 to T3, increased by 47.11 and 2.12 mm, respectively.
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
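The smaller-the-better criterion mentioned above can be illustrated with the standard Taguchi signal-to-noise ratio, SNR = -10 log10(mean(y^2)). The sketch below is only an illustration: the setting names and roughness values are hypothetical and are not taken from the study.

```python
import math


def snr_smaller_is_better(values):
    """Taguchi signal-to-noise ratio (in dB) for a smaller-the-better response."""
    mean_square = sum(v * v for v in values) / len(values)
    return -10.0 * math.log10(mean_square)


# Hypothetical surface-roughness replicates for two parameter settings.
runs = {
    "setting_1": [1.0, 1.0, 1.0],
    "setting_2": [0.1, 0.1, 0.1],
}
snr = {name: snr_smaller_is_better(vals) for name, vals in runs.items()}

# The preferred setting is the one with the highest SNR.
best = max(snr, key=snr.get)
```

A higher SNR corresponds to a smaller and more consistent response, which is why the setting with the largest value is selected.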
Multi-step polynomial regression method to model and forecast malaria incidence.
Directory of Open Access Journals (Sweden)
Chandrajit Chatterjee
Malaria is one of the most severe problems faced by the world even today. Understanding the causative factors such as age, sex, social factors, environmental variability etc., as well as the underlying transmission dynamics of the disease, is important for epidemiological research on malaria and its eradication. Thus, development of a suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors, is of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of Slide Positivity Rates (SPR) of malaria; a shorter, one-year time series (deaths due to Plasmodium vivax); and spatial data (zonal distribution of P. vivax deaths) for the city, along with the climatic factors, population and previous incidence of the disease. We performed variable selection by a simple correlation study, identified the initial relationship between variables through non-linear curve fitting, and used multi-step methods for the induction of variables in the non-linear regression analysis, together with Gauss-Markov models and ANOVA for testing the prediction and its validity and for constructing the confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of the forecasting, and show high prediction power for both SPR and P. vivax deaths, where the one-lag SPR values play an influential role and prove useful for better prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city.
Peripheral vascular trauma in children: related factors by the logistic regression method
Directory of Open Access Journals (Sweden)
Raquel Nogueira Avelar Silva
2014-03-01
The objective of the present study was to identify the factors related to “peripheral vascular trauma” in children aged six months to 12 years. This prospective cohort study included children with a peripheral vein punctured for the first time per side and excluded those with high/complete healing of trauma signs after removing the catheter. Daily clinical evaluations were performed at intervals shorter than 24 hours. Data were treated according to Pearson’s test and the logistic regression method. Among the 14 variables considered intervenient, four were statistically associated with the occurrence of trauma: dirtiness and humidity in the catheter insertion site, catheter caliber, and age. A causal relationship was found between the intervenient variables and the outcome, “peripheral vascular trauma”, thus contributing to the knowledge of peripheral venous puncture in children aged six months to 12 years. Descriptors: Child; Nursing Diagnosis; Veins; Injuries.
SECANT-FUZZY LINEAR REGRESSION METHOD FOR HARMONIC COMPONENTS ESTIMATION IN A POWER SYSTEM
Institute of Scientific and Technical Information of China (English)
Garba Inoussa; LUO An
2003-01-01
In order to avoid unnecessary damage to electrical equipment and installations, high-quality power should be delivered to the end user and strict control of frequency should be maintained. Therefore, it is important to estimate the power system's harmonic components with higher accuracy. This paper presents a new approach for estimating harmonic components in a power system using the secant-fuzzy linear regression method. In this approach the non-sinusoidal voltage or current waveform is written as a linear function. The coefficients of this function are assumed to be fuzzy numbers with a membership function that has a center and a spread value. The time-dependent quantity is written as a Taylor series with two different time-dependent quantities. The objective is to use the samples obtained from the transmission line to find the power system harmonic components and frequencies. We used an experimental voltage signal from a power substation as a numerical test.
Cox regression with missing covariate data using a modified partial likelihood method
DEFF Research Database (Denmark)
Martinussen, Torben; Holst, Klaus K.; Scheike, Thomas H.
2016-01-01
Missing covariate values is a common problem in survival analysis. In this paper we propose a novel method for the Cox regression model that is close to maximum likelihood but avoids the use of the EM-algorithm. It exploits that the observed hazard function is multiplicative in the baseline hazard function with the idea being to profile out this function before carrying out the estimation of the parameter of interest. In this step one uses a Breslow type estimator to estimate the cumulative baseline hazard function. We focus on the situation where the observed covariates are categorical which allows us to calculate estimators without having to assume anything about the distribution of the covariates. We show that the proposed estimator is consistent and asymptotically normal, and derive a consistent estimator of the variance-covariance matrix that does not involve any choice of a perturbation...
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Performance of robust regression methods in real-time polymerase chain reaction calibration.
Orenti, Annalisa; Marubini, Ettore
2014-12-09
The ordinary least squares (OLS) method is routinely used to estimate the unknown concentration of nucleic acids in a given solution by means of calibration. However, when outliers are present it could appear sensible to resort to robust regression methods. We analyzed data from an External Quality Control program concerning quantitative real-time PCR and we found that 24 laboratories out of 40 presented outliers, which occurred most frequently at the lowest concentrations. In this article we investigated and compared the performance of the OLS method, the least absolute deviation (LAD) method, and the biweight MM-estimator in real-time PCR calibration via a Monte Carlo simulation. Outliers were introduced by replacement contamination. When contamination was absent the coverages of OLS and MM-estimator intervals were acceptable and their widths small, whereas LAD intervals had acceptable coverages at the expense of higher widths. In the presence of contamination we observed a trade-off between width and coverage: the OLS performance got worse, the MM-estimator intervals widths remained short (but this was associated with a reduction in coverages), while LAD intervals widths were constantly larger with acceptable coverages at the nominal level.
Directory of Open Access Journals (Sweden)
Nina L. Timofeeva
2014-01-01
The article presents the methodological and technical bases for the creation of regression models that adequately reflect reality. The focus is on methods of removing residual autocorrelation in models. Algorithms for eliminating heteroscedasticity and autocorrelation of the regression model residuals are given: the reweighted least squares method and the Cochrane-Orcutt method. A model of "pure" regression is built, as well as a standardized form of the regression equation, which allows comparing the effects of different explanatory variables on the dependent variable when they are expressed in different units. A scheme of techniques for reducing heteroscedasticity and autocorrelation when building regression models specific to the social and cultural sphere is developed.
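As an illustration of the autocorrelation correction named in this abstract, the following is a minimal sketch of the textbook iterative Cochrane-Orcutt procedure for a single regressor with AR(1) errors. It is not the article's implementation; the simulated data, the true coefficients (2 and 3) and the AR coefficient 0.6 are assumptions made for the example.

```python
import random


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cochrane_orcutt(x, y, iterations=10):
    """Iteratively estimate rho from residuals and refit on quasi-differences."""
    intercept, slope = ols(x, y)
    rho = 0.0
    for _ in range(iterations):
        resid = [b - intercept - slope * a for a, b in zip(x, y)]
        # AR(1) coefficient: regress each residual on the previous one.
        rho = (sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
               / sum(r * r for r in resid[:-1]))
        # Quasi-difference both variables to remove the AR(1) structure.
        x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        intercept_star, slope = ols(x_star, y_star)
        # The transformed intercept equals intercept * (1 - rho).
        intercept = intercept_star / (1.0 - rho)
    return intercept, slope, rho


# Simulated example (assumed values): y = 2 + 3x with AR(1) errors.
rng = random.Random(1)
x = [float(t) for t in range(40)]
errors, prev = [], 0.0
for _ in x:
    prev = 0.6 * prev + rng.gauss(0.0, 0.1)
    errors.append(prev)
y = [2.0 + 3.0 * a + e for a, e in zip(x, errors)]

intercept, slope, rho = cochrane_orcutt(x, y)
```

The sketch uses a fixed number of iterations for simplicity; a practical version would stop once successive estimates of rho change by less than a tolerance.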
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
Widyaningsih, Purnami; Retno Sari Saputro, Dewi; Nugrahani Putri, Aulia
2017-06-01
The GWOLR model combines geographically weighted regression (GWR) and ordinal logistic regression (OLR) models. Its parameter estimation employs maximum likelihood estimation. Such parameter estimation, however, yields a system of nonlinear equations that is difficult to solve, and therefore a numerical approximation approach is required. The iterative approximation approach, in general, uses the Newton-Raphson (NR) method. The NR method has a disadvantage: its Hessian matrix consists of the second derivatives evaluated at each iteration, so it does not always produce converging results. With regard to this matter, the NR method is modified by replacing its Hessian matrix with the Fisher information matrix, which is termed Fisher scoring (FS). The present research seeks to determine GWOLR model parameter estimation using the Fisher scoring method and to apply the estimation to data on the level of vulnerability to Dengue Hemorrhagic Fever (DHF) in Semarang. The research concludes that health facilities give the greatest contribution to the probability of the number of DHF sufferers in both villages. Based on the number of the sufferers, the IR category of DHF in both villages can be determined.
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising 20 different binary mixtures of PYR and ISO was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were found to be between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. The results obtained in this manuscript strongly encourage us to use these methods for the quality control and routine analysis of marketed tablets containing PYR and ISO.
A Vehicle Traveling Time Prediction Method Based on Grey Theory and Linear Regression Analysis
Institute of Scientific and Technical Information of China (English)
TU Jun; LI Yan-ming; LIU Cheng-liang
2009-01-01
Vehicle traveling time prediction is an important part of the research on intelligent transportation systems. By now, there have been various kinds of methods for vehicle traveling time prediction, but few consider both time and space. In this paper, a vehicle traveling time prediction method based on grey theory (GT) and linear regression analysis (LRA) is presented. With respect to time, we use the historical data sequence of bus speed on a certain road to predict the future bus speed on that road by GT. With respect to space, we calculate the traffic affecting factors between various roads by LRA. Using these factors we can predict the vehicle's speed at the lower road if the vehicle's speed at the current road is known. Finally, we use the time factor and the space factor as the weighting factors of the two results predicted by GT and LRA respectively to find the final result, thus calculating the vehicle's traveling time. The method also considers such factors as dwell time, thus making the prediction more accurate.
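The abstract does not say which grey model is used; the sketch below assumes the common GM(1,1) model as one way to forecast a speed sequence from its history. The function name `gm11_forecast` and the speed values are hypothetical.

```python
import math


def gm11_forecast(series, steps=1):
    """Fit a GM(1,1) grey model and forecast `steps` values past the series."""
    n = len(series)
    # Accumulated generating operation: running sum of the raw series.
    acc, total = [], 0.0
    for v in series:
        total += v
        acc.append(total)
    # Background values: means of consecutive accumulated terms.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, n)]
    y = series[1:]
    # Least squares fit of y = -a * z + b (simple linear regression on z).
    m = len(z)
    z_mean, y_mean = sum(z) / m, sum(y) / m
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    fitted_slope = szy / szz
    a = -fitted_slope
    b = y_mean - fitted_slope * z_mean

    def acc_hat(k):
        # Time response of the accumulated series (requires a != 0).
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Forecasts of the raw series are differences of the accumulated forecasts.
    return [acc_hat(k) - acc_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical historical speeds growing by about 5% per interval.
speeds = [10.0 * 1.05 ** k for k in range(6)]
predicted = gm11_forecast(speeds, steps=2)
```

GM(1,1) assumes a roughly exponential trend and a development coefficient `a` different from zero, so it is best suited to short, smooth sequences.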
A contingency table approach to nonparametric testing
Rayner, JCW
2000-01-01
Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables. This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp
Delwiche, Stephen R; Reeves, James B
2010-01-01
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an over reliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (method has application to the evaluation of other preprocess functions and various types of spectroscopy data.
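To make the Savitzky-Golay preprocess functions concrete, the sketch below derives the convolution weights for a centrally symmetric window from a least squares polynomial fit, using exact rational arithmetic. The helper names and the toy spectrum are assumptions for illustration and do not reproduce the study's software.

```python
from fractions import Fraction
from math import factorial


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def savgol_weights(points, degree=2, deriv=0):
    """Weights giving the deriv-th derivative of the fitted polynomial at the window centre."""
    half = points // 2
    offsets = range(-half, half + 1)
    # Normal equations of the polynomial least squares fit on the window.
    normal = [[Fraction(sum(i ** (j + k) for i in offsets))
               for k in range(degree + 1)] for j in range(degree + 1)]
    unit = [Fraction(int(j == deriv)) for j in range(degree + 1)]
    c = solve(normal, unit)
    return [factorial(deriv) * sum(c[j] * i ** j for j in range(degree + 1))
            for i in offsets]


def apply_filter(signal, weights):
    """Apply the weights to interior points only (edges are skipped)."""
    half = len(weights) // 2
    return [sum(w * signal[t + k - half] for k, w in enumerate(weights))
            for t in range(half, len(signal) - half)]


smooth5 = savgol_weights(5, degree=2, deriv=0)
first5 = savgol_weights(5, degree=2, deriv=1)
second5 = savgol_weights(5, degree=2, deriv=2)

# Toy quadratic "spectrum": a quadratic-based smoother reproduces it exactly.
spectrum = [Fraction(t * t) for t in range(9)]
smoothed = apply_filter(spectrum, smooth5)
```

Widening the window from 5 toward 25 points strengthens the smoothing, which is the setting the abstract varies when examining how preprocessing affects calibration.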
Isa, Zakiah Mohd; Tawfiq, Omar Farouq; Noor, Norliza Mohd; Shamsudheen, Mohd Iqbal; Rijal, Omar Mohd
2010-03-01
In rehabilitating edentulous patients, selecting appropriately sized teeth in the absence of preextraction records is problematic. The purpose of this study was to investigate the relationships between some facial dimensions and widths of the maxillary anterior teeth to potentially provide a guide for tooth selection. Sixty full dentate Malaysian adults (18-36 years) representing 2 ethnic groups (Malay and Chinese), with well aligned maxillary anterior teeth and minimal attrition, participated in this study. Standardized digital images of the face, viewed frontally, were recorded. Using image analyzing software, the images were used to determine the interpupillary distance (IPD), inner canthal distance (ICD), and interalar width (IA). Widths of the 6 maxillary anterior teeth were measured directly from casts of the subjects using digital calipers. Regression analyses were conducted to measure the strength of the associations between the variables (alpha=.10). The means (standard deviations) of IPD, IA, and ICD of the subjects were 62.28 (2.47), 39.36 (3.12), and 34.36 (2.15) mm, respectively. The mesiodistal diameters of the maxillary central incisors, lateral incisors, and canines were 8.54 (0.50), 7.09 (0.48), and 7.94 (0.40) mm, respectively. The width of the central incisors was highly correlated to the IPD (r=0.99), while the widths of the lateral incisors and canines were highly correlated to a combination of IPD and IA (r=0.99 and 0.94, respectively). Using regression methods, the widths of the anterior teeth within the population tested may be predicted by a combination of the facial dimensions studied. (c) 2010 The Editorial Council of the Journal of Prosthetic Dentistry. Published by Mosby, Inc. All rights reserved.
Nonparametric dark energy reconstruction from supernova data.
Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David
2010-12-10
Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.
Nonparametric Maximum Entropy Estimation on Information Diagrams
Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn
2016-01-01
Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...
Nonparametric estimation of employee stock options
Institute of Scientific and Technical Information of China (English)
FU Qiang; LIU Li-an; LIU Qian
2006-01-01
We proposed a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods with market data. It incorporates the kernel estimator and employs a three-step method to modify the Black-Scholes formula. The model overcomes the limits of the Black-Scholes formula in handling option prices with varied volatility. It handles the effects on the risk-neutral pricing condition of ESO-specific characteristics such as non-tradability, the longer term to expiration, the early exercise feature, the restriction on short selling and the employee's risk aversion, and can be applied to ESO valuation whether the explanatory variable is deterministic or random.
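For reference, the unmodified Black-Scholes formula that the model starts from can be sketched as follows for a European call without dividends. This shows only the standard baseline, not the paper's three-step modification; the parameter values are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call with constant volatility sigma."""
    d1 = ((math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity)
          / (sigma * math.sqrt(maturity)))
    d2 = d1 - sigma * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical at-the-money option with one year to expiration.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, sigma=0.2, maturity=1.0)

# The formula takes a single constant volatility; the price rises with it.
price_high_vol = black_scholes_call(100.0, 100.0, 0.05, 0.4, 1.0)
```

The single constant `sigma` argument is the limitation the abstract refers to when it mentions option prices with varied volatility.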
A Novel Method for Flatness Pattern Recognition via Least Squares Support Vector Regression
Institute of Scientific and Technical Information of China (English)
2012-01-01
To adapt to the new requirements of the developing flatness control theory and technology, cubic patterns were introduced on the basis of the traditional linear, quadratic and quartic flatness basic patterns. Linear, quadratic, cubic and quartic Legendre orthogonal polynomials were adopted to express the flatness basic patterns. In order to overcome the defects of the existing recognition methods based on fuzzy, neural network and support vector regression (SVR) theory, a novel flatness pattern recognition method based on least squares support vector regression (LS-SVR) was proposed. On this basis, to determine the hyper-parameters of LS-SVR effectively and to enhance the recognition accuracy and generalization performance of the model, a particle swarm optimization algorithm with the leave-one-out (LOO) error as fitness function was adopted. To overcome the high computational complexity of the naive cross-validation algorithm, a novel fast cross-validation algorithm was introduced to calculate the LOO error of LS-SVR. Results of experiments on theoretically calculated flatness data and on flatness signals measured on a 900HC cold-rolling mill demonstrate that the proposed approach can distinguish the types and define the magnitudes of the flatness defects effectively with high accuracy, high speed and strong generalization ability.
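The linear to quartic Legendre polynomials used as basic patterns can be evaluated with Bonnet's recurrence. The sketch below uses the standard polynomials on a normalized strip width in [-1, 1]; the exact scaling and sign conventions of the paper's patterns are not given in the abstract, so this is an assumption.

```python
def legendre(order, x):
    """Evaluate the Legendre polynomial P_order(x) by Bonnet's recurrence."""
    prev, curr = 1.0, x
    if order == 0:
        return prev
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        prev, curr = curr, ((2 * n + 1) * x * curr - n * prev) / (n + 1)
    return curr


# Normalized positions across the strip width (hypothetical sampling grid).
positions = [-1.0, -0.5, 0.0, 0.5, 1.0]

# One row per basic pattern: linear, quadratic, cubic and quartic.
basic_patterns = {order: [legendre(order, x) for x in positions]
                  for order in (1, 2, 3, 4)}
```

A measured flatness profile could then be described by its coefficients on these four patterns, which is the quantity a recognition model such as LS-SVR would be trained to output.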
Abdi, Hervé; Williams, Lynne J
2013-01-01
Partial least square (PLS) methods (also sometimes called projection to latent structures) relate the information present in two data tables that collect measurements on the same set of observations. PLS methods proceed by deriving latent variables which are (optimal) linear combinations of the variables of a data table. When the goal is to find the shared information between two tables, the approach is equivalent to a correlation problem and the technique is then called partial least square correlation (PLSC) (also sometimes called PLS-SVD). In this case there are two sets of latent variables (one set per table), and these latent variables are required to have maximal covariance. When the goal is to predict one data table from the other one, the technique is then called partial least square regression (PLSR). In this case there is one set of latent variables (derived from the predictor table) and these latent variables are required to give the best possible prediction. In this paper we present and illustrate PLSC and PLSR and show how these descriptive multivariate analysis techniques can be extended to deal with inferential questions by using cross-validation techniques such as the bootstrap and permutation tests.
Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel
2015-12-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct by multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected per each gene probe within 1Mb window upstream and downstream the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease
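The permutation-based MaxT idea in step (3) can be sketched in its single-step form as below. As a stand-in for the fitted LASSO/ENET model statistic, which the abstract does not specify, this sketch uses the absolute Pearson correlation between each feature and the outcome; the data and function names are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def maxt_adjusted_pvalues(features, outcome, n_perm=500, seed=0):
    """Single-step maxT: compare each observed statistic with the permutation
    distribution of the maximum statistic over all features."""
    rng = random.Random(seed)
    observed = [abs_corr(f, outcome) for f in features]
    exceed = [0] * len(features)
    shuffled = list(outcome)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any feature/outcome association
        max_stat = max(abs_corr(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add one to numerator and denominator so no p-value is exactly zero.
    return [(count + 1) / (n_perm + 1) for count in exceed]


# Hypothetical expression outcome and two candidate features.
outcome = [2.1, 3.4, 1.9, 5.6, 4.4, 6.3, 2.8, 7.7, 5.1, 8.2, 3.9, 9.5]
strong_feature = [2.0 * v + 1.0 for v in outcome]
weak_feature = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
adjusted = maxt_adjusted_pvalues([strong_feature, weak_feature], outcome)
```

Because every feature is compared against the same distribution of maxima, the adjusted p-values control the family-wise error rate without a separate multiplicity correction.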
A non-parametric model for the cosmic velocity field
Branchini, E; Teodoro, L; Frenk, CS; Schmoldt, [No Value; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S
1999-01-01
We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2013-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…
Combined parametric-nonparametric identification of block-oriented systems
Mzyk, Grzegorz
2014-01-01
This book considers a problem of block-oriented nonlinear dynamic system identification in the presence of random disturbances. This class of systems includes various interconnections of linear dynamic blocks and static nonlinear elements, e.g., Hammerstein system, Wiener system, Wiener-Hammerstein ("sandwich") system and additive NARMAX systems with feedback. Interconnecting signals are not accessible for measurement. The combined parametric-nonparametric algorithms, proposed in the book, can be selected dependently on the prior knowledge of the system and signals. Most of them are based on the decomposition of the complex system identification task into simpler local sub-problems by using non-parametric (kernel or orthogonal) regression estimation. In the parametric stage, the generalized least squares or the instrumental variables technique is commonly applied to cope with correlated excitations. Limit properties of the algorithms have been shown analytically and illustrated in simple experiments.
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
Comparisons of Short Term Load Forecasting using Artificial Neural Network and Regression Method
Directory of Open Access Journals (Sweden)
Rajesh Deshmukh
2011-12-01
In power systems, the next day's power generation must be scheduled every day, so day-ahead short-term load forecasting (STLF) is a necessary daily task for power dispatch. Its accuracy greatly affects the economic operation and reliability of the system. Underprediction of STLF leads to insufficient reserve capacity and, in turn, increases the operating cost through the use of expensive peaking units. Overprediction, on the other hand, leads to unnecessarily large reserve capacity, which also raises the operating cost. Research in this area remains a challenge for electrical engineering scholars because of its high complexity. Estimating the future load from historical data remains difficult, especially for holidays, days with extreme weather and other anomalous days. With the recent development of new mathematical, data mining and artificial intelligence tools, it is potentially possible to improve the forecasting result. This paper presents a new neural network based approach for short-term load forecasting that uses the most correlated weather data for training, validating and testing the neural network. Correlation analysis of weather data determines the input parameters of the neural networks. Its results are compared with those of a regression method.
Reddy, K. S.; Somasundharam, S.
2016-09-01
In this work, an inverse heat conduction problem (IHCP) involving the simultaneous estimation of the principal thermal conductivities (kxx, kyy, kzz) and specific heat capacity of orthotropic materials is solved by using a surrogate forward model. Uniformly distributed random samples for each unknown parameter are generated from the prior knowledge about these parameters, and the Finite Volume Method (FVM) is employed to solve the forward problem for temperature distribution with space and time. A supervised machine learning technique, Gaussian Process Regression (GPR), is used to construct the surrogate forward model with the available temperature solution and randomly generated unknown parameter data. The statistical and machine learning toolbox available in MATLAB R2015b is used for this purpose. The robustness of the surrogate model constructed using GPR is examined by carrying out the parameter estimation for 100 new randomly generated test samples at a measurement error of ±0.3 K. The temperature measurement is obtained by adding random noise with the mean at zero and known standard deviation (σ = 0.1) to the FVM solution of the forward problem. The test results show that the Mean Percentage Deviation (MPD) of all test samples for all parameters is < 10%.
Multiple predictor smoothing methods for sensitivity analysis.
Energy Technology Data Exchange (ETDEWEB)
Helton, Jon Craig; Storlie, Curtis B.
2006-08-01
The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
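As an illustration of the first technique listed in this abstract, the following is a minimal single-predictor sketch of locally weighted regression (LOESS). It assumes a tricube kernel and a local linear fit, which are common defaults rather than details taken from the report; the function names and the `frac` span parameter are hypothetical, and the stepwise multi-predictor procedure the report describes is not reproduced.

```python
def loess_point(x0, xs, ys, frac=0.5):
    """Local linear fit at x0 using tricube weights over the nearest frac*n points."""
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # bandwidth: k-th nearest distance
    w = []
    for x in xs:
        d = abs(x - x0)
        w.append((1 - (d / h) ** 3) ** 3 if d < h else 0.0)
    sw = sum(w)
    if sw == 0.0:
        # All neighbours sit exactly on the bandwidth edge: fall back to their mean.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
        return sum(near) / len(near)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if sxx == 0.0:
        return my  # only one distinct x carries weight: local constant fit
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def loess_fit(xs, ys, frac=0.5):
    """Smoothed values at every observed x."""
    return [loess_point(x, xs, ys, frac) for x in xs]
```

A smaller `frac` follows local structure more closely at the cost of higher variance; a larger value gives a smoother but potentially more biased curve.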
Nonparametric Bayesian Modeling of Complex Networks
DEFF Research Database (Denmark)
Schmidt, Mikkel Nørgaard; Mørup, Morten
2013-01-01
Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....
An asymptotically optimal nonparametric adaptive controller
Institute of Scientific and Technical Information of China (English)
郭雷; 谢亮亮
2000-01-01
For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.
The Monitored Atherosclerosis Regression Study (MARS). Design, methods and baseline results.
Cashin-Hemphill, L; Kramsch, D M; Azen, S P; DeMets, D; DeBoer, L W; Hwang, I; Vailas, L; Hirsch, L J; Mack, W J; DeBoer, L
1992-10-23
The Monitored Atherosclerosis Regression Study (MARS) was designed to evaluate the effect of cholesterol lowering by monotherapy with an HMG-CoA reductase inhibitor on progression/regression of atherosclerosis in subjects with angiographically documented coronary artery disease. The purpose of this paper is to present the design, methods, and baseline results of MARS. MARS is a prospective, randomized, double-blind, placebo-controlled trial with baseline, 2-year, and 4-year coronary angiography as well as carotid, brachial, and popliteal ultrasonography. Outpatient clinics at the University of Southern California School of Medicine and the University of Wisconsin School of Medicine. Two hundred seventy participants of both sexes were recruited directly from the cardiac catheterization laboratory or by chart review of patients having undergone cardiac catheterization in the past. Subjects were considered eligible if they had angiographically demonstrable atherosclerosis in 2 or more coronary artery segments, unaltered by angioplasty, with at least 1 lesion > or = 50% but or = 500 mg/dL; premenopausal females; uncontrolled hypertension; diabetes mellitus; untreated thyroid disease; liver dysfunction; renal insufficiency; congestive heart failure; major arrhythmia; left ventricular conduction defects; or any life-threatening disease. Subjects were placed on a low-fat, low-cholesterol diet and either 40 mg b.i.d. lovastatin (Mevacor) or placebo. Randomization was stratified by sex, smoking status, and TC. Per-subject average change in %S as determined by quantitative coronary angiography (QCA) is the primary angiographic endpoint. Secondary endpoints are: categorical analyses of the proportion of subjects with progression; human panel reading of coronary angiograms; and change in minimum lumen diameter (MLD) in mm by QCA. Carotid, brachial, and popliteal ultrasonography is also being performed. The subjects randomized into MARS are 91.5% male with an age range of 37 to
Nonparametric estimation of location and scale parameters
Potgieter, C.J.
2012-12-01
Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
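The location-scale relation stated in this abstract can be made concrete with a simple moment-matching sketch. This is not the asymptotic-likelihood approach of the paper: it assumes finite variances and σ > 0, in which case E[Y] = μ + σE[X] and sd(Y) = σ·sd(X), and it simply plugs in sample moments. The function name is hypothetical.

```python
import statistics


def location_scale_moments(x, y):
    """Moment-matching estimates (mu, sigma) such that Y and mu + sigma*X agree
    in mean and standard deviation. Assumes sigma > 0 and finite variances."""
    sx = statistics.stdev(x)
    if sx == 0:
        raise ValueError("x must not be constant")
    sigma = statistics.stdev(y) / sx
    mu = statistics.mean(y) - sigma * statistics.mean(x)
    return mu, sigma
```

For example, if `y` is an exact affine image of `x` with slope 2 and intercept 3, the function returns values close to `(3.0, 2.0)`. Estimators of this kind can be sensitive to heavy tails, which is one motivation for more careful methods.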
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.
REGRESSION DEPENDENCE CONSTRUCTION METHODOLOGY FOR TRACTION CURVES USING LEAST SQUARE METHOD
Directory of Open Access Journals (Sweden)
V. Ravino
2013-01-01
The paper presents a methodology for constructing regression dependences for the traction curves of various tractors on different operational backgrounds. The dependences are constructed with the help of Microsoft Excel.
Semiparametric regression during 2003–2007
Ruppert, David
2009-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
Bayesian nonparametric estimation and consistency of mixed multinomial logit choice models
De Blasi, Pierpaolo; Lau, John W; 10.3150/09-BEJ233
2011-01-01
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide an important theoretical support for the use of the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different te...
A Frisch-Newton Algorithm for Sparse Quantile Regression
Institute of Scientific and Technical Information of China (English)
Roger Koenker; Pin Ng
2005-01-01
Recent experience has shown that interior-point methods using a log barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data where the parametric dimension of the model can be quite large, but the number of non-zero elements is quite small. Adopting recent developments in sparse linear algebra we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker [28]. The new algorithm substantially reduces the storage (memory) requirements and increases computational speed. The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
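For orientation, the optimization problem that simplex and interior-point solvers both address can be written out. The notation below is the standard textbook formulation of linear quantile regression and is an assumption here, not taken from the abstract: $y_i$ are responses, $x_i$ rows of the design matrix $X$, $\tau \in (0,1)$ the quantile level, and $u$, $v$ the positive and negative parts of the residuals.

```latex
\hat\beta(\tau) = \arg\min_{b \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^\top b\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right).

% Equivalent linear program:
\min_{b,\,u,\,v} \; \tau\, \mathbf{1}^\top u + (1-\tau)\, \mathbf{1}^\top v
\quad \text{subject to} \quad X b + u - v = y, \quad u \ge 0, \; v \ge 0 .
```

In the linear-program form the constraint matrix contains $X$ directly, so a sparse design matrix yields a sparse linear program, which is the structure the abstract describes exploiting.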
Monotone Regression and Correction for Order Relation Deviations in Indicator Kriging
Institute of Scientific and Technical Information of China (English)
Han Yan; Yang Yiheng
2008-01-01
Indicator kriging (IK) is one of the most efficient nonparametric methods in geostatistics. The order relation problem in the conditional cumulative distribution values obtained by IK is its most severe drawback. The correction of order relation deviations is an essential part of the IK approach. A monotone regression was proposed as a new correction method which could minimize the deviation from the original quantile values while ensuring all order relations.
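A monotone (isotonic) regression of the kind named in this abstract can be sketched with the standard pool-adjacent-violators algorithm, which finds the non-decreasing sequence closest to the input in the least-squares sense. Whether the authors use exactly this algorithm is an assumption, as are the function names and the choice to clip values to [0, 1] before fitting.

```python
def monotone_regression(values, weights=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def correct_order_relations(cdf_values):
    """Clip estimated CDF values to [0, 1], then enforce a non-decreasing order.
    cdf_values must be listed by increasing threshold."""
    clipped = [min(1.0, max(0.0, v)) for v in cdf_values]
    return monotone_regression(clipped)
```

For instance, the estimates `[0.1, 0.3, 0.2, 0.6]` violate the order relation at the second and third thresholds; the fit pools those two values to their mean and leaves the others unchanged.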
A primer on regression methods for decoding cis-regulatory logic
Energy Technology Data Exchange (ETDEWEB)
Das, Debopriya; Pellegrini, Matteo; Gray, Joe W.
2009-03-03
The rapidly emerging field of systems biology is helping us to understand the molecular determinants of phenotype on a genomic scale [1]. Cis-regulatory elements are major sequence-based determinants of biological processes in cells and tissues [2]. For instance, during transcriptional regulation, transcription factors (TFs) bind to very specific regions on the promoter DNA [2,3] and recruit the basal transcriptional machinery, which ultimately initiates mRNA transcription (Figure 1A).
Learning cis-Regulatory Elements from Omics Data
A vast amount of work over the past decade has shown that omics data can be used to learn cis-regulatory logic on a genome-wide scale [4-6]--in particular, by integrating sequence data with mRNA expression profiles. The most popular approach has been to identify over-represented motifs in promoters of genes that are coexpressed [4,7,8]. Though widely used, such an approach can be limiting for a variety of reasons. First, the combinatorial nature of gene regulation is difficult to explicitly model in this framework. Moreover, in many applications of this approach, expression data from multiple conditions are necessary to obtain reliable predictions. This can potentially limit the use of this method to only large data sets [9]. Although these methods can be adapted to analyze mRNA expression data from a pair of biological conditions, such comparisons are often confounded by the fact that primary and secondary response genes are clustered together--whereas only the primary response genes are expected to contain the functional motifs [10]. A set of approaches based on regression has been developed to overcome the above limitations [11-32]. These approaches have their foundations in certain biophysical aspects of gene regulation [26,33-35]. That is, the models are motivated by the expected transcriptional response of genes due to the binding of TFs to their promoters. While such methods have gathered popularity in the computational domain
Stefanello, C; Vieira, S L; Xue, P; Ajuwon, K M; Adeola, O
2016-07-01
A study was conducted to determine the ileal digestible energy (IDE), ME, and MEn contents of bakery meal using the regression method and to evaluate whether the energy values are age-dependent in broiler chickens from zero to 21 d post hatching. Seven hundred and eighty male Ross 708 chicks were fed 3 experimental diets in which bakery meal was incorporated into a corn-soybean meal-based reference diet at zero, 100, or 200 g/kg by replacing the energy-yielding ingredients. A 3 × 3 factorial arrangement of 3 ages (1, 2, or 3 wk) and 3 dietary bakery meal levels were used. Birds were fed the same experimental diets in these 3 evaluated ages. Birds were grouped by weight into 10 replicates per treatment in a randomized complete block design. Apparent ileal digestibility and total tract retention of DM, N, and energy were calculated. Expression of mucin (MUC2), sodium-dependent phosphate transporter (NaPi-IIb), solute carrier family 7 (cationic amino acid transporter, Y(+) system, SLC7A2), glucose (GLUT2), and sodium-glucose linked transporter (SGLT1) genes were measured at each age in the jejunum by real-time PCR. Addition of bakery meal to the reference diet resulted in a linear decrease in retention of DM, N, and energy, and a quadratic reduction (P energy as birds' ages increased from 1 to 3 wk. Dietary bakery meal did not affect jejunal gene expression. Expression of genes encoding MUC2, NaPi-IIb, and SLC7A2 linearly increased (P energy and nitrogen in the basal diet decreased when bakery meal was included and increased with age of broiler chickens.
ACCURACY OF MILK YIELD ESTIMATION IN DAIRY CATTLE FROM MONTHLY RECORD BY REGRESSION METHOD
Directory of Open Access Journals (Sweden)
I.S. Kuswahyuni
2014-10-01
This experiment was conducted to estimate the actual milk yield and to compare the estimation accuracy of cumulative monthly records to actual milk yield by the regression method. Materials used in this experiment were records relating to milk yield and pedigree. The obtained data were categorized into 2 groups, i.e. Age Group I (AG I), cows calving at < 36 months old (33 cows with 33 lactation records), and AG II, cows calving at ≥ 36 months old (44 cows with 105 lactation records). The first three to seven months of data were used to estimate actual milk yield. Results showed that the mean milk yield/head/lactation in AG I (2479.5 ± 461.5 kg) was lower than that in AG II (2989.7 ± 526.8 kg). Estimated milk yields for three to seven months in AG I were 2455.6 ± 419.7; 2455.7 ± 432.9; 2455.5 ± 446.4; 2455.6 ± 450.8; 2455.64 ± 450.8; and 2455.5 ± 459.3 kg, respectively, while those in AG II were 2972.3 ± 479.8; 2972.0 ± 497.2; 2972.4 ± 509.6; 2972.5 ± 523.6 and 2972.5 ± 535.1 kg, respectively. Correlation coefficients between estimated and actual milk yield in AG I were 0.79; 0.82; 0.86; 0.86 and 0.88, respectively, while those in AG II were 0.65; 0.66; 0.67; 0.69 and 0.72, respectively. In conclusion, the mean estimated milk yield in AG I was lower than in AG II. The best record for estimating actual milk yield in both AG I and AG II was the seven cumulative months.
Directory of Open Access Journals (Sweden)
Alcides Cabrera Campos
2012-09-01
Analyzing data from agricultural pest populations regularly detects that they do not fulfill the theoretical requirements to implement classical ANOVA. Box-Cox transformations and nonparametric statistical methods are commonly used as alternatives to solve this problem. In this paper, we describe the results of applying these techniques to data from Thrips palmi Karny sampled in potato (Solanum tuberosum L.) plantations. The X² test was used for the goodness-of-fit of negative binomial distribution and as a test of independence to investigate the relationship between plant strata and insect stages. Seven data transformations were also applied to meet the requirements of classical ANOVA, which failed to eliminate the relationship between mean and variance. Given this negative result, comparisons between insect population densities were made using the nonparametric Kruskal-Wallis ANOVA test. Results from this analysis allowed selecting the insect larval stage and plant middle stratum as keys to design pest sampling plans.
Directory of Open Access Journals (Sweden)
Akhtar R. Siddique
2000-03-01
This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices.
Crainiceanu, Ciprian M; Caffo, Brian S; Di, Chong-Zhi; Punjabi, Naresh M
2009-06-01
We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS.
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
Yamakoshi, Yasuhiro; Ogawa, Mitsuhiro; Yamakoshi, Takehiro; Tamura, Toshiyo; Yamakoshi, Ken-ichi
2009-01-01
A novel optical non-invasive in vivo blood glucose concentration (BGL) measurement technique, named "Pulse Glucometry", was combined with a kernel method, support vector machines. The total transmitted radiation intensity (I(lambda)) and the cardiac-related pulsatile changes superimposed on I(lambda) in human adult fingertips were measured over the wavelength range from 900 to 1700 nm using a very fast spectrophotometer, obtaining a differential optical density (DeltaOD(lambda)) related to the blood component in the finger tissues. Subsequently, a calibration model using paired data of a family of DeltaOD(lambda)s and the corresponding known BGLs was constructed with support vector machines (SVMs) regression instead of using calibration by a conventional principal component regression (PCR) and partial least squares regression (PLS). Second, the SVM method was applied to make a nonlinear discriminant calibration model for "Pulse Glucometry". Our results show that the regression calibration model based on the support vector machines can provide a good regression for the 101 paired data, in which the BGLs ranged from 89.0-219 mg/dl (4.94-12.2 mmol/l). The resultant regression was evaluated by the Clarke error grid analysis and all data points fell within the clinically acceptable regions (region A: 93%, region B: 7%). The discriminant calibration model using SVMs also provided a good result for classification (accuracy rate 84% in the best case).
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
None
2001-01-01
For checking heteroscedasticity in regression models, a unified approach is proposed for constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not sensitive to the choice of the smoothing parameters involved in estimating the nonparametric regression function. The limiting null distribution of the test statistic remains the same over a wide range of smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent, while the classical bootstrap is not consistent in the general case but is applicable if an extra assumption on the conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and how they compare with tests that have appeared in the literature. The approach may readily be extended to handle partially linear and linear autoregressive models.
RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS
Directory of Open Access Journals (Sweden)
J. Behmanesh
2015-01-01
Modeling rainfall-runoff relationships in a watershed has an important role in water resources engineering. Researchers have used numerical models for modeling the rainfall-runoff process in the watershed because of the non-linear nature of the rainfall-runoff relationship, the vast data requirements and the difficulty of physical models. The main objective of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models, ANN and ANFIS, were used to model the rainfall-runoff process and the best model was chosen. Also, using SPSS software, regression equations were developed and the best equation was then selected from the regression analysis. The results obtained from the numerical and regression modeling were compared with each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also indicated that the Turkey River flow rate had a logical relationship with the flow rate one and two days earlier and with the rainfall values one, two and three days earlier.
RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS
Directory of Open Access Journals (Sweden)
J. Behmanesh
2015-03-01
Modeling rainfall-runoff relationships in a watershed has an important role in water resources engineering. Researchers have used numerical models for modeling the rainfall-runoff process in the watershed because of the non-linear nature of the rainfall-runoff relationship, the vast data requirements and the difficulty of physical models. The main objective of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models, ANN and ANFIS, were used to model the rainfall-runoff process and the best model was chosen. Also, using SPSS software, regression equations were developed and the best equation was then selected from the regression analysis. The results obtained from the numerical and regression modeling were compared with each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also indicated that the Turkey River flow rate had a logical relationship with the flow rate one and two days earlier and with the rainfall values one, two and three days earlier.
Regression methods for spatially correlated data: an example using beetle attacks in a seed orchard
Preisler Haiganoush; Nancy G. Rappaport; David L. Wood
1997-01-01
We present a statistical procedure for studying the simultaneous effects of observed covariates and unmeasured spatial variables on responses of interest. The procedure uses regression type analyses that can be used with existing statistical software packages. An example using the rate of twig beetle attacks on Douglas-fir trees in a seed orchard illustrates the...
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2012-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
Non-parametric transformation for data correlation and integration: From theory to practice
Energy Technology Data Exchange (ETDEWEB)
Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A&M Univ., College Station, TX (United States)
1997-08-01
The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require an a priori functional relationship between response and predictor variables, as is the case with traditional multiple regression. The transformations are very general and computationally efficient, and can easily handle mixed data types, for example continuous variables such as porosity and permeability and categorical variables such as rock type and lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data, (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data, (3) it can be easily applied to cases where several types of secondary or soft data are involved, and (4) it significantly reduces variance function calculations and thus greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.
Panel data nonparametric estimation of production risk and risk preferences
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We apply nonparametric panel data kernel regression to investigate production risk, output price uncertainty, and risk attitudes of Polish dairy farms based on a firm-level unbalanced panel data set that covers the period 2004–2010. We compare different model specifications and different...... approaches for obtaining firm-specific measures of risk attitudes. We found that Polish dairy farmers are risk averse regarding production risk and price uncertainty. According to our results, Polish dairy farmers perceive the production risk as being more significant than the risk related to output price...
ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH
Directory of Open Access Journals (Sweden)
I. C. A. OYEKA
2012-02-01
This paper presents a non-parametric statistical method of analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided, and the method is shown to compare favourably with its competitor, the Mann-Whitney test, and to be more powerful than the latter when there are ties.
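For orientation, the competitor named in this abstract can be sketched as follows. This is a minimal illustration of the standard Mann-Whitney U statistic with midranks and the usual tie-corrected normal approximation, not the alternative statistic the paper proposes; the function names are hypothetical.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks in which tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, tie-corrected z-score) for two samples."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over every group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2.0) / sqrt(var) if var > 0 else 0.0
    return u1, z
```

Calling `mann_whitney_u(sample_a, sample_b)` on two lists of numbers returns the U statistic for the first sample together with a z-score whose variance shrinks as ties accumulate, which is the effect the abstract's method is designed to avoid.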
Nonparametric estimation for hazard rate monotonously decreasing system
Institute of Scientific and Technical Information of China (English)
Han Fengyan; Li Weisong
2005-01-01
Estimation of density and hazard rate is very important to the reliability analysis of a system. In order to estimate the density and hazard rate of a hazard rate monotonously decreasing system, a new nonparametric estimator is put forward. The estimator is based on the kernel function method and optimum algorithm. Numerical experiment shows that the method is accurate enough and can be used in many cases.
Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System
Directory of Open Access Journals (Sweden)
Walaa Khalaf
2009-03-01
We describe an Electronic Nose (ENose) system which is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them being gas sensors (supplied with different heater voltage values), the remainder being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then to estimate its concentration, we use both some machine learning techniques and the least square regression principle. In fact, we apply two different training models; the first one is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second one uses the least squares regression approach to predict the concentration of each type of analyte.
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280 °C using 20-80 min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60 min. The average optimized temperature was 248.55 °C in the studied biomass when torrefaction was performed for 60 min.
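The intersection step described in this abstract can be illustrated with a minimal sketch. It assumes first-order (straight-line) fits only, whereas the study used polynomials up to the 3rd order, and the helper names `fit_line` and `intersection` are hypothetical; any data passed in would be the reader's own, since the abstract reports no raw measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def intersection(line1, line2):
    """Return (x, y) where two fitted lines cross; slopes must differ."""
    (a1, b1), (a2, b2) = line1, line2
    x = (a2 - a1) / (b1 - b2)
    return x, a1 + b1 * x
```

In the setting of the abstract, one line would be fitted to calorific value against SF and the other to weight loss against SF (on a comparable scale), and the x-coordinate of the crossing would play the role of the optimized SF.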
Variable selection methods in PLS regression - a comparison study on metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach
Partial least squares regression (PLSR) has been applied to various fields such as psychometrics, consumer science, econometrics and process control. Recently it has been applied to metabolomics based data sets (GC/LC-MS, NMR) and proven to be very powerful in situations with many variables...... for the purpose of reducing over-fitting problems and providing useful interpretation tools. It has excellent possibilities for giving a graphical overview of sample and variation patterns. It can handle co-linearity in an efficient way and make it possible to use different highly correlated data sets in one...... Integrating Omics data. Statistical Applications in Genetics and Molecular Biology, 7:Article 35, 2008. 2. Martens H and Martens M. Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR). Food Quality and Preference, 11:5-16, 2000....
Least square regression method for estimating gas concentration in an electronic nose system.
Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio
2009-01-01
We describe an Electronic Nose (ENose) system which is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them being gas sensors (supplied with different heater voltage values), the remainder being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then to estimate its concentration, we use both some machine learning techniques and the least square regression principle. In fact, we apply two different training models; the first one is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second one uses the least squares regression approach to predict the concentration of each type of analyte.
Bayesian nonparametric duration model with censorship
Directory of Open Access Journals (Sweden)
Joseph Hakizamungu
2007-10-01
This paper is concerned with nonparametric i.i.d. duration models with censored observations, and we establish by a simple and unified approach the general structure of a Bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.
Bootstrap Estimation for Nonparametric Efficiency Estimates
1995-01-01
This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...
Directory of Open Access Journals (Sweden)
Santana Isabel
2011-08-01
Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p ... Conclusions: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.
OGAARD, B; TENBOSCH, JJ
1994-01-01
This article describes a new nondestructive optical method for evaluation of lesion regression in vivo. White spot caries lesions were induced with orthodontic bands in two vital premolars of seven patients. The teeth were banded for 4 weeks with special orthodontic bands that allowed plaque accumul
Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A
2015-05-01
Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories.
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
High-dimensional regression with unknown variance
Giraud, Christophe; Verzelen, Nicolas
2011-01-01
We review recent results for high-dimensional sparse linear regression in the practical case of unknown variance. Different sparsity settings are covered, including coordinate-sparsity, group-sparsity and variation-sparsity. The emphasis is put on non-asymptotic analyses and feasible procedures. In addition, a small numerical study compares the practical performance of three schemes for tuning the Lasso estimator, and some references are collected for some more general models, including multivariate regression and nonparametric regression.
Nonparametric estimation of Fisher information from real data
Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.
2016-02-01
The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δθ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δθ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
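The finite-difference idea in this abstract can be sketched for a single parameter. This is a minimal illustration under stated assumptions: the exact normal density stands in for the nonparametric density estimate the authors use, the derivative of the log-density is replaced by a central difference with spacing `delta` (the Δθ of the abstract), and the names `normal_pdf` and `fisher_information` are hypothetical. The normal family is also what the abstract uses for validation, where the information about the mean is 1/σ².

```python
from math import exp, log, pi, sqrt


def normal_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2); stands in for an estimated density."""
    z = (x - mean) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def fisher_information(density, theta, delta, lo, hi, steps=4000):
    """Approximate I(theta) = E[(d/dtheta ln p(x; theta))^2].

    density(x, theta) returns the density at x for parameter theta.
    The derivative is a central difference with spacing delta, and the
    expectation is a trapezoidal integral over [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        p = density(x, theta)
        p_plus = density(x, theta + delta)
        p_minus = density(x, theta - delta)
        if p <= 0.0 or p_plus <= 0.0 or p_minus <= 0.0:
            continue  # skip points where the log-density is undefined
        score = (log(p_plus) - log(p_minus)) / (2.0 * delta)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * p * score * score * h
    return total
```

With an estimated density in place of `normal_pdf`, the choice of `delta` trades finite-difference bias against estimation noise, which is the trade-off the abstract addresses with large deviations theory.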
Patching rainfall data using regression methods. 3. Grouping, patching and outlier detection
Pegram, Geoffrey
1997-11-01
Rainfall data are used, amongst other things, for augmenting or repairing streamflow records in a water resources analysis environment. Gaps in rainfall records cause problems in the construction of water-balance models using monthly time-steps, when it becomes necessary to estimate missing values. Modest extensions are sometimes also desirable. It is also important to identify outliers as possible erroneous data and to group data which are hydrologically similar in order to accomplish good patching. Algorithms are described which accomplish these tasks using the covariance biplot, multiple linear regression, singular value decomposition and the pseudo-Expectation-Maximization algorithm.
Non-parametric estimation of Fisher information from real data
Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A
2015-01-01
The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...
Estimation of Stochastic Volatility Models by Nonparametric Filtering
DEFF Research Database (Denmark)
Kanaya, Shin; Kristensen, Dennis
2016-01-01
A two-step estimation method of stochastic volatility models is proposed: In the first step, we nonparametrically estimate the (unobserved) instantaneous volatility process. In the second step, standard estimation methods for fully observed diffusion processes are employed, but with the filtered/estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties...
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Determination of benzo(a)pyrene content in PM10 using regression methods
Directory of Open Access Journals (Sweden)
Jacek Gębicki
2015-12-01
The paper presents an attempt to apply multidimensional linear regression to estimate an empirical model describing the factors influencing B(a)P content in suspended dust PM10 in the Olsztyn and Elbląg city regions between 2010 and 2013. During this period the annual average concentration of B(a)P in PM10 exceeded the admissible level 1.5-3 times. The investigations confirm that the reasons for the increase in B(a)P concentration are low-efficiency individual home heat stations or low-temperature heat sources, which are responsible for so-called low emission during the heating period. Dependences between the following quantities were analysed: concentration of PM10 dust in air, air temperature, wind velocity, air humidity. The measure of model fit to the actual B(a)P concentration in PM10 was the coefficient of determination of the model. Application of multidimensional linear regression yielded equations characterized by high values of the coefficient of determination, especially during the heating season. This parameter ranged from 0.54 to 0.80 during the analyzed period.
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Do Former College Athletes Earn More at Work? A Nonparametric Assessment
Henderson, Daniel J.; Olbrecht, Alexandre; Polachek, Solomon W.
2006-01-01
This paper investigates how students' collegiate athletic participation affects their subsequent labor market success. By using newly developed techniques in nonparametric regression, it shows that on average former college athletes earn a wage premium. However, the premium is not uniform, but skewed so that more than half the athletes actually…
Tanaka, Kenichi; Tateoka, Kunihiko; Asanuma, Osamu; Kamo, Ken-ichi; Sato, Kaori; Takeda, Hiromitsu; Takagi, Masaru; Hareyama, Masato; Takada, Jun
2014-01-01
The post-implantation dosimetry for brachytherapy using Monte Carlo calculation by EGS5 code combined with the source strength regression was investigated with respect to its validity. In this method, the source strength for the EGS5 calculation was adjusted with the regression, so that the calculation would reproduce the dose monitored with the glass rod dosimeters (GRDs) on a water phantom. The experiments were performed, simulating the case where one of two 125I sources of Oncoseed 6711 was lacking strength by 4–48%. As a result, the calculation without regression was in agreement with the GRD measurement within 26–62%. In this case, the shortage in strength of a source was neglected. By the regression, in order to reflect the strength shortage, the agreement was improved up to 17–24%. This agreement was also comparable with accuracy of the dose calculation for single source geometry reported previously. These results suggest the validity of the dosimetry method proposed in this study. PMID:24449715
Nonparametric Stochastic Model for Uncertainty Quantification of Short-term Wind Speed Forecasts
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of optimal wind turbine operation, wind energy dispatch and scheduling, efficient energy harvesting, etc. It is also considered during planning, design, and assessment of any proposed wind project. Accurate prediction of wind speed is therefore of particular importance to the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using them for future prediction. The methods mainly include statistical models such as ARMA and ARIMA, physical models such as numerical weather prediction, and artificial intelligence techniques such as support vector machines and neural networks. In this paper, we are interested in estimating hourly wind speed measures in the United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to a nonlinear autoregressive model with an ARCH component, which includes an unknown nonlinear regression function and a volatility function already discussed in the literature. The unknown nonlinear regression function describes the dependence between the value of the wind speed at time t and its historical data at times t-1, t-2, ..., t-d. This function plays a key role in predicting the hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated with this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using
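A kernel estimate of the regression and volatility functions described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes a single lag (d = 1), a Gaussian kernel, a fixed hand-picked bandwidth, and Nadaraya-Watson weighting; the function names are hypothetical.

```python
from math import exp


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratios."""
    return exp(-0.5 * u * u)


def nw_regression(series, x, bandwidth):
    """Kernel estimate of m(x) = E[X_t | X_{t-1} = x]."""
    num = den = 0.0
    for prev, curr in zip(series[:-1], series[1:]):
        w = gaussian_kernel((x - prev) / bandwidth)
        num += w * curr
        den += w
    return num / den


def nw_volatility(series, x, bandwidth):
    """Kernel estimate of sigma^2(x) = Var[X_t | X_{t-1} = x].

    Squared residuals from the fitted regression function are smoothed
    with the same kernel weights.
    """
    num = den = 0.0
    for prev, curr in zip(series[:-1], series[1:]):
        w = gaussian_kernel((x - prev) / bandwidth)
        resid = curr - nw_regression(series, prev, bandwidth)
        num += w * resid * resid
        den += w
    return num / den
```

Given an hourly series, `nw_regression(series, series[-1], h)` would serve as the one-step-ahead point forecast and `nw_volatility(series, series[-1], h)` as the associated conditional variance. Both ratios are only defined where the kernel weights do not all underflow to zero, so `x` should lie within the range of the observed lagged values.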
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Institute of Scientific and Technical Information of China (English)
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
The quantile estimation methods are proposed for functional-coefficient partially linear regression (FCPLR) model by combining nonparametric and functional-coefficient regression (FCR) model. The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model. These resulting estimators are asymptotically normal, but each of them has big variance. To reduce variances of these quantile estimators, the one-step backfitting technique is used to obtain the efficient quantile estimators of all unknown functions, and their asymptotic normalities are derived. Two simulated examples are carried out to illustrate the proposed estimation methodology.
Tracking Methods to Study the Surface Regression of the Solid-Propellant Grain
Directory of Open Access Journals (Sweden)
Yao Hsin Hwang
2014-10-01
In this work, we have developed practical surface tracking methods to calculate the erosive volume and the associated burning areas, which are important parameters for solving nonlinear, pressurization-rate dependent combustion ballistics. Three methodologies, namely the front tracking, the emanating ray and the least distance methods, are proposed. The front tracking method is based on the Lagrangian point of view, while both the emanating ray and the least distance methods are formulated from the Eulerian viewpoint. Two two-dimensional test problems have been examined to compare the programming complexity, simulation accuracy and required CPU time of the proposed methods. It is found that the least distance method is superior to the other two methods in numerical respects. The least distance method is implemented with tetrahedral grids to track the outward propagation of a three-dimensional cubic. Comparison between the predicted erosive volume and the corresponding theoretical result yields satisfactory agreement.
Parametric or nonparametric? A parametricness index for model selection
Liu, Wei; 10.1214/11-AOS899
2012-01-01
In the model selection literature, two classes of criteria perform well asymptotically in different situations: the Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike's information criterion (AIC) is asymptotically efficient when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses whether it is possible, and how, to detect which situation a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, the parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is si...
Stochastic search, optimization and regression with energy applications
Hannah, Lauren A.
Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression
Nonparametric statistical tests for the continuous data: the basic concept and the practical use.
Nahm, Francis Sahngun
2016-02-01
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption: the assumption of normality, which means that the distribution of sample means is normal. However, parametric tests can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the available alternative, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation
Directory of Open Access Journals (Sweden)
Ahmad Bilfarsah
2005-04-01
The Additive Spline Partial Least Squares (ASPLS) method is a generalization of the Partial Least Squares (PLS) method. The ASPLS method can accommodate nonlinearity and multicollinearity among predictor variables. In principle, the ASPLS approach is characterized by two ideas. The first is to use parametric transformations of the predictors by spline functions; the second is to make the ASPLS components mutually uncorrelated, to preserve the properties of the linear PLS components. The performance of ASPLS compared with other PLS methods is illustrated with a fisheries economics application, specifically tuna fish production.
Katpatal, Y. B.; Paranjpe, S. V.; Kadu, M.
2014-12-01
Effective watershed management requires authentic data on surface runoff potential, for which several methods and models are in use. Generally, non-availability of field data calls for techniques based on remote observations. The Soil Conservation Service Curve Number (SCS CN) method is an important method which utilizes information generated from remote sensing for estimation of runoff. Several attempts have been made to validate the runoff values generated from the SCS CN method by comparing them with the results obtained from other methods. In the present study, runoff estimation through the SCS CN method has been performed using IRS LISS IV data for the Venna Basin situated in Central India. Field data were available for the Venna Basin. The land use/land cover and soil layers have been generated for the entire watershed using the satellite data and a Geographic Information System (GIS). The Venna Basin has been divided into intercepted catchment and free catchment. Runoff values have been estimated using field data through regression analysis. The runoff values estimated using the SCS CN method have been compared with yield values generated using data collected from the tank gauge stations and data from the discharge stations. The correlation helps in validating the results obtained from the SCS CN method and its applicability in Indian conditions. Key Words: SCS CN Method, Regression Analysis, Land Use / Land cover, Runoff, Remote Sensing, GIS.
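For readers unfamiliar with the SCS CN method named above, the sketch below shows the standard textbook curve-number runoff equation in metric units: the potential maximum retention is S = 25400/CN - 254 (mm), the initial abstraction is Ia = ratio * S, and the runoff depth is Q = (P - Ia)^2 / (P - Ia + S) when rainfall P exceeds Ia, otherwise zero. The function name and the default ratio of 0.2 are assumptions for illustration; the abstract does not say which initial-abstraction ratio or curve numbers were used for the Venna Basin.

```python
def scs_cn_runoff(rainfall_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth (mm) from storm rainfall (mm) by the SCS-CN equation.

    ia_ratio is the initial abstraction as a fraction of the retention S;
    0.2 is the conventional default and is an assumption here.
    """
    if not 0.0 < curve_number <= 100.0:
        raise ValueError("curve number must lie in (0, 100]")
    if rainfall_mm < 0.0:
        raise ValueError("rainfall must be non-negative")
    retention = 25400.0 / curve_number - 254.0   # S, in mm
    initial_abstraction = ia_ratio * retention   # Ia, in mm
    if rainfall_mm <= initial_abstraction:
        return 0.0                               # all rainfall is abstracted
    excess = rainfall_mm - initial_abstraction
    return excess * excess / (excess + retention)


# Hypothetical storm: 100 mm of rain on a catchment with CN = 80.
print(scs_cn_runoff(100.0, 80.0))
```

A curve number of 100 gives S = 0, so all rainfall becomes runoff; lower curve numbers (more permeable soils and cover) reduce the runoff depth.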
Directory of Open Access Journals (Sweden)
Sara Mortaz Hejri
2013-01-01
Background: One of the methods used for standard setting is the borderline regression method (BRM). This study aims to assess the reliability of BRM when the pass-fail standard in an objective structured clinical examination (OSCE) was calculated by averaging the BRM standards obtained for each station separately. Materials and Methods: In nine stations of the OSCE with direct observation, the examiners gave each student a checklist score and a global score. Using a linear regression model for each station, we calculated the checklist score cut-off on the regression equation for the global scale cut-off set at 2. The OSCE pass-fail standard was defined as the average of all stations' standards. To determine the reliability, the root mean square error (RMSE) was calculated. The R2 coefficient and the inter-grade discrimination were calculated to assess the quality of the OSCE. Results: The mean total test score was 60.78. The OSCE pass-fail standard and its RMSE were 47.37 and 0.55, respectively. The R2 coefficients ranged from 0.44 to 0.79. The inter-grade discrimination score varied greatly among stations. Conclusion: The RMSE of the standard was very small, indicating that BRM is a reliable method of setting the standard for an OSCE, which has the advantage of providing data for quality assurance.
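The per-station calculation described in this abstract can be sketched as follows: regress checklist scores on global scores by ordinary least squares, read off the predicted checklist score at the borderline global grade (2 in the abstract), and average the station cut-offs. The function names and the example scores are hypothetical, and the RMSE-based reliability step is not reproduced because the abstract does not give its formula.

```python
def borderline_regression_cutoff(global_scores, checklist_scores, borderline_grade=2.0):
    """Checklist cut-off for one station: the OLS prediction at the borderline grade."""
    n = len(global_scores)
    if n != len(checklist_scores) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_g = sum(global_scores) / n
    mean_c = sum(checklist_scores) / n
    sxx = sum((g - mean_g) ** 2 for g in global_scores)
    if sxx == 0.0:
        raise ValueError("global scores must not all be identical")
    sxy = sum((g - mean_g) * (c - mean_c)
              for g, c in zip(global_scores, checklist_scores))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_g
    return intercept + slope * borderline_grade


def osce_pass_mark(stations, borderline_grade=2.0):
    """Average the station cut-offs; `stations` is a list of (global, checklist) pairs."""
    cutoffs = [borderline_regression_cutoff(g, c, borderline_grade)
               for g, c in stations]
    return sum(cutoffs) / len(cutoffs)


# Hypothetical scores for two stations (not data from the study).
station_a = ([1, 2, 3, 4], [40, 50, 60, 70])
station_b = ([1, 2, 3, 4, 5], [30, 45, 60, 75, 90])
print(osce_pass_mark([station_a, station_b]))
```

Because the cut-off is a regression prediction rather than the mean score of borderline candidates, every examinee's score contributes to the standard, which is the usual argument for BRM over borderline-group averaging.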
A novel nonparametric confidence interval for differences of proportions for correlated binary data.
Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan
2016-11-16
Various confidence interval estimators have been developed for differences in proportions resulting from correlated binary data. However, the most commonly recommended interval, Tango's score confidence interval, tends to be wide, and the computing burden of exact methods recommended for small-sample data is intensive. The recently proposed rank-based nonparametric method, which treats proportions as special areas under receiver operating characteristic curves, provided a new way to construct the confidence interval for the proportion difference on paired data, but its complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristics. The new confidence interval has a simple analytic form with a new estimate of the degrees of freedom of n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than Tango's. This new confidence interval with the new estimate of degrees of freedom also leads to coverage probabilities that are an improvement on the rank-based nonparametric confidence interval. Compared with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and yet it is very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance. © The Author(s) 2016.
INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: ESTIMATION USING MULTIPLE REGRESSION METHOD
Directory of Open Access Journals (Sweden)
Eglantina HYSA
2012-06-01
In recent years, the tourism sector has grown significantly in Albania, since after 1990 Albania passed from a centralized economy to a liberal one. The tourism sector plays an important role in economic and social development. The contributions of this sector feed directly into the generation of national income. The two main components of tourism movements are the number of tourists and the number of overnight stays in hotels. Investments in this sector could be expected to have a strong positive influence on the country's GDP. This study seeks to identify the influence of tourists, their overnight stays in hotels, and capital investment spending by all sectors directly involved in tourism on the total contribution of tourism to the gross domestic product of Albania during 1996-2009. A regression analysis has been performed taking as the dependent variable the GDP generated by the tourism sector and as independent variables capital investment, number of tourists, and overnight stays in hotels. Even though all the variables have been found to be positively related, the variable 'overnights of foreigners and Albanians in hotels' has been found insignificant.
Forecast daily indices of solar activity, F10.7, using support vector regression method
Institute of Scientific and Technical Information of China (English)
Cong Huang; Dan-Dan Liu; Jing-Song Wang
2009-01-01
The 10.7cm solar radio flux (F10.7), the value of the solar radio emission flux density at a wavelength of 10.7cm, is a useful index of solar activity as a proxy for solar extreme ultraviolet radiation. It is meaningful and important to predict F10.7 values accurately for both long-term (months-years) and short-term (days) forecasting, which are often used as inputs in space weather models. This study applies a novel neural network technique, support vector regression (SVR), to forecasting daily values of F10.7. The aim of this study is to examine the feasibility of SVR in short-term F10.7 forecasting. The approach, based on SVR, reduces the dimension of feature space in the training process by using a kernel-based learning algorithm. Thus, the complexity of the calculation becomes lower and a small amount of training data will be sufficient. The time series of F10.7 from 2002 to 2006 are employed as the data sets. The performance of the approach is estimated by calculating the norm mean square error and mean absolute percentage error. It is shown that our approach can perform well by using fewer training data points than the traditional neural network.
Tan, F.; Lim, H. S.; Abdullah, K.; Yoon, T. L.; Zubir Matjafri, M.; Holben, B.
2014-02-01
Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but air pollution index (API), visibility and relative humidity from ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between AOD and these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September, 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The most highly correlated model was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can therefore be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is at the ground surface or at a higher altitude. This model can accurately predict AOD only if the aerosol is uniformly distributed in the atmosphere. However, further study is needed to determine whether this model is suitable for predicting AOD not only in Penang, but also in other states in Malaysia or even globally.
Cheng, Anyu; Jiang, Xiao; Li, Yongfu; Zhang, Chao; Zhu, Hao
2017-01-01
This study proposes a multi-source, multi-measure traffic flow prediction algorithm using chaos theory and the support vector regression method. In particular, first, the chaotic characteristics of traffic flow associated with the speed, occupancy, and flow are identified using the maximum Lyapunov exponent. Then, the phase spaces of the multiple-measure chaotic time series are reconstructed based on phase space reconstruction theory and fused into the same multi-dimensional phase space using Bayesian estimation theory. In addition, the support vector regression (SVR) model is designed to predict the traffic flow. Numerical experiments are performed using the data from multiple sources. The results show that, compared with a single measure, the proposed method has better performance for short-term traffic flow prediction in terms of accuracy and timeliness.
DEFF Research Database (Denmark)
Shirali, Mahmoud; Nielsen, Vivi Hunnicke; Møller, Steen Henrik
2014-01-01
The aim of this study was to determine the genetic background of longitudinal residual feed intake (RFI) and body weight (BW) growth in farmed mink using random regression methods considering heterogeneous residual variances. Eight BW measures for each mink were recorded every three weeks from 63 to 210...... days of age for 2139 male mink and the same number of females. Cumulative feed intake was calculated six times at three-week intervals based on daily feed consumption between weighings from 105 to 210 days of age. Heritability estimates for RFI increased by age from 0.18 (0.03, standard deviation...... be obtained by only considering RFI estimate and BW at pelting, however, lower genetic correlations than unity indicate that extra genetic gain can be obtained by including estimates of these traits at the growing period. This study suggests random regression methods are suitable for analysing feed efficiency...
A least angle regression method for fMRI activation detection in phase-encoded experimental designs.
Li, Xingfeng; Coyle, Damien; Maguire, Liam; McGinnity, Thomas M; Watson, David R; Benali, Habib
2010-10-01
This paper presents a new regression method for functional magnetic resonance imaging (fMRI) activation detection. Unlike general linear models (GLM), this method is based on selecting models for activation detection adaptively, which overcomes the limitation of requiring a predefined design matrix in GLM. This limitation arises because GLM designs assume that the response of the neuron populations will be the same for the same stimuli, which is often not the case. In this work, the fMRI hemodynamic response model is selected from a series of models constructed online by the least angle regression (LARS) method. The slow drift terms in the design matrix for the activation detection are determined adaptively according to the fMRI response in order to achieve the best fit for each fMRI response. The LARS method is then applied along with the Moore-Penrose pseudoinverse (PINV) and fast orthogonal search (FOS) algorithm for implementation of the selected model to include the drift effects in the design matrix. Comparisons with GLM were made using 11 normal subjects to test method superiority. This paper found that GLM with a fixed design matrix was inferior to the described LARS method for fMRI activation detection in a phase-encoded experimental design. In addition, the proposed method has the advantage of increasing the degrees of freedom in the regression analysis. We conclude that the method described provides a novel approach to the detection of fMRI activation which is better than GLM-based analyses.
Gilstrap, Donald L.
2013-01-01
In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…
Mortality study of nickel-cadmium battery workers by the method of regression models in life tables.
1983-01-01
The mortality experienced by a cohort of 3025 nickel-cadmium battery workers during the period 1946-81 has been investigated. Occupational histories were described in terms of some 75 jobs: eight with "high", 14 with "moderate" or slight, and 53 with minimal exposure to cadmium oxide (hydroxide). The method of regression models in life tables (RMLT) was used to compare the estimated cadmium exposures (durations of exposed employment) of those dying from causes of interest with those of matchi...
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis....
Energy Technology Data Exchange (ETDEWEB)
Boucher, Thomas F., E-mail: boucher@cs.umass.edu [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Ozanne, Marie V. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Carmosino, Marco L. [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Dyar, M. Darby [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Mahadevan, Sridhar [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Breves, Elly A.; Lepore, Kate H. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Clegg, Samuel M. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)]
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high
Rezaei, B; Khayamian, T; Mokhtari, A
2009-02-20
A flow injection chemiluminescent (FI-CL) method has been developed for the simultaneous determination of codeine and noscapine using N-PLS regression. The method is based on the fact that the kinetic characteristics of codeine and noscapine are different in the Ru(phen)3(2+)-Ce(IV) CL system. In flow injection mode, codeine gives a broad peak with the highest CL intensity at 4.4 s, whereas the maximum CL intensity of noscapine appears at about 2.6 s. Moreover, increasing the H2SO4 concentration affected the CL intensity of the two compounds differently. An experimental design, central composite design (CCD), was used to optimize variables such as the Ru(II) and Ce(IV) concentrations for both compounds. Under the optimized conditions, a three-way data structure (samples, H2SO4 concentration, time) was constructed and followed by N-PLS regression. The number of factors for the N-PLS regression was selected based on the minimum value of the root mean squared error of cross validation (RMSECV). The proposed method is applied to the simultaneous quantification of codeine and noscapine in pharmaceutical preparations.
Inverse probability weighted Cox regression for doubly truncated data.
Mandel, Micha; de Uña-Álvarez, Jacobo; Simon, David K; Betensky, Rebecca A
2017-09-08
Doubly truncated data arise when event times are observed only if they fall within subject-specific, possibly random, intervals. While non-parametric methods for survivor function estimation using doubly truncated data have been intensively studied, only a few methods for fitting regression models have been suggested, and only for a limited number of covariates. In this article, we present a method to fit the Cox regression model to doubly truncated data with multiple discrete and continuous covariates, and describe how to implement it using existing software. The approach is used to study the association between candidate single nucleotide polymorphisms and age of onset of Parkinson's disease. © 2017, The International Biometric Society.
Functional Regression for Quasar Spectra
Ciollaro, Mattia; Freeman, Peter; Genovese, Christopher; Lei, Jing; O'Connell, Ross; Wasserman, Larry
2014-01-01
The Lyman-alpha forest is a portion of the observed light spectrum of distant galactic nuclei which allows us to probe remote regions of the Universe that are otherwise inaccessible. The observed Lyman-alpha forest of a quasar light spectrum can be modeled as a noisy realization of a smooth curve that is affected by a 'damping effect' which occurs whenever the light emitted by the quasar travels through regions of the Universe with higher matter concentration. To decode the information conveyed by the Lyman-alpha forest about the matter distribution, we must be able to separate the smooth 'continuum' from the noise and the contribution of the damping effect in the quasar light spectra. To predict the continuum in the Lyman-alpha forest, we use a nonparametric functional regression model in which both the response and the predictor variable (the smooth part of the damping-free portion of the spectrum) are function-valued random variables. We demonstrate that the proposed method accurately predicts the unobserv...
Directory of Open Access Journals (Sweden)
J. Alm
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions
Huang, Lei
2015-09-30
Conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly. To address this problem, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state variables. Unknown time-varying estimators of the observation noise are used to obtain the estimated mean and variance of the observation noise. Using robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required.
Borodachev, S. M.
2016-06-01
A simple derivation of the recursive least squares (RLS) method equations is given as a special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates the application of RLS to the multicollinearity problem.
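The connection stated in this abstract can be sketched in a few lines. If the state theta is constant (no process noise) and each observation is y_k = h_k^T theta + v_k with noise variance r, the Kalman measurement update reduces to the RLS recursion: gain K = P h / (h^T P h + r), estimate theta + K (y - h^T theta), covariance P - K h^T P. The function name, the diffuse initial covariance, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def rls_update(theta, P, h, y, r=1.0):
    """One RLS step, written as a Kalman measurement update for a constant state.

    theta: current estimate (list), P: symmetric covariance (list of lists),
    h: regressor vector for this observation, y: scalar observation,
    r: observation noise variance.
    """
    n = len(theta)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(h[i] * Ph[i] for i in range(n)) + r
    gain = [Ph[i] / innovation_var for i in range(n)]
    innovation = y - sum(h[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * innovation for i in range(n)]
    # P is symmetric, so h^T P equals (P h)^T and P_new = P - gain * (P h)^T.
    P_new = [[P[i][j] - gain[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta_new, P_new


# Toy data: y = 2 + 3 x without noise; regressors h_k = [1, x_k] change each step.
theta_hat = [0.0, 0.0]
P_hat = [[1e6, 0.0], [0.0, 1e6]]   # diffuse prior: large initial uncertainty
for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    theta_hat, P_hat = rls_update(theta_hat, P_hat, [1.0, x], 2.0 + 3.0 * x)
print(theta_hat)
```

With a finite initial covariance the recursion is equivalent to a lightly regularised least squares fit, which is one reason this form is often discussed alongside ill-conditioned or collinear regressors.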
Correlated Non-Parametric Latent Feature Models
Doshi-Velez, Finale
2012-01-01
We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.
A Censored Nonparametric Software Reliability Model
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
This paper analyses the effect of censoring on the estimation of the failure rate, and presents a framework for a censored nonparametric software reliability model. The model is based on nonparametric testing of whether the failure rate is monotonically decreasing and on weighted kernel failure rate estimation under the constraint that the failure rate is monotonically decreasing. Not only does the model have the advantages of few assumptions and weak constraints, but the number of residual defects in the software system can also be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major...
Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.
2017-08-01
Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Consistency analysis of subspace identification methods based on a linear regression approach
DEFF Research Database (Denmark)
Knudsen, Torben
2001-01-01
In the literature, results can be found which claim consistency for the subspace method under certain quite weak assumptions. Unfortunately, a new result gives a counterexample showing inconsistency under these assumptions and then gives new, stricter sufficient assumptions which however does n...
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection
DEFF Research Database (Denmark)
Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald
2013-01-01
The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PL...
Directory of Open Access Journals (Sweden)
Hongying Du
The epidermal growth factor receptor (EGFR) protein tyrosine kinase (PTK) is an important protein target for anti-tumor drug discovery. To identify potential EGFR inhibitors, we conducted a quantitative structure-activity relationship (QSAR) study on the inhibitory activity of a series of quinazoline derivatives against EGFR tyrosine kinase. Two 2D-QSAR models were developed based on the best multi-linear regression (BMLR) and grid-search assisted projection pursuit regression (GS-PPR) methods. The results demonstrate that the inhibitory activity of quinazoline derivatives is strongly correlated with their polarizability, activation energy, mass distribution, connectivity, and branching information. Although the present investigation focused on EGFR, the approach provides a general avenue in the structure-based drug development of different protein receptor inhibitors.
Institute of Scientific and Technical Information of China (English)
WANG Weida; XIA Junding; ZHOU Zhixin
2006-01-01
This paper studies the thermoluminescence (TL) dating of ancient porcelain using a saturation exponential regression method in the pre-dose technique. The experimental results show that the measurement errors are 15% (±1σ) for the paleodose and 17% (±1σ) for the annual dose, and that the TL age error is 23% (±1σ) with this method. The larger Chinese porcelains from the museum and from collectors nationwide have been dated by this method. The results show that the certainty of the authenticity testing is greater than 95%, and the measurable porcelains make up about 95% of the porcelain dated. The method is very successful in discriminating imitations of ancient Chinese porcelains. This paper describes the measurement principle and method for the paleodose of porcelains. TL ages are determined by this method for 39 shards and porcelains from past Chinese dynasties, and the detailed measurement data are reported.
Directory of Open Access Journals (Sweden)
Hukharnsusatrue, A.
2005-11-01
The objective of this research is to compare methods for estimating multiple regression coefficients in the presence of multicollinearity among the independent variables. The estimation methods are the Ordinary Least Squares method (OLS), the Restricted Least Squares method (RLS), the Restricted Ridge Regression method (RRR) and the Restricted Liu method (RL), both when the restrictions are true and when they are not true. The study used the Monte Carlo simulation method. The experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlations is high, and also provides the smallest AMSE for all levels of correlations and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlations is low or medium, except that with a standard deviation equal to 3 and small sample sizes the RRR method provides the smallest AMSE. The AMSE varies with, from most to least, the level of correlations, the standard deviation and the number of independent variables, but inversely with the sample size. CASE 2: The restrictions are not true. In all cases, the RRR method provides the smallest AMSE, except that with a standard deviation equal to 1 and an error of restrictions equal to 5%, the OLS method provides the smallest AMSE when the level of correlations is low or medium and the sample size is large, whereas for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlations and all sample sizes, except when the level of correlations is high and the sample sizes are small. Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than
Nonlinear Spline Kernel-based Partial Least Squares Regression Method and Its Application
Institute of Scientific and Technical Information of China (English)
JIA Jin-ming; WEN Xiang-jun
2008-01-01
Inspired by Wold's traditional nonlinear PLS algorithm, which comprises the NIPALS approach and a spline inner function model, a novel nonlinear partial least squares algorithm based on a spline kernel (named SK-PLS) is proposed for nonlinear modeling in the presence of multicollinearity. Based on the inner-product kernel spanned by the spline basis functions with an infinite number of nodes, this method first maps the input data into a high-dimensional feature space, then calculates a linear PLS model with a reformed NIPALS procedure in the feature space, and consequently gives a unified framework of traditional PLS "kernel" algorithms. The linear PLS in the feature space corresponds to a nonlinear PLS in the original input (primal) space. The good approximating property of the spline kernel function enhances the generalization ability of the novel model, and two numerical experiments are given to illustrate the feasibility of the proposed method.
PREDICTING MOVIE SUCCESS FROM SEARCH QUERY USING SUPPORT VECTOR REGRESSION METHOD
Directory of Open Access Journals (Sweden)
Chanseung Lee
2016-01-01
Query data from search engines can provide many insights about human behavior. Therefore, massive data resulting from human interactions may offer a new perspective on the behavior of the market. By analyzing the Google query database for search terms, we present a method of analyzing large numbers of search queries to predict outcomes such as movie incomes. Our results illustrate the potential of combining extensive behavioral data sets that offer a better understanding of collective human behavior.
Deng, Zhaohong; Choi, Kup-Sze; Jiang, Yizhang; Wang, Shitong
2014-12-01
Inductive transfer learning has attracted increasing attention for the training of effective models in the target domain by leveraging the information in the source domain. However, most transfer learning methods are developed for a specific model, such as the commonly used support vector machine, which makes the methods applicable only to the adopted models. In this regard, the generalized hidden-mapping ridge regression (GHRR) method is introduced in order to train various types of classical intelligence models, including neural networks, fuzzy logical systems and kernel methods. Furthermore, the knowledge-leverage based transfer learning mechanism is integrated with GHRR to realize the inductive transfer learning method called transfer GHRR (TGHRR). Since the information from the induced knowledge is much clearer and more concise than that from the data in the source domain, it is more convenient to control and balance the similarity and difference of data distributions between the source and target domains. The proposed GHRR and TGHRR algorithms have been evaluated experimentally by performing regression and classification on synthetic and real-world datasets. The results demonstrate that the performance of TGHRR is competitive with or even superior to existing state-of-the-art inductive transfer learning algorithms.
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
A Bayesian Nonparametric Approach to Test Equating
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Decompounding random sums: A nonparametric approach
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted; Pitts, Susan M.
review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...
A Nonparametric Analogy of Analysis of Covariance
Burnett, Thomas D.; Barr, Donald R.
1977-01-01
A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)
Li, Zitong; Sillanpää, Mikko J
2012-08-01
Quantitative trait loci (QTL)/association mapping aims at finding genomic loci associated with the phenotypes, whereas genomic selection focuses on breeding value prediction based on genomic data. Variable selection is key to both of these tasks, as it makes it possible to (1) detect clear mapping signals of QTL activity, and (2) predict the genome-enhanced breeding values accurately. In this paper, we provide an overview of a statistical method called the least absolute shrinkage and selection operator (LASSO) and two of its generalizations, the elastic net and the adaptive LASSO, in the contexts of QTL mapping and genomic breeding value prediction in plants (or animals). We also briefly summarize the Bayesian interpretation of LASSO and the hierarchical Bayesian models it has inspired. We illustrate the implementation and examine the performance of the methods using three public data sets: (1) North American barley data with 127 individuals and 145 markers, (2) simulated QTLMAS XII data with 5,865 individuals and 6,000 markers for both QTL mapping and genomic selection, and (3) wheat data with 599 individuals and 1,279 markers for genomic selection only.
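To make the variable-selection behaviour of LASSO concrete, here is a minimal sketch of cyclic coordinate descent with soft-thresholding for the objective (1/(2n))·||y − Xb||² + lam·||b||₁. It is one common way to fit LASSO, not necessarily the implementation used in the paper; the function names and the absence of an intercept or marker standardisation are simplifying assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (b_j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

The soft-thresholding step is what sets weak coefficients exactly to zero, which is why LASSO performs variable selection rather than only shrinkage.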
The Infinite Hierarchical Factor Regression Model
Rai, Piyush
2009-01-01
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
Fang, Sheng; Guo, Hua
2013-01-01
The parallel magnetic resonance imaging (parallel imaging) technique reduces the MR data acquisition time by using multiple receiver coils. Coil sensitivity estimation is critical for the performance of parallel imaging reconstruction. Currently, most coil sensitivity estimation methods are based on linear interpolation techniques. Such methods may result in Gibbs-ringing artifact or resolution loss, when the resolution of coil sensitivity data is limited. To solve the problem, we proposed a nonlinear coil sensitivity estimation method based on steering kernel regression, which performs a local gradient guided interpolation to the coil sensitivity. The in vivo experimental results demonstrate that this method can effectively suppress Gibbs ringing artifact in coil sensitivity and reduces both noise and residual aliasing artifact level in SENSE reconstruction.
Kügler, S D; Hoecker, M
2014-01-01
Context: In astronomy, new approaches to processing and analyzing the exponentially increasing amount of data are needed. While classical approaches (e.g. template fitting) are fine for objects of well-known classes, alternative techniques have to be developed to identify those that do not fit. A classification scheme should therefore be based on individual properties rather than on fitting to a global model, which loses valuable information. An important issue when dealing with large data sets is outlier detection, which at the moment is often treated in a problem-oriented way. Aims: In this paper we present a method to statistically estimate the redshift z based on a similarity approach. This allows us to determine redshifts in spectra in emission as well as in absorption without using any predefined model. Additionally we show how an estimate of the redshift based on single features is possible. As a consequence we are, for example, able to filter objects which show multiple redshift components. We propose to apply ...
Generative Temporal Modelling of Neuroimaging - Decomposition and Nonparametric Testing
DEFF Research Database (Denmark)
Hald, Ditte Høvenhoff
The goal of this thesis is to explore two improvements for functional magnetic resonance imaging (fMRI) analysis; namely our proposed decomposition method and an extension to the non-parametric testing framework. Analysis of fMRI allows researchers to investigate the functional processes...... of the brain, and provides insight into neuronal coupling during mental processes or tasks. The decomposition method is a Gaussian process-based independent components analysis (GPICA), which incorporates a temporal dependency in the sources. A hierarchical model specification is used, featuring both...
Ahangar, Reza Gharoie; Pournaghshband, Hassan
2010-01-01
In this paper, the researchers estimate the stock prices of active companies on the Tehran (Iran) stock exchange. Linear regression and artificial neural network methods are used and compared. For the artificial neural network, the General Regression Neural Network (GRNN) architecture is used. The researchers first considered 10 macroeconomic variables and 30 financial variables, and then used Independent Components Analysis (ICA) to obtain seven final variables, comprising 3 macroeconomic variables and 4 financial variables, for estimating the stock price. An equation is presented for each of the two methods, and a comparison of their results shows that the artificial neural network method is more efficient than the linear regression method.
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground-water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface
Institute of Scientific and Technical Information of China (English)
康春花; 任平; 曾平飞
2015-01-01
Examinations help students learn more efficiently by filling their learning gaps. To achieve this goal, we have to use cognitive diagnostic assessment to differentiate students who have mastered a set of attributes measured by the test from those who have not. K-means cluster analysis, as a nonparametric cognitive diagnosis method, requires only the Q-matrix, which reflects the relationship between attributes and items. It does not require the estimation of parameters, so it is independent of sample size, simple to operate, and easy to understand. Previous research used the sum score vector or the capability score vector as the clustering objects. These methods are suitable only for dichotomous data. Structural response items are, however, the main type used in examinations, particularly as required in recent reforms. On the basis of previous research, this paper puts forward a method for calculating a capability matrix that reflects the mastery level of skills and is applicable to graded response items. Our study included four parts. First, we introduced the K-means cluster diagnosis method that has been adapted for dichotomous data. Second, we extended the K-means cluster diagnosis method to graded response data (GRCDM). Third, we investigated the performance of the method introduced in Part Two using a simulation study. Fourth, we investigated the performance of the method in an empirical study. The simulation study focused on three factors. First, the sample size was set to 100, 500, and 1000. Second, the percentage of random errors was set to 5%, 10%, and 20%. Third, there were four hierarchies, as proposed by Leighton. All experimental conditions comprised seven attributes, with different items according to the hierarchy. Simulation results showed that: (1) GRCDM had a high pattern match ratio (PMR) and a high marginal match ratio (MMR). This method was shown to be feasible in cognitive diagnostic assessment. (2) The classification accuracy (MMR and PMR
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.
Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
Liu, Jiaqi; Han, Jing; Zhang, Yi; Bai, Lianfa
2015-10-01
The locally adaptive regression kernels model can accurately describe the edge shapes of images and their overall graphic trend, but it does not consider color information, even though color is an important element of an image. Therefore, we present a novel method of target recognition based on a 3-D color-space locally adaptive regression kernels model. Unlike approaches that add color information as a supplement, this method directly calculates the local similarity features of 3-D data from the color image. The proposed method uses a few examples of an object as a query to detect generic objects with incompact, complex and changeable shapes. Our method involves three phases. First, we calculate novel color-space descriptors from the RGB color space of the query image, which measure the likeness of a voxel to its surroundings. Salient features that include spatial-dimensional and color-dimensional information are extracted from these descriptors and simplified by principal components analysis (PCA) to construct a non-similar local structure feature set of the object class. Second, we compare the salient features with analogous features from the target image. This comparison is done using a matrix generalization of the cosine similarity measure. The similar structures in the target image are then obtained using local similarity structure statistical matching. Finally, we apply non-maxima suppression to the similarity image to extract the object position and mark the object in the test image. Experimental results demonstrate that our approach is effective and accurate in improving the ability to identify targets.
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
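The claim that K-M methods apply equally to left-censored data can be illustrated with the usual "flipping" device: subtracting every value from a constant turns left-censored observations into right-censored ones, after which the ordinary product-limit estimator applies. The sketch below is written in Python rather than the S language of the software described, and its function names and conventions are assumptions for illustration only.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate of S(t) = P(T > t) for right-censored data.

    times: list of values; observed: list of bools (False = right-censored).
    Returns a list of (t, S(t)) at each distinct uncensored value.
    """
    event_times = sorted({t for t, d in zip(times, observed) if d})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, d in zip(times, observed) if d and u == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


def km_left_censored(values, detected, flip=None):
    """Estimate P(X < x) for left-censored data by flipping to right-censored form.

    values: measured value, or the detection limit when detected is False.
    """
    if flip is None:
        flip = max(values) + 1.0  # any constant above the largest value works
    flipped = [flip - v for v in values]
    curve = kaplan_meier(flipped, detected)
    # S_flipped(flip - x) = P(flip - X > flip - x) = P(X < x)
    return sorted((flip - t, s) for t, s in curve)
```

The estimate is defined only at detected values, which reflects the point made in the abstract that K-M methods neither extrapolate nor interpolate.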
Baraldi, Piero; Di Maio, Francesco; Turati, Pietro; Zio, Enrico
2015-08-01
In this work, we propose a modification of the traditional Auto Associative Kernel Regression (AAKR) method which enhances the signal reconstruction robustness, i.e., the capability of reconstructing abnormal signals to the values expected in normal conditions. The modification is based on the definition of a new procedure for the computation of the similarity between the present measurements and the historical patterns used to perform the signal reconstructions. The underlying conjecture is that malfunctions causing variations of a small number of signals are more frequent than those causing variations of a large number of signals. The proposed method has been applied to real normal-condition data collected in an industrial plant for energy production. Its performance has been verified considering synthetic and real malfunctions. The obtained results show an improvement in the early detection of abnormal conditions and in the correct identification of the signals responsible for triggering the detection.
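For orientation, the sketch below shows the traditional AAKR baseline that the abstract sets out to modify: the current measurement vector is reconstructed as a Gaussian-kernel-weighted average of historical normal-condition patterns, with weights based on Euclidean distance. It does not implement the modified similarity measure proposed in the paper; the function name, the Gaussian kernel and the `bandwidth` parameter are assumptions for illustration.

```python
import math


def aakr_reconstruct(history, query, bandwidth=1.0):
    """Traditional AAKR: reconstruct query as a kernel-weighted mean of historical patterns.

    history: list of normal-condition signal vectors; query: current measurement vector.
    """
    weights = []
    for pattern in history:
        d2 = sum((q - p) ** 2 for q, p in zip(query, pattern))  # squared Euclidean distance
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))   # Gaussian kernel
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all historical patterns for this bandwidth")
    n_signals = len(query)
    return [
        sum(w * pattern[j] for w, pattern in zip(weights, history)) / total
        for j in range(n_signals)
    ]
```

In a monitoring setting, the residual between the query and its reconstruction would typically be tracked per signal, with large residuals flagging possible abnormal conditions.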
Nonparametric tests for pathwise properties of semimartingales
Cont, Rama; 10.3150/10-BEJ293
2011-01-01
We propose two nonparametric tests for investigating the pathwise properties of a signal modeled as the sum of a Lévy process and a Brownian semimartingale. Using a nonparametric threshold estimator for the continuous component of the quadratic variation, we design a test for the presence of a continuous martingale component in the process and a test for establishing whether the jumps have finite or infinite variation, based on observations on a discrete-time grid. We evaluate the performance of our tests using simulations of various stochastic models and use the tests to investigate the fine structure of the DM/USD exchange rate fluctuations and SPX futures prices. In both cases, our tests reveal the presence of a non-zero Brownian component and a finite variation jump component.
Nonparametric Transient Classification using Adaptive Wavelets
Varughese, Melvin M; Stephanou, Michael; Bassett, Bruce A
2015-01-01
Classifying transients based on multi-band light curves is a challenging but crucial problem in the era of GAIA and LSST since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology - a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement, while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind ...
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association. Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Institute of Scientific and Technical Information of China (English)
冯青平; 李星毅
2015-01-01
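To address the problem of real-time prediction on massive traffic data, the Hadoop cloud platform is combined with the K-nearest-neighbor nonparametric regression method to forecast short-term traffic flow. Owing to the parallelism of the MapReduce framework, the time needed to find the K nearest neighbors is greatly reduced. Experiments show that the prediction time on a cluster is much shorter than on a single machine. Moreover, the prediction speed under the MapReduce framework increases with the size of the cluster, demonstrating the scalability of the cluster. The method can meet the real-time and accuracy requirements of traffic control and traffic guidance systems.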
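The K-nearest-neighbor regression step described above can be sketched on a single machine as follows. This is only an illustration of the regression idea, not of the Hadoop/MapReduce implementation; the function name `knn_predict`, the use of recent flow values as the state vector, and the unweighted average are assumptions.

```python
import math

def knn_predict(states, next_flows, query, k=3):
    """Predict the next traffic flow as the mean outcome of the k nearest states.

    states     : historical state vectors (e.g., recent flow readings)
    next_flows : the flow that followed each historical state
    query      : the current state vector
    """
    if not 1 <= k <= len(states):
        raise ValueError("k must be between 1 and the number of historical states")
    # Pair each historical outcome with its Euclidean distance to the query.
    ranked = sorted((math.dist(s, query), y) for s, y in zip(states, next_flows))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k

# Illustrative data (hypothetical vehicle counts per interval).
states = [[100.0, 110.0], [200.0, 210.0], [105.0, 115.0], [300.0, 310.0]]
next_flows = [120.0, 220.0, 125.0, 320.0]
forecast = knn_predict(states, next_flows, [102.0, 112.0], k=2)
```

The distance computation over all historical states is the part that parallelizes naturally, which is presumably what the MapReduce framework distributes across the cluster.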
Bayesian nonparametric estimation for Quantum Homodyne Tomography
Naulet, Zacharie; Barat, Eric
2016-01-01
We estimate the quantum state of a light beam from results of quantum homodyne tomography noisy measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around ...
Naghshpour, Shahdad
2012-01-01
Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T
Directory of Open Access Journals (Sweden)
Pao-Shin Chu
2007-01-01
In this study, a multivariate linear regression model is applied to predict the seasonal tropical cyclone (TC) count in the vicinity of Taiwan using large-scale climate variables available from the preceding May. Here the season encompasses the five-month period from June through October, when typhoons are most active in the study domain. The model is based on the least absolute deviation so that regression estimates are more resistant (i.e., not unduly influenced by outliers) than those derived from the ordinary least squares method. Through lagged correlation analysis, five parameters (sea surface temperature, sea level pressure, precipitable water, low-level relative vorticity, and vertical wind shear) in key locations of the tropical western North Pacific are identified as predictor datasets. Results from cross-validation suggest that the statistical model is skillful in predicting TC activity, with a correlation coefficient of 0.63 for 1970 - 2003. If more recent data are included, the correlation coefficient reaches 0.69 for 1970 - 2006. Relative importance of each predictor variable is evaluated. For predicting higher than normal seasonal TC activity, warmer sea surface temperatures, a moist troposphere, and the presence of a low-level cyclonic circulation coupled with low-latitude westerlies in the Philippine Sea in the antecedent May appear to be important.
Sieve M-estimation for semiparametric varying-coefficient partially linear regression model
Institute of Scientific and Technical Information of China (English)
None listed
2010-01-01
This article considers a semiparametric varying-coefficient partially linear regression model. This model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Our main objective is to estimate the nonparametric component and the unknown parameters simultaneously. The method is easier to compute, and the required computational burden is much less than that of the existing two-stage estimation method. Furthermore, the sieve M-estimation is robust in the presence of outliers if we choose an appropriate ρ(·). Under some mild conditions, the estimators are shown to be strongly consistent; the convergence rate of the estimator for the unknown nonparametric component is obtained, and the estimator for the unknown parameter is shown to be asymptotically normally distributed. Numerical experiments are carried out to investigate the performance of the proposed method.
Introduction to nonparametric statistics for the biological sciences using R
MacFarland, Thomas W
2016-01-01
This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: to introduce when nonparametric approaches to data analysis are appropriate; to introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test; and to introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set. The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Directory of Open Access Journals (Sweden)
Ying-Hsin Chang
2013-01-01
Human estrogen receptor (ER) isoforms, ERα and ERβ, have long been an important focus in the field of biology. To better understand the structural features associated with the binding of ERα ligands to ERα and modulate their function, several QSAR models, including CoMFA, CoMSIA, SVR, and LR methods, have been employed to predict the inhibitory activity of 68 raloxifene derivatives. In the SVR and LR modeling, 11 descriptors were selected through feature ranking and sequential feature addition/deletion to generate equations to predict the inhibitory activity toward ERα. Among four descriptors that constantly appear in various generated equations, two agree with CoMFA and CoMSIA steric fields and another two can be correlated to a calculated electrostatic potential of ERα.
Yadav, Manish; Singh, Nitin Kumar
2017-08-01
The linear and non-linear regression methods were compared for selecting the optimum isotherm among the three most commonly used adsorption isotherms (Langmuir, Freundlich, and Redlich-Peterson), using experimental data for fluoride (F) sorption onto Bio-F at a solution temperature of 30 ± 1 °C. The coefficient of determination (r²) was used to select the best theoretical isotherm among those investigated. Four linear forms of the Langmuir equation were examined, of which the most popular forms, Langmuir-1 and Langmuir-2, showed higher coefficients of determination (0.976 and 0.989) than the other Langmuir linear equations. The Freundlich and Redlich-Peterson isotherms showed a better fit to the experimental data with the linear least-squares method, while with the non-linear method the Redlich-Peterson isotherm showed the best fit to the tested data set. The present study showed that the non-linear method could be a better way to obtain the isotherm parameters and identify the most suitable isotherm. The Redlich-Peterson isotherm was found to be the best representative (r² = 0.999) for this sorption system. It was also observed that the values of β are not close to unity, which means the isotherms approach the Freundlich rather than the Langmuir isotherm.
Directory of Open Access Journals (Sweden)
Corrado Dimauro
2010-01-01
Two methods of SNP pre-selection based on single marker regression for the estimation of genomic breeding values (G-EBVs) were compared using simulated data provided by the XII QTL-MAS workshop: (i) Bonferroni correction of the significance threshold and (ii) a permutation test to obtain the reference distribution of the null hypothesis and identify significant markers at P<0.01 and P<0.001 significance thresholds. From the set of markers significant at P<0.001, random subsets of 50% and 25% of the markers were extracted to evaluate the effect of further reducing the number of significant SNPs on G-EBV predictions. The Bonferroni correction method allowed the identification of 595 significant SNPs that gave the best G-EBV accuracies in prediction generations (82.80%). The permutation methods gave slightly lower G-EBV accuracies even though a larger number of SNPs were significant (2,053 and 1,352 for the 0.01 and 0.001 significance thresholds, respectively). Interestingly, halving or dividing by four the number of SNPs significant at P<0.001 resulted in only a slight decrease in G-EBV accuracies. The genetic structure of the simulated population, with few QTL carrying large effects, might have favoured the Bonferroni method.
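The two pre-selection ideas above can be sketched for a single marker as follows. This is a minimal illustration, not the authors' procedure: the test statistic (absolute Pearson correlation between genotype and phenotype), the function names, the number of permutations, and the toy data are all assumptions.

```python
import math
import random

def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance threshold after Bonferroni correction."""
    return alpha / n_tests

def abs_correlation(x, y):
    """Absolute Pearson correlation, used here as the single-marker statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)

def permutation_p_value(genotypes, phenotypes, n_perm=999, seed=0):
    """Empirical p-value from shuffling phenotypes to mimic the null hypothesis."""
    rng = random.Random(seed)
    observed = abs_correlation(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_correlation(genotypes, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Toy marker with a strong additive effect (hypothetical values).
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
phenotypes = [0.1, -0.2, 0.0, 0.3, 1.1, 0.9, 1.2, 0.8, 2.1, 1.9, 2.2, 1.8]
p_perm = permutation_p_value(genotypes, phenotypes)
threshold = bonferroni_threshold(0.05, 1000)  # 1000 markers is a made-up count
```

With a finite number of permutations the empirical p-value cannot fall below 1/(n_perm + 1), so comparing it against a very small Bonferroni threshold would require many more permutations than the 999 used in this sketch.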
Risser, Dennis W.; Thompson, Ronald E.; Stuckey, Marla H.
2008-01-01
A method was developed for making estimates of long-term, mean annual ground-water recharge from streamflow data at 80 streamflow-gaging stations in Pennsylvania. The method relates mean annual base-flow yield derived from the streamflow data (as a proxy for recharge) to the climatic, geologic, hydrologic, and physiographic characteristics of the basins (basin characteristics) by use of a regression equation. Base-flow yield is the base flow of a stream divided by the drainage area of the basin, expressed in inches of water basinwide. Mean annual base-flow yield was computed for the period of available streamflow record at continuous streamflow-gaging stations by use of the computer program PART, which separates base flow from direct runoff on the streamflow hydrograph. Base flow provides a reasonable estimate of recharge for basins where streamflow is mostly unaffected by upstream regulation, diversion, or mining. Twenty-eight basin characteristics were included in the exploratory regression analysis as possible predictors of base-flow yield. Basin characteristics found to be statistically significant predictors of mean annual base-flow yield during 1971-2000 at the 95-percent confidence level were (1) mean annual precipitation, (2) average maximum daily temperature, (3) percentage of sand in the soil, (4) percentage of carbonate bedrock in the basin, and (5) stream channel slope. The equation for predicting recharge was developed using ordinary least-squares regression. The standard error of prediction for the equation on log-transformed data was 9.7 percent, and the coefficient of determination was 0.80. The equation can be used to predict long-term, mean annual recharge rates for ungaged basins, providing that the explanatory basin characteristics can be determined and that the underlying assumption is accepted that base-flow yield derived from PART is a reasonable estimate of ground-water recharge rates. For example, application of the equation for 370
Directory of Open Access Journals (Sweden)
Upender Manne
2007-01-01
Background: Although a majority of studies in cancer biomarker discovery claim to use proportional hazards regression (PHREG) to study the ability of a biomarker to predict survival, few studies use the predicted probabilities obtained from the model to test the quality of the model. In this paper, we compared the quality of predictions by a PHREG model to that of a linear discriminant analysis (LDA) in both training and test set settings. Methods: The PHREG and LDA models were built on a 491 colorectal cancer (CRC) patient dataset comprised of demographic and clinicopathologic variables, and phenotypic expression of p53 and Bcl-2. Two variable selection methods, stepwise discriminant analysis and backward selection, were used to identify the final models. The endpoint of prediction in these models was five-year post-surgery survival. We also used a linear regression model to examine the effect of bin size in the training set on the accuracy of prediction in the test set. Results: The two variable selection techniques resulted in different models when stage was included in the list of variables available for selection. However, the proportion of survivors and non-survivors correctly identified was identical in both of these models. When stage was excluded from the variable list, the error rate for the LDA model was 42% as compared to an error rate of 34% for the PHREG model. Conclusions: This study suggests that a PHREG model can perform as well as or better than a traditional classifier such as LDA to classify patients into prognostic classes. Also, this study suggests that in the absence of the tumor stage as a variable, Bcl-2 expression is a strong prognostic molecular marker of CRC.
Variable selection in identification of a high dimensional nonlinear non-parametric system
Institute of Scientific and Technical Information of China (English)
Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG
2015-01-01
The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.
Using a nonparametric PV model to forecast AC power output of PV plants
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
In this paper, a methodology using a nonparametric model is used to forecast AC power output of PV plants using as inputs several forecasts of meteorological variables from a Numerical Weather Prediction (NWP) model and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quantile Regression Forests as machine learning tool to forecast the AC power with a confidence interval. Real data from five PV plants was used to validate the methodology, an...
Cannon, Alex J.
2011-09-01
The qrnn package for R implements the quantile regression neural network, which is an artificial neural network extension of linear quantile regression. The model formulation follows from previous work on the estimation of censored regression quantiles. The result is a nonparametric, nonlinear model suitable for making probabilistic predictions of mixed discrete-continuous variables like precipitation amounts, wind speeds, or pollutant concentrations, as well as continuous variables. A differentiable approximation to the quantile regression error function is adopted so that gradient-based optimization algorithms can be used to estimate model parameters. Weight penalty and bootstrap aggregation methods are used to avoid overfitting. For convenience, functions for quantile-based probability density, cumulative distribution, and inverse cumulative distribution functions are also provided. Package functions are demonstrated on a simple precipitation downscaling task.
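To make the "differentiable approximation to the quantile regression error function" concrete, the sketch below contrasts the standard quantile (pinball) loss with a Huber-type smoothing of it. The smoothing shown is one common choice and is an assumption here; whether it matches the package's exact formulation is not confirmed by the abstract, and the function names and the `eps` parameter are hypothetical.

```python
def pinball_loss(residual, tau):
    """Standard quantile loss for residual = observed - predicted, 0 < tau < 1."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def huber(residual, eps):
    """Huber function: quadratic within eps of zero, linear beyond it."""
    a = abs(residual)
    return a * a / (2.0 * eps) if a <= eps else a - eps / 2.0

def smooth_pinball_loss(residual, tau, eps=1e-3):
    """Differentiable stand-in for the pinball loss (assumed Huber smoothing)."""
    h = huber(residual, eps)
    return tau * h if residual >= 0 else (1.0 - tau) * h

# The two losses differ by at most eps / 2 times the larger of tau and 1 - tau.
exact = pinball_loss(2.0, 0.9)
smooth = smooth_pinball_loss(2.0, 0.9, eps=0.01)
```

Because the smoothed loss has a continuous derivative at zero, gradient-based optimizers can be applied directly, which is the motivation the abstract gives.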
Directory of Open Access Journals (Sweden)
Lüdtke Rainer
2008-08-01
Background: Regression to the mean (RTM) occurs in situations of repeated measurements when extreme values are followed by measurements in the same subjects that are closer to the mean of the basic population. In uncontrolled studies such changes are likely to be interpreted as a real treatment effect. Methods: Several statistical approaches have been developed to analyse such situations, including the algorithm of Mee and Chua, which assumes a known population mean μ. We extend this approach to a situation where μ is unknown and suggest varying it systematically over a range of reasonable values. Using differential calculus, we provide formulas to estimate the range of μ where treatment effects are likely to occur when RTM is present. Results: We successfully applied our method to three real-world examples denoting situations (a) when no treatment effect can be confirmed regardless of which μ is true, (b) when a treatment effect must be assumed independent of the true μ, and (c) in the appraisal of results of uncontrolled studies. Conclusion: Our method can be used to separate the wheat from the chaff in situations when one has to interpret the results of uncontrolled studies. In meta-analyses, health-technology reports or systematic reviews, this approach may be helpful to clarify the evidence given by uncontrolled observational studies.
Institute of Scientific and Technical Information of China (English)
李锐华; 高乃奎; 谢恒堃; 史维祥
2004-01-01
Objective: To investigate the information carried by the condition parameters of stator bars when only a few samples are available, especially the correlation between the nondestructive parameters and the residual breakdown voltage of the stator bars. Methods: Artificial stator bars are designed to simulate generator bars. Partial discharge (PD) and dielectric loss experiments are performed to obtain the nondestructive parameters, and the residual breakdown voltage is acquired by an AC damage experiment. To eliminate the effect of dimension on the measurement data, the raw data are preprocessed by centering and compression. Based on the idea of extracting principal components, a partial least squares (PLS) method is applied to screen and synthesize the correlation information between the nondestructive parameters and the residual breakdown voltage. Moreover, various information about the condition parameters is also discussed. Results: The graphical analysis function of PLS makes it easy to understand the information carried by the stator bar condition parameters. The analysis results are consistent with the results of aging tests. Conclusion: The method can select and extract PLS components of the condition parameters from sample data, and it effectively addresses the problems of small sample size and multicollinearity in regression analysis.
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
Lee, Kyungbook; Song, Seok Goo
2016-10-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events (M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRIs, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Analyzing single-molecule time series via nonparametric Bayesian inference.
Hines, Keegan E; Bankston, John R; Aldrich, Richard W
2015-02-03
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Nonparametric estimation of population density for line transect sampling using FOURIER series
Crain, B.R.; Burnham, K.P.; Anderson, D.R.; Lake, J.L.
1979-01-01
A nonparametric, robust density estimation method is explored for the analysis of right-angle distances from a transect line to the objects sighted. The method is based on the FOURIER series expansion of a probability density function over an interval. With only mild assumptions, a general population density estimator of wide applicability is obtained.
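A cosine-series form of such an estimator can be sketched as below. The abstract does not give formulas, so the expressions used here are assumptions based on the standard Fourier series approach for line transects: the density of perpendicular distances on [0, w] is approximated by 1/w plus m cosine terms, and population density is taken as n·f(0)/(2L) for n sightings along a line of length L. The function names, the choice of m, and the data are illustrative.

```python
import math

def fourier_coefficients(distances, width, n_terms):
    """Estimated cosine coefficients a_k, k = 1..n_terms, on the interval [0, width]."""
    n = len(distances)
    return [
        2.0 / (n * width) * sum(math.cos(k * math.pi * x / width) for x in distances)
        for k in range(1, n_terms + 1)
    ]

def fourier_density(x, distances, width, n_terms):
    """Estimated probability density of perpendicular distance at x."""
    coeffs = fourier_coefficients(distances, width, n_terms)
    return 1.0 / width + sum(
        a * math.cos(k * math.pi * x / width) for k, a in enumerate(coeffs, start=1)
    )

def population_density(distances, width, n_terms, line_length):
    """Objects per unit area, assuming the relation D = n * f(0) / (2 * L)."""
    n = len(distances)
    return n * fourier_density(0.0, distances, width, n_terms) / (2.0 * line_length)

# Hypothetical right-angle distances (same length unit as width and line_length).
distances = [0.5, 1.0, 1.5, 3.0, 4.0, 7.0]
f0 = fourier_density(0.0, distances, width=10.0, n_terms=3)
density = population_density(distances, width=10.0, n_terms=3, line_length=1000.0)
```

Each cosine term integrates to zero over [0, w], so the estimate always integrates to one regardless of how many terms are kept; choosing the number of terms is the main tuning decision in this sketch.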
Lin, Q.; Wang, Y.; Song, C.
2016-12-01
The Newmark displacement model has been used to predict earthquake-triggered landslides. Logistic regression (LR) is also a common landslide hazard assessment method. We combined the Newmark displacement model and LR and applied them to Wenchuan County and Beichuan County in China, which were affected by the Ms 8.0 Wenchuan earthquake on May 12th, 2008, to develop a mechanism-based landslide occurrence probability model and improve the predictive accuracy. A total of 1904 landslide sites in Wenchuan County and 3800 random non-landslide sites were selected as the training dataset. We applied the Newmark model and obtained the distribution of permanent displacement (Dn) for a 30 × 30 m grid. Four factors (Dn, topographic relief, and distances to drainages and roads) were used as independent variables for LR. Then, a combined model was obtained, with an AUC (area under the curve) value of 0.797 for Wenchuan County. A total of 617 landslide sites and non-landslide sites in Beichuan County were used as a validation dataset with AUC = 0.753. The proposed method may also be applied to earthquake-induced landslides in other regions.
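The AUC values quoted above summarize how well predicted probabilities rank landslide sites above non-landslide sites. A minimal sketch of that computation follows, using the pairwise (Mann-Whitney) definition; the function name `auc` and the scores are illustrative and are not taken from the study.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive site outscores a random negative site.

    Ties count as half a win, following the usual Mann-Whitney convention.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical model outputs for landslide and non-landslide grid cells.
landslide_scores = [0.9, 0.8, 0.3]
stable_scores = [0.7, 0.2]
value = auc(landslide_scores, stable_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.797 and 0.753 in context.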
A nonparametric and diversified portfolio model
Shirazi, Yasaman Izadparast; Sabiruzzaman, Md.; Hamzah, Nor Aishah
2014-07-01
Traditional portfolio models, like mean-variance (MV), suffer from estimation error and lack of diversity. Alternatives, like mean-entropy (ME) or mean-variance-entropy (MVE) portfolio models, focus independently on the issue of either a proper risk measure or the diversity. In this paper, we propose an asset allocation model that compromises between the risk of historical data and future uncertainty. In the new model, entropy is presented as a nonparametric risk measure as well as an index of diversity. Our empirical evaluation with a variety of performance measures shows that this model has better out-of-sample performance and lower portfolio turnover than its competitors.
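As an illustration of entropy used as an index of diversity, the sketch below computes the Shannon entropy of a vector of portfolio weights. The abstract does not specify which entropy the model uses, so the Shannon form, the function name `weight_entropy`, and the example weights are assumptions.

```python
import math

def weight_entropy(weights):
    """Shannon entropy of long-only portfolio weights that sum to one."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Zero weights contribute nothing, by the convention 0 * log(0) = 0.
    return -sum(w * math.log(w) for w in weights if w > 0)

equal_weighted = weight_entropy([0.25, 0.25, 0.25, 0.25])  # maximal for four assets
concentrated = weight_entropy([1.0, 0.0, 0.0, 0.0])        # minimal
```

Entropy is largest for the equally weighted portfolio and falls to zero when all capital sits in one asset, which is why it can serve as a diversity index.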
Parametric versus non-parametric simulation
Dupeux, Bérénice; Buysse, Jeroen
2014-01-01
Most ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non-parametric efficiency analysis for determining the farm’s technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real world situation where often non-linear relationships exist. Results s...
Preliminary results on nonparametric facial occlusion detection
Directory of Open Access Journals (Sweden)
Daniel LÓPEZ SÁNCHEZ
2016-10-01
The problem of face recognition has been extensively studied in the available literature; however, some aspects of this field require further research. The design and implementation of face recognition systems that can efficiently handle unconstrained conditions (e.g., pose variations, illumination, partial occlusion) is still an area under active research. This work focuses on the design of a new nonparametric occlusion detection technique. In addition, we present some preliminary results that indicate that the proposed technique might be useful to face recognition systems, allowing them to dynamically discard occluded face parts.
Directory of Open Access Journals (Sweden)
Weiß, Verena
2015-10-01
Introduction: For survival data, the coefficient of determination cannot be used to describe how well a model fits the data. Therefore, several measures of explained variation for survival data have been proposed in recent years. Methods: We analyse an existing measure of explained variation with regard to minimisation aspects and demonstrate that these are not fulfilled for the measure. Results: In analogy to the least squares method from linear regression analysis, we develop a novel measure for categorical covariates which is based only on the Kaplan-Meier estimator. Hence, the novel measure is a completely nonparametric measure with an easy graphical interpretation. For the novel measure, different weighting possibilities are available and a statistical test of significance can be performed. Finally, we apply the novel measure and further measures of explained variation to a dataset comprising persons with a histopathological papillary thyroid carcinoma. Conclusion: We propose a novel measure of explained variation with a comprehensible derivation as well as a graphical interpretation, which may be used in further analyses with survival data.
Biological parametric mapping with robust and non-parametric statistics.
Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A
2011-07-15
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes
Zhou, Wei-Xing; Sornette, Didier
We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.
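The correspondence between the log-frequency and the scaling ratio quoted above can be checked numerically. The sketch below assumes the usual relation f = 1/ln(λ) from the log-periodicity literature, which the abstract does not state explicitly, and propagates the error to first order; the function name is illustrative.

```python
import math

def scaling_ratio(f, f_err=0.0):
    """Convert a log-frequency f into lambda = exp(1/f) with a first-order error."""
    lam = math.exp(1.0 / f)
    lam_err = lam / f**2 * f_err  # |d(lambda)/df| * f_err
    return lam, lam_err

lam, lam_err = scaling_ratio(1.02, 0.05)
print(f"lambda = {lam:.2f} +/- {lam_err:.2f}")
```

Under this assumption the central value agrees with the reported λ, and the propagated uncertainty is of the same size as the one quoted.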
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
Clement, Dominic; Gruber, Nicolas
2017-04-01
Major progress has been made by the international community (e.g., GO-SHIP, IOCCP, IMBER/SOLAS carbon working groups) in recent years by collecting and providing homogenized datasets for carbon and other biogeochemical variables in the surface ocean (SOCAT) and interior ocean (GLODAPv2). Together with previous efforts, this has enabled the community to develop methods to assess changes in the ocean carbon cycle through time. Of particular interest is the determination of the decadal change in the anthropogenic CO2 inventory solely based on in-situ measurements from at least two time periods in the interior ocean. However, all such methods face the difficulty of a scarce dataset in both space and time, making the use of appropriate interpolation techniques in time and space a crucial element of any method. Here we present a new method based on the parameter C*, whose variations reflect the total change in dissolved inorganic carbon (DIC) driven by the exchange of CO2 across the air-sea interface. We apply the extended Multiple Linear Regression method (Friis et al., 2005) on C* in order (1) to calculate the change in anthropogenic CO2 from the original DIC/C* measurements, and (2) to interpolate the result onto a spatial grid using other biogeochemical variables (T, S, AOU, etc.). These calculations are made on isopycnal slabs across whole ocean basins. In combination with the transient steady state assumption (Tanhua et al., 2007) providing a temporal correction factor, we address the spatial and temporal interpolation challenges. Using synthetic data from a hindcast simulation with a global ocean biogeochemistry model (NCAR-CCSM with BEC), we tested the method for robustness and accuracy in determining ΔCant. We will present data-based results for all ocean basins, with the most recent estimate of a global uptake of 32±6 Pg C between 1994 and 2007, indicating an uptake rate of 2.5±0.5 Pg C yr⁻¹ for this time period. These results are compared with regional and
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Nick, Todd G; Campbell, Kathleen M
2007-01-01
The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the popular means of linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. The quartic unilateral kernel function may provide more robust prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to complex feature extraction for financial market activities is proposed.
Non-parametric seismic hazard analysis in the presence of incomplete data
Yazdani, Azad; Mirzaei, Sajjad; Dadkhah, Koroush
2017-01-01
The distribution of earthquake magnitudes plays a crucial role in the estimation of seismic hazard parameters. Due to the complexity of the earthquake magnitude distribution, non-parametric approaches are recommended over classical parametric methods. The main deficiency of the non-parametric approach is the lack of complete magnitude data in almost all cases. This study aims to introduce an imputation procedure for completing earthquake catalog data that will allow the catalog to be used for non-parametric density estimation. Using a Monte Carlo simulation, the efficiency of the introduced approach is investigated. This study indicates that when a magnitude catalog is incomplete, the imputation procedure can provide an appropriate tool for seismic hazard assessment. As an illustration, the imputation procedure was applied to estimate the earthquake magnitude distribution in Tehran, the capital city of Iran.
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes
García, Constantino A.; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G.
2017-08-01
The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view, so that inference takes place directly in this space. To cope with the computational complexity that the use of Gaussian processes entails, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economics and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
Indoor Positioning Using Nonparametric Belief Propagation Based on Spanning Trees
Directory of Open Access Journals (Sweden)
Savic Vladimir
2010-01-01
Nonparametric belief propagation (NBP) is one of the best-known methods for cooperative localization in sensor networks. It is capable of providing location estimates with appropriate uncertainty and of accommodating non-Gaussian distance measurement errors. However, the accuracy of NBP is questionable in loopy networks. Therefore, in this paper, we propose a novel approach, NBP based on spanning trees (NBP-ST) created by the breadth-first search (BFS) method. In addition, we propose a reliable indoor model based on measurements obtained in our lab. According to our simulation results, NBP-ST performs better than NBP in terms of accuracy and communication cost in networks with high connectivity (i.e., highly loopy networks). Furthermore, the computational and communication costs are nearly constant with respect to the transmission radius. However, the drawbacks of the proposed method are a slightly higher computational cost and poor performance in low-connected networks.
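The spanning-tree step can be illustrated with a plain breadth-first search. This is a minimal sketch of how a loop-free subgraph might be extracted from a loopy network; the network layout and function name are hypothetical, and the belief-propagation stage itself is not shown.

```python
from collections import deque

def bfs_spanning_tree(adjacency, root):
    """Return the tree edges (parent, child) discovered by BFS from root."""
    visited = {root}
    queue = deque([root])
    edges = []
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                edges.append((node, neighbour))
                queue.append(neighbour)
    return edges

# Hypothetical loopy sensor network: a 4-cycle a-b-c-d with one chord a-c.
network = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c"],
}
tree = bfs_spanning_tree(network, "a")
```

A connected network with n nodes always yields n - 1 tree edges, so messages passed along the tree cannot circulate in loops.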
Evaluation of Nonparametric Probabilistic Forecasts of Wind Power
DEFF Research Database (Denmark)
Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg
Predictions of wind power production for horizons up to 48-72 hours ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind...
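Two scores commonly used to evaluate quantile forecasts are sketched below: empirical coverage (a reliability check) and the pinball loss. These are assumptions chosen for illustration, not necessarily the criteria of the framework in this abstract, and the normalised power values are made up.

```python
def pinball_loss(observations, quantile_forecasts, tau):
    """Average pinball (quantile) loss of forecasts for nominal level tau."""
    total = 0.0
    for y, q in zip(observations, quantile_forecasts):
        diff = y - q
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observations)

def empirical_coverage(observations, quantile_forecasts):
    """Fraction of observations at or below the forecast quantile."""
    hits = sum(1 for y, q in zip(observations, quantile_forecasts) if y <= q)
    return hits / len(observations)

# Hypothetical normalised wind power observations and 90% quantile forecasts.
obs = [0.20, 0.35, 0.10, 0.60, 0.45]
q90 = [0.40, 0.50, 0.30, 0.55, 0.70]
coverage = empirical_coverage(obs, q90)
loss = pinball_loss(obs, q90, tau=0.9)
```

A reliable 90% quantile forecast should have coverage close to 0.9 over a long evaluation period; the pinball loss additionally rewards quantiles that are not needlessly far from the observations.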
Ismail, B; Anil, Manjula
2014-01-01
With modernization, rapid urbanization and industrialization, the price that society is paying is a tremendous load of "Non-Communicable" diseases, referred to as "Lifestyle Diseases". Coronary artery disease (CAD), one of the lifestyle diseases that manifests at a younger age, can have devastating consequences for an individual, the family and society. These diseases can be prevented by studying the risk factors and analyzing and interpreting them using various statistical methods. The aim was to determine, using logistic regression, the relative contribution of independent variables according to the intensity of their influence (proven by statistical significance) upon the occurrence of values of the dependent cardiovascular risk scores. Additionally, we wanted to assess whether nonparametric smoothing of the cardiovascular risk scores can be used as a better statistical method compared with the existing methods. The study includes 498 students in the age group of 18-29 years. Prevalence of overweight (BMI 23-25 kg/m²) and obesity (BMI > 25 kg/m²) was found among individuals of 22 years and above. Non-smokers had decreased odds (OR = 0.041, CI = 0.015-0.107), while increases in LDL cholesterol (OR = 1.05, CI = 1.021-1.055) and BMI (OR = 1.42, CI = 1.244-1.631) contributed significantly to the risk of CVD. Local students had decreased odds of developing CVD in the next 10 years (OR = 0.27, CI = 0.092-0.799) compared with students residing in hostels or as paying guests. Copyright © 2014 Cardiological Society of India. Published by Elsevier B.V. All rights reserved.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a nonparametric technique, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
Scaled Sparse Linear Regression
Sun, Tingni
2011-01-01
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual squares and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs nearly nothing beyond the computation of a path of the sparse regression estimator for penalty levels above a threshold. For the scaled Lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the method yields simultaneously an estimator for the noise level and an estimated coefficient vector in the Lasso path satisfying certain oracle inequalities for the estimation of the noise level, prediction, and the estimation of regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic...
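The alternating scheme described above can be sketched in a few lines: fit a Lasso at the current penalty, re-estimate the noise level from the mean squared residual, rescale the penalty, and repeat. The coordinate-descent solver, the base penalty `lam0 = sqrt(2 log(p) / n)`, the iteration counts and the synthetic data are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_cd(X, y, lam, beta, sweeps=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1, warm-started at beta."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta, resid

def scaled_lasso(X, y, lam0, iterations=10):
    """Alternate between a Lasso fit at penalty sigma*lam0 and a noise-level update."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    sigma = math.sqrt(sum(v * v for v in y) / n)  # start from the null model
    for _ in range(iterations):
        beta, resid = lasso_cd(X, y, sigma * lam0, beta)
        sigma = math.sqrt(sum(r * r for r in resid) / n)  # root mean squared residual
    return beta, sigma

# Synthetic example (hypothetical): two active coefficients, noise level 0.5.
rng = random.Random(0)
n, true_beta, true_sigma = 100, [2.0, 0.0, 0.0, -1.5, 0.0], 0.5
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, true_sigma) for row in X]
beta_hat, sigma_hat = scaled_lasso(X, y, lam0=math.sqrt(2 * math.log(len(true_beta)) / n))
```

Because each Lasso fit is warm-started from the previous coefficients, the extra iterations add little cost beyond a single Lasso solve, which is in the spirit of the abstract's remark about the algorithm's cost.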
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to the pseudo-second order models of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. The non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data to Ho's pseudo-second order expression by the linear and non-linear regression methods showed that Ho's pseudo-second order model was a better kinetic expression than the other pseudo-second order kinetic expressions.
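For orientation, the linear route can be sketched with the widely used linearised form of the pseudo-second order model, t/q_t = 1/(k q_e^2) + t/q_e, where q_e is the equilibrium uptake and k the rate constant. The parameter values, time grid and function names below are hypothetical, and the abstract does not say which linearisation the authors used for each model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def pso_uptake(t, k, qe):
    """Pseudo-second order uptake q_t = k*qe^2*t / (1 + k*qe*t)."""
    return k * qe**2 * t / (1.0 + k * qe * t)

def fit_pso_linear(times, uptakes):
    """Regress t/q_t on t; slope = 1/qe and intercept = 1/(k*qe^2)."""
    intercept, slope = fit_line(times, [t / q for t, q in zip(times, uptakes)])
    qe = 1.0 / slope
    k = slope**2 / intercept
    return k, qe

# Noise-free synthetic data from hypothetical parameters k = 0.02, qe = 50.
times = [5, 10, 20, 40, 80, 160]
uptakes = [pso_uptake(t, 0.02, 50.0) for t in times]
k_hat, qe_hat = fit_pso_linear(times, uptakes)
```

With noisy data the transformation t/q_t distorts the error structure, which is one reason a direct non-linear fit of q_t can give better parameter estimates, consistent with the abstract's conclusion.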
Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to Euclidean geometry but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
Cao, M H; Adeola, O
2016-02-01
The energy values of poultry byproduct meal (PBM) and animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicates of cages with 6 birds per cage. There were 3 diets in each experiment consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2) in such a way that the same ratio was maintained for energy ingredients across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. Dry matter of PBM and A-V blend was 984 and 999 g/kg; the gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P Poultry Science Association Inc.
Energy Technology Data Exchange (ETDEWEB)
Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan [Dept. of Biosystems Machinery Engineering, Chungnam National University, Daejeon (Korea, Republic of); Kim, Moon Sung [United States Department of Agriculture Agricultural Research Service, Washington (United States); Lee, Soo Hee [Life and Technology Co.,Ltd., Hwasung (Korea, Republic of)
2014-08-15
This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performance was evaluated using cross validation. The R²c and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection using the VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on a reduced number of wavelengths selected by the VIP score technique, gave good predictions for the adulterated garlic and ginger powder samples.
Institute of Scientific and Technical Information of China (English)
吕世瑜; 刘北上; 邱菀华
2011-01-01
Based on the polynomial option pricing model proposed by Copeland et al., a multi-stage nonparametric real option model for venture capital evaluation is established by introducing the minimum relative entropy principle, addressing the problem of multi-stage venture capital valuation decisions. Empirical analysis shows that the model allows decisions on risk projects to rest on collected information rather than on parameter hypotheses, which are standard in most previously proposed pricing models. This greatly reduces the influence of subjective factors and improves the practicality of the model.
van Teunenbroek, A; Stijnen, T; Otten, B; de Muinck Keizer-Schrama, S; Naeraa, R W; Rongen-Westerlaken, C; Drop, S
1996-04-01
A total of 235 measurement points of 57 Dutch women with Turner's syndrome (TS), including women with spontaneous menarche and oestrogen treatment, served to develop a new Turner-specific final height (FH) prediction method (PTS). Analogous to the Tanner and Whitehouse mark 2 method (TW) for normal children, smoothed regression coefficients are tabulated for PTS for height (H), chronological age (CA) and bone age (BA), both TW RUS and Greulich and Pyle (GP). Comparison between all methods on 40 measurement points of 21 Danish TS women showed small mean prediction errors (predicted minus observed FH) and corresponding standard deviation (ESD) of both PTSRUS and PTSGP, in particular at the "younger" ages. Comparison between existing methods on the Dutch data indicated a tendency to overpredict FH. Before the CA of 9 years the mean prediction errors of the Bayley and Pinneau and TW methods were markedly higher compared with the other methods. Overall, the simplest methods--projected height (PAH) and its modification (mPAH)--were remarkably good at most ages. Although the validity of PTSRUS and PTSGP remains to be tested below the age of 6 years, both gave small mean prediction errors and a high accuracy. FH prediction in TS is important in the consideration of growth-promoting therapy or in the evaluation of its effects.
Mandal, Sohom; Srivastav, Roshan K.; Simonovic, Slobodan P.
2016-07-01
Impacts of global climate change on water resources systems are assessed by downscaling coarse scale climate variables into regional scale hydro-climate variables. In this study, a new multisite statistical downscaling method based on beta regression (BR) is developed for generating synthetic precipitation series, which can preserve temporal and spatial dependence along with other historical statistics. The beta regression based downscaling method includes two main steps: (1) prediction of precipitation states for the study area using classification and regression trees, and (2) generation of precipitation at different stations in the study area conditioned on the precipitation states. Daily precipitation data for 53 years from the ANUSPLIN data set are used to predict precipitation states of the study area, where predictor variables are extracted from the NCEP/NCAR reanalysis data set for the same interval. The proposed model is applied to downscaling daily precipitation at ten different stations in the Campbell River basin, British Columbia, Canada. Results show that the proposed downscaling model can capture spatial and temporal variability of local precipitation very well at various locations. The performance of the model is compared with a recently developed non-parametric kernel regression based downscaling model. The BR model performs better regarding extrapolation compared to the non-parametric kernel regression model. Future precipitation changes under different GHG (greenhouse gas) emission scenarios are also projected with the developed downscaling model, revealing significant changes in future seasonal precipitation and number of wet days in the river basin.
Analyzing multiple spike trains with nonparametric Granger causality.
Nedungadi, Aatira G; Rangarajan, Govindan; Jain, Neeraj; Ding, Mingzhou
2009-08-01
Simultaneous recordings of spike trains from multiple single neurons are becoming commonplace. Understanding the interaction patterns among these spike trains remains a key research area. A question of interest is the evaluation of information flow between neurons through the analysis of whether one spike train exerts causal influence on another. For continuous-valued time series data, Granger causality has proven an effective method for this purpose. However, the basis for Granger causality estimation is autoregressive data modeling, which is not directly applicable to spike trains. Various filtering options distort the properties of spike trains as point processes. Here we propose a new nonparametric approach to estimate Granger causality directly from the Fourier transforms of spike train data. We validate the method on synthetic spike trains generated by model networks of neurons with known connectivity patterns and then apply it to neurons simultaneously recorded from the thalamus and the primary somatosensory cortex of a squirrel monkey undergoing tactile stimulation.
Institute of Scientific and Technical Information of China (English)
Guijun YANG; Lu LIN; Runchu ZHANG
2007-01-01
Quasi-regression, motivated by problems arising in computer experiments, focuses mainly on speeding up evaluation. However, its theoretical properties have not been explored systematically. This paper shows that quasi-regression is unbiased, strongly convergent and asymptotically normal for parameter estimation, but that it is biased for curve fitting. Furthermore, a new method called unbiased quasi-regression is proposed. In addition to retaining the above asymptotic behavior of the parameter estimates, unbiased quasi-regression is unbiased for curve fitting.
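As we understand it, quasi-regression estimates the coefficients of an orthonormal basis expansion by plain Monte Carlo averages, beta_j ≈ (1/n) Σ f(x_i) φ_j(x_i), instead of solving least-squares normal equations. The sketch below illustrates that idea in one dimension with shifted Legendre polynomials on [0, 1]; the basis, test function and sample size are assumptions, and the paper's own formulation may differ.

```python
import math
import random

# Orthonormal shifted Legendre polynomials on [0, 1] (degrees 0 to 2).
BASIS = [
    lambda x: 1.0,
    lambda x: math.sqrt(3.0) * (2.0 * x - 1.0),
    lambda x: math.sqrt(5.0) * (6.0 * x * x - 6.0 * x + 1.0),
]

def quasi_regression(f, n, rng):
    """Estimate beta_j = integral of f*phi_j over [0, 1] by Monte Carlo averaging."""
    sums = [0.0] * len(BASIS)
    for _ in range(n):
        x = rng.random()
        fx = f(x)
        for j, phi in enumerate(BASIS):
            sums[j] += fx * phi(x)
    return [s / n for s in sums]

def approximate(coefs, x):
    """Evaluate the fitted expansion sum_j beta_j * phi_j(x)."""
    return sum(b * phi(x) for b, phi in zip(coefs, BASIS))

# Hypothetical test function f(x) = x^2, which lies exactly in the span of the basis.
coefs = quasi_regression(lambda x: x * x, n=20000, rng=random.Random(0))
```

Each coefficient estimate is a sample mean and hence unbiased; how that interacts with the bias of the fitted curve is the question the abstract addresses.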
Nonparametric estimation of the stationary M/G/1 workload distribution function
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted
2005-01-01
In this paper it is demonstrated how a nonparametric estimator of the stationary workload distribution function of the M/G/1-queue can be obtained by systematically sampling the workload process. Weak convergence results and bootstrap methods for empirical distribution functions for stationary associ...
Non-parametric system identification from non-linear stochastic response
DEFF Research Database (Denmark)
Rüdinger, Finn; Krenk, Steen
2001-01-01
An estimation method is proposed for identification of non-linear stiffness and damping of single-degree-of-freedom systems under stationary white noise excitation. Non-parametric estimates of the stiffness and damping along with an estimate of the white noise intensity are obtained by suitable p...
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models.
Fan, Jianqing; Ma, Yunbei; Dai, Wei
2014-01-01
The varying-coefficient model is an important class of nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in sparse ultra-high dimensional varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance the practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
Maroco, João; Silva, Dina; Rodrigues, Ana; Guerreiro, Manuela; Santana, Isabel; de Mendonça, Alexandre
2011-08-17
Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Press' Q test showed that all classifiers performed better than chance alone (p Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a
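The performance measures compared above follow directly from a confusion matrix. The sketch below, with made-up labels, shows how a classifier can combine high accuracy and perfect specificity with low sensitivity, the pattern reported for one of the methods; the function name and data are illustrative only.

```python
def classification_summary(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }

# Hypothetical fold: 3 converters to dementia (1) among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
summary = classification_summary(y_true, y_pred)
```

When the positive class is rare, accuracy is dominated by specificity, so sensitivity has to be inspected separately.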
Martina, R; Kay, R; van Maanen, R; Ridder, A
2015-01-01
Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non-parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well.
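The link between overdispersion, random effects and the negative binomial distribution described above can be written out briefly. This is a sketch of the standard gamma-Poisson mixture, not the specific model fitted in the trials; the symbols $\mu_i$, $u_i$, $\alpha$, $T_i$ and $\beta_1$ are notation assumed here for illustration.

```latex
% Count Y_i for patient i, with a multiplicative gamma random effect u_i
Y_i \mid u_i \sim \mathrm{Poisson}(\mu_i u_i), \qquad
u_i \sim \mathrm{Gamma}\!\left(\text{shape } 1/\alpha,\ \text{scale } \alpha\right),
\quad \mathbb{E}[u_i] = 1,\ \mathrm{Var}(u_i) = \alpha .

% Integrating out u_i gives a negative binomial marginal with
\mathbb{E}[Y_i] = \mu_i, \qquad
\mathrm{Var}(Y_i) = \mu_i + \alpha \mu_i^{2} \;\ge\; \mu_i .

% With a log link and treatment indicator T_i, the treatment effect is a rate ratio
\log \mu_i = \beta_0 + \beta_1 T_i
\quad\Longrightarrow\quad
\frac{\mu_i(T_i = 1)}{\mu_i(T_i = 0)} = e^{\beta_1} .
```

Setting $\alpha = 0$ recovers the plain Poisson model, in which the variance equals the mean.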
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm⁻¹ were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from
Nonparametric k-nearest-neighbor entropy estimator.
Lombardi, Damiano; Pant, Sanjay
2016-01-01
A nonparametric k-nearest-neighbor-based entropy estimator is proposed. It improves on the classical Kozachenko-Leonenko estimator by considering nonuniform probability densities in the region of k-nearest neighbors around each sample point. It aims to improve the classical estimators in three situations: first, when the dimensionality of the random variable is large; second, when near-functional relationships leading to high correlation between components of the random variable are present; and third, when the marginal variances of random variable components vary significantly with respect to each other. Heuristics on the error of the proposed and classical estimators are presented. Finally, the proposed estimator is tested for a variety of distributions in successively increasing dimensions and in the presence of a near-functional relationship. Its performance is compared with a classical estimator, and a significant improvement is demonstrated.
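For orientation, the classical Kozachenko-Leonenko estimator that the abstract above takes as its baseline can be sketched in one dimension as follows. This is only the classical estimator, not the proposed improvement, and the function names `digamma_int` and `kl_entropy_1d` are hypothetical names chosen for this illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=3):
    """Classical Kozachenko-Leonenko entropy estimate (in nats) for 1-D data.

    H ~= psi(N) - psi(k) + log(c_1) + (1/N) * sum_i log(rho_i),
    where rho_i is the distance from sample i to its k-th nearest neighbour
    and c_1 = 2 is the length of the 1-D unit ball [-1, 1].
    Assumes len(samples) > k and no k+1 coincident points.
    """
    xs = sorted(samples)
    n = len(xs)
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbours lie within k positions of i.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_dist_sum / n


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(2000)]
    # The exact entropy of a standard normal is 0.5 * log(2 * pi * e).
    print(kl_entropy_1d(data), 0.5 * math.log(2 * math.pi * math.e))
```

The estimator treats the density as constant inside each k-nearest-neighbour ball, which is the assumption the proposed method relaxes.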
On Parametric (and Non-Parametric) Variation
Directory of Open Access Journals (Sweden)
Neil Smith
2009-11-01
This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.
Institute of Scientific and Technical Information of China (English)
方杰; 张敏强
2012-01-01
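Because the sampling distribution of the mediating effect ab is often not normal, researchers have in recent years proposed three classes of methods that place no restrictions on the sampling distribution of ab and are suitable for medium and small samples: the distribution of the product method, the nonparametric bootstrap, and Markov chain Monte Carlo (MCMC) methods. A simulation was used to compare how the three classes of methods perform in mediation analysis. The results show that: 1) the MCMC method with prior information gives the most accurate point estimate of ab; 2) the MCMC method with prior information has the highest statistical power, but at the cost of underestimating the Type I error rate, while the bias-corrected nonparametric percentile bootstrap has the next-highest power, but at the cost of overestimating the Type I error rate; 3) the MCMC method with prior information gives the most accurate interval estimate of the mediating effect. These results suggest using the MCMC method with prior information when prior information is available, and the bias-corrected nonparametric percentile bootstrap when it is not.
Because few sampling distributions of mediating effects are normally distributed, classic approaches to assessing mediation (Baron & Kenny, 1986; Sobel, 1982) have in recent years been supplemented by computationally intensive methods such as the nonparametric bootstrap, the distribution of the product method, and Markov chain Monte Carlo (MCMC) methods. These approaches are suitable for medium or small sample sizes and do not impose the assumption of normality of the sampling distribution of mediating effects. However, little is known about how these methods perform relative to each other. This study extends the work of Mackinnon and colleagues (Mackinnon, Lockwood & Williams, 2004; Yuan & Mackinnon, 2009) by conducting a simulation using R software. This simulation examines several approaches for assessing mediation. Three factors were considered in the simulation design: (a) sample size (N=25, 50, 100, 200, 1000); (b) parameter combinations (a=b=0, a=0.39 b=0, a=0 b=0.59, a=b=0.14, a=b=0.39, a=b=0.59); (c) method for assessing mediation (distribution of the product method, nonparametric percentile Bootstrap method, bias-corrected nonparametric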
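To make the nonparametric percentile bootstrap for the mediating effect ab concrete, here is a minimal sketch in Python (the study itself used R). It implements the plain percentile interval, not the bias-corrected variant, and the data-generating model and all function names (`simulate`, `indirect_effect`, `percentile_bootstrap_ci`) are assumptions made for this illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def partial_slope(m, x, y):
    """Coefficient of m in the regression of y on m and x (with intercept)."""
    n = len(y)
    mm, mx, my = sum(m) / n, sum(x) / n, sum(y) / n
    smm = sum((v - mm) ** 2 for v in m)
    sxx = sum((v - mx) ** 2 for v in x)
    smx = sum((u - mm) * (v - mx) for u, v in zip(m, x))
    smy = sum((u - mm) * (v - my) for u, v in zip(m, y))
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return (smy * sxx - smx * sxy) / (smm * sxx - smx ** 2)


def indirect_effect(x, m, y):
    """Point estimate of ab: a from M ~ X, b from Y ~ M + X."""
    return ols_slope(x, m) * partial_slope(m, x, y)


def percentile_bootstrap_ci(x, m, y, reps=1000, alpha=0.05, rng=None):
    """Percentile bootstrap interval for ab, resampling cases with replacement."""
    rng = rng or random.Random(0)
    n = len(x)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


def simulate(n, a, b, rng):
    """Assumed toy model: M = a*X + e1, Y = b*M + e2, standard normal errors."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [a * xi + rng.gauss(0, 1) for xi in x]
    y = [b * mi + rng.gauss(0, 1) for mi in m]
    return x, m, y
```

A call such as `percentile_bootstrap_ci(*simulate(200, 0.59, 0.59, random.Random(1)))` returns the lower and upper interval bounds; mediation is judged significant when the interval excludes zero.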
Nonparametric Bayesian inference for multidimensional compound Poisson processes
S. Gugushvili; F. van der Meulen; P. Spreij
2015-01-01
Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density $r_0$ and intensity $\lambda_0$. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context, whic
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P
2016-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal s...
Nonparametric reconstruction of the Om diagnostic to test LCDM
Escamilla-Rivera, Celia
2015-01-01
Cosmic acceleration is usually attributed to an unknown dark energy whose equation of state, w(z), is constrained and numerically confronted with independent astrophysical data. To diagnose w(z), a null test of dark energy can be introduced using a diagnostic function of redshift, Om. In this work we present a nonparametric reconstruction of this diagnostic using the so-called Loess-Simex factory to test the concordance model. The advantage of this approach is that it offers an alternative way to relax the use of priors and find a possible 'w' that reliably describes the data with no previous knowledge of a cosmological model. Our results demonstrate that the method applied to the dynamical Om diagnostic finds a preference for a dark energy model with equation of state w = -2/3, which corresponds to a static domain wall network.
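Why Om works as a null test of the concordance model can be illustrated with a short sketch. It uses the standard definition Om(z) = (E(z)^2 - 1) / ((1 + z)^3 - 1) with E = H/H0, and assumes a spatially flat universe with constant w and a matter density of 0.3; these choices and the function names are assumptions for illustration, not the paper's reconstruction method.

```python
def e_squared(z, omega_m=0.3, w=-1.0):
    """Squared dimensionless Hubble rate E(z)^2 for flat wCDM with constant w."""
    return omega_m * (1 + z) ** 3 + (1 - omega_m) * (1 + z) ** (3 * (1 + w))


def om_diagnostic(z, omega_m=0.3, w=-1.0):
    """Om(z) = (E(z)^2 - 1) / ((1 + z)^3 - 1), defined for z > 0."""
    return (e_squared(z, omega_m, w) - 1.0) / ((1 + z) ** 3 - 1.0)


if __name__ == "__main__":
    for z in (0.1, 0.5, 1.0, 2.0):
        # For w = -1 (LCDM) Om is constant and equal to omega_m;
        # for any other w it varies with redshift.
        print(z, om_diagnostic(z, w=-1.0), om_diagnostic(z, w=-2.0 / 3.0))
```

A reconstructed Om(z) that is not flat in redshift therefore signals a departure from a cosmological constant, without needing to fit w(z) directly.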
Directory of Open Access Journals (Sweden)
Semra Boran
2007-09-01
The Taguchi Method and Regression Analysis have widespread applications in statistical research. The Taguchi Method is one of the most frequently used methods, especially in optimization problems, but its applications are not common in the food industry. In this study, optimal operating parameters were determined for an industrial-size fluidized bed dryer using the Taguchi Method. The effects of the operating parameters on the activity value (the quality characteristic of this problem) were then calculated by regression analysis. Finally, the results of the two methods were compared. To summarise, using the factors and levels obtained from the Taguchi Method, the average activity value was found to be 660 for a 400 kg loading, with an average drying time of 26 minutes, whereas under normal conditions (with a 600 kg loading) the average activity value was found to be 630 and the drying time 28 minutes. Application of the Taguchi Method caused a 15% rise in activity value.
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) on June 11–16, 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Pivotal Estimation of Nonparametric Functions via Square-root Lasso
Belloni, Alexandre; Wang, Lie
2011-01-01
In a nonparametric linear regression model we study a variant of LASSO, called square-root LASSO, which does not require knowledge of the scaling parameter $\sigma$ of the noise or bounds for it. This work derives new finite sample upper bounds for the prediction norm rate of convergence, the $\ell_1$-rate of convergence, the $\ell_\infty$-rate of convergence, and the sparsity of the square-root LASSO estimator. A lower bound for the prediction norm rate of convergence is also established. In many non-Gaussian noise cases, we rely on moderate deviation theory for self-normalized sums and on new data-dependent empirical process inequalities to achieve Gaussian-like results provided $\log p = o(n^{1/3})$, improving upon results derived in the parametric case that required $\log p = O(\log n)$. In addition, we derive finite sample bounds on the performance of ordinary least squares (OLS) applied to the model selected by square-root LASSO, accounting for possible misspecification of the selected model. In particular, we provide mild con...
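The reason square-root LASSO does not need $\sigma$ can be seen from its objective, written here alongside the ordinary LASSO for comparison. The notation ($y_i$, $x_i$, $\beta$, $\lambda$, $\widehat{Q}$) is assumed for this sketch and is not taken from the abstract.

```latex
% Average squared residual
\widehat{Q}(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^{2}

% Ordinary LASSO: a good penalty level \lambda scales with the noise level \sigma
\widehat{\beta}^{\mathrm{LASSO}} \in \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \widehat{Q}(\beta) + \frac{\lambda}{n} \lVert \beta \rVert_{1}

% Square-root LASSO: taking the square root of the fit term makes the
% score of the loss scale-free, so \lambda can be chosen without knowing \sigma
\widehat{\beta}^{\sqrt{\mathrm{LASSO}}} \in \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \sqrt{\widehat{Q}(\beta)} + \frac{\lambda}{n} \lVert \beta \rVert_{1}
```

Informally, the gradient of $\sqrt{\widehat{Q}}$ at the true coefficients is a ratio in which $\sigma$ cancels, which is what makes the penalty choice pivotal.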
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-04-01
The first phase of the work aimed to identify the spatial relationships between landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, using the Logistic Regression and Random Forests techniques, which gave the best results in terms of AUC. The models were run and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.
Institute of Scientific and Technical Information of China (English)
刘新乐
2016-01-01
Missing-data models and longitudinal-data models have long been among the active topics in statistics, but models for longitudinal data with missing values have received little study. This paper proposes a semiparametric regression model for longitudinal data with missing values. The complete-case (CC) method is used to delete every item that contains missing data, and statistical inference is carried out on the remaining "complete" sample by a two-stage estimation method, yielding the final two-stage estimates $\hat{\beta}_r$ and $\hat{g}_r(t)$ of the parametric and nonparametric components. These estimators are shown to be asymptotically normal, and a simulation study shows that the estimation method is feasible.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2017-07-26
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression, with the coefficient of determination ($R^2$) as the primary metric of assay agreement. However, the use of $R^2$ alone does not adequately quantify the constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory.
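The two kinds of error discussed above can be illustrated with a small sketch of a Bland-Altman summary and a Deming regression fit. This is a generic textbook version, not the authors' analysis pipeline; the function names and the default error-variance ratio `delta=1.0` are assumptions made for illustration.

```python
import math


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b.

    Bias is the mean difference a - b; the limits of agreement are
    bias +/- 1.96 sample standard deviations of the differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def deming(x, y, delta=1.0):
    """Deming regression of y on x, allowing measurement error in both.

    delta is the assumed ratio of error variances (y errors / x errors).
    A slope different from 1 suggests proportional error and an intercept
    different from 0 suggests constant error. Requires nonzero covariance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x) / (n - 1)
    syy = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)
    slope = (syy - delta * sxx
             + math.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept
```

Unlike $R^2$, which can be high even when one assay reads consistently above the other, the Bland-Altman bias and the Deming intercept and slope report the size of the disagreement directly.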
Change-point estimation for censored regression model
Institute of Scientific and Technical Information of China (English)
Zhan-feng WANG; Yao-hua WU; Lin-cheng ZHAO
2007-01-01
In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.