Nonparametric Bayes analysis of social science data
Kunihama, Tsuyoshi
Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys pose many questions to respondents, often yielding mixed-scale variables. Each variable can follow a complex distribution outside parametric families, and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model that approximates such structures well in social science data. In addition, many social surveys have collected data over time, so we need to incorporate dynamic dependence into the models. It is also standard to observe a massive number of missing values in social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains the background and motivation of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian model of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence, with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method for social surveys with complex designs, where there is a gap between sample and population; we correct for the bias by adjusting the mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables are induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.
Local Component Analysis for Nonparametric Bayes Classifier
Khademi, Mahmoud; safayani, Meharn
2010-01-01
The decision boundaries of the Bayes classifier are optimal because they lead to the maximum probability of correct decision. This means that if we knew the prior probabilities and the class-conditional densities, we could design a classifier that gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as the window width. Moreover, these methods suffer from the curse of dimensionality of the feature space and the small-sample-size problem, which severely restrict their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
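The Parzen-window Bayes rule that this paper takes as its starting point can be sketched in a few lines. The snippet below is a minimal numpy illustration, not the authors' local-component-analysis method; the Gaussian kernel, the bandwidth h = 0.5, and the toy two-class data are assumed choices.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen-window (Gaussian-kernel) density estimate at a point x."""
    d = samples.shape[1]
    diffs = (samples - x) / h
    # Product Gaussian kernel evaluated at each training sample, then averaged
    kernels = np.exp(-0.5 * np.sum(diffs**2, axis=1)) / ((2 * np.pi) ** (d / 2) * h**d)
    return kernels.mean()

def parzen_bayes_classify(x, class_samples, priors, h=0.5):
    """Assign x to the class maximizing prior * class-conditional density."""
    scores = [p * parzen_density(x, s, h) for s, p in zip(class_samples, priors)]
    return int(np.argmax(scores))

# Two well-separated toy 2-D classes
rng = np.random.default_rng(0)
c0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
c1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
label = parzen_bayes_classify(np.array([2.8, 3.1]), [c0, c1], priors=[0.5, 0.5])
```

The dependence of the decision regions on h is exactly the sensitivity the abstract criticizes: shrinking or growing the bandwidth moves the boundary even with the data fixed.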
Bayes and empirical Bayes: do they merge?
Petrone, Sonia; Scricciolo, Catia
2012-01-01
Bayesian inference is attractive for its coherence and good frequentist properties. However, it is a common experience that eliciting an honest prior may be difficult, and, in practice, people often take an empirical Bayes approach, plugging empirical estimates of the prior hyperparameters into the posterior distribution. Even if not rigorously justified, the underlying idea is that, when the sample size is large, empirical Bayes leads to "similar" inferential answers. Yet, precise mathematical results seem to be missing. In this work, we give a more rigorous justification in terms of merging of Bayes and empirical Bayes posterior distributions. We consider two notions of merging: Bayesian weak merging and frequentist merging in total variation. Since weak merging is related to consistency, we provide sufficient conditions for consistency of empirical Bayes posteriors. Also, we show that, under regularity conditions, the empirical Bayes procedure asymptotically selects the value of the hyperparameter for ...
Empirical Bayes analysis of single nucleotide polymorphisms
Ickstadt, Katja
2008-03-01
Abstract Background An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of thousands of SNPs, methods are required that can cope with the corresponding multiple testing problem. For the analysis of gene expression data, approaches such as the empirical Bayes analysis of microarrays have been developed, particularly for the detection of genes associated with the response. However, the empirical Bayes analysis of microarrays has only been suggested for binary responses with continuous predictors, i.e. expression values. Results In this paper, we propose a modification of this empirical Bayes analysis that can be used to analyze high-dimensional categorical SNP data. This approach, along with a generalized version of the original empirical Bayes method, is available in the R package siggenes version 1.10.0 and later, which can be downloaded from http://www.bioconductor.org. Conclusion As applications to two subsets of the HapMap data show, the empirical Bayes analysis of microarrays can not only be used to analyze continuous gene expression data but can also be applied to categorical SNP data, where the response is not restricted to be binary. In association studies in which typically several tens to a few hundred SNPs are considered, our approach can furthermore be employed to test interactions of SNPs. Moreover, the posterior probabilities resulting from the empirical Bayes analysis of (pre-specified) interactions/genotypes can also be used to quantify the importance of these interactions.
Nonparametric Bayes modeling for case control studies with many predictors.
Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B
2016-03-01
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
Sparse Empirical Bayes Analysis (SEBA)
Bochkina, Natalia
2009-01-01
We consider the joint processing of $n$ independent sparse regression problems. Each is based on a sample $(y_{i1},x_{i1}),\ldots,(y_{im},x_{im})$ of $m$ i.i.d. observations from $y_{i1}=x_{i1}^{\top}\beta_i+\varepsilon_{i1}$, $y_{i1}\in\mathbb{R}$, $x_{i1}\in\mathbb{R}^p$, $i=1,\ldots,n$, with $\varepsilon_{i1}\sim N(0,\sigma^2)$, say. Here $p$ is large enough that the empirical risk minimizer is not consistent. We consider three possible extensions of the lasso estimator to deal with this problem: the lassoes, the group lasso, and the RING lasso, each utilizing a different assumption about how these problems are related. For each estimator we give a Bayesian interpretation, and we present both a persistency analysis and non-asymptotic error bounds based on restricted eigenvalue-type assumptions.
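The single-problem lasso that these three extensions build on can be sketched with cyclic coordinate descent and soft-thresholding. This is a textbook illustration, not the paper's estimators; the objective scaling (0.5/m inside the squared loss) and the penalty value lam = 0.1 are assumed choices.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.
    Minimizes (0.5/m) * ||y - X b||^2 + lam * ||b||_1 via soft-thresholding."""
    m, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / m          # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding coordinate j
            rho = X[:, j] @ r / m
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

# Sparse truth: only the first two coefficients are nonzero
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
beta_true = np.zeros(10)
beta_true[0], beta_true[1] = 2.0, -3.0
y = X @ beta_true + 0.1 * rng.normal(size=100)
b_hat = lasso_cd(X, y, lam=0.1)
```

The soft-threshold zeroes out the inactive coordinates exactly, which is the sparsity property the group and RING variants generalize across the $n$ related problems.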
Noma, Hisashi; Matsui, Shigeyuki
2013-05-20
The main purpose of microarray studies is the screening of differentially expressed genes as candidates for further investigation. Because of limited resources at this stage, prioritizing genes is a relevant statistical task in microarray studies. For effective gene selection, parametric empirical Bayes methods for ranking and selection of genes with the largest effect sizes have been proposed (Noma et al., 2010; Biostatistics 11: 281-289). The hierarchical mixture model incorporates differential and non-differential components and allows information borrowing across differential genes with separation from nuisance, non-differential genes. In this article, we develop empirical Bayes ranking methods via a semiparametric hierarchical mixture model. A nonparametric prior distribution, rather than a parametric prior distribution, for effect sizes is specified and estimated using the "smoothing by roughening" approach of Laird and Louis (1991; Computational Statistics and Data Analysis 12: 27-37). We present applications to childhood and infant leukemia clinical studies with microarrays for exploring genes related to prognosis or disease progression.
Bayes and empirical Bayes iteration estimators in two seemingly unrelated regression equations
Wang, Lichun
2005-01-01
For a system of two seemingly unrelated regression equations given by y₁ = X₁β + ε₁, y₂ = X₂γ + ε₂ (where y₁ is an m × 1 vector and y₂ is an n × 1 vector, m ≠ n), employing the covariance-adjusted technique, we propose parametric Bayes and empirical Bayes iteration estimator sequences for the regression coefficients. We prove that both covariance matrices converge monotonically and that the Bayes iteration estimator sequence is consistent as well. Based on the mean square error (MSE) criterion, we elaborate the superiority of the empirical Bayes iteration estimator over the single-equation Bayes estimator when the covariance matrix of the errors is unknown. The results obtained in this paper further show the power of the covariance-adjusted approach.
Empirical Bayes Estimation in the Rasch Model: A Simulation.
de Gruijter, Dato N. M.
In a situation where the population distribution of latent trait scores can be estimated, the ordinary maximum likelihood estimator of latent trait scores may be improved upon by taking the estimated population distribution into account. In this paper empirical Bayes estimators are compared with the likelihood estimator for three samples of 300…
Empirical Bayes Model Comparisons for Differential Methylation Analysis
Mingxiang Teng
2012-01-01
A number of empirical Bayes models (each with different statistical distribution assumptions) have now been developed to analyze differential DNA methylation using high-density oligonucleotide tiling arrays. However, it remains unclear which model performs best. For example, when differentially methylated regions are analyzed for conserved and functional sequence characteristics (e.g., enrichment of transcription factor-binding sites (TFBSs)), the sensitivity of such analyses under the various empirical Bayes models remains unclear. In this paper, five empirical Bayes models were constructed, based on either a gamma distribution or a log-normal distribution, for the identification of differentially methylated loci and their cell-division- (1, 3, and 5) and drug-treatment- (cisplatin) dependent methylation patterns. While differential methylation patterns generated by log-normal models were enriched with numerous TFBSs, we observed almost no TFBS-enriched sequences using the gamma-assumption models. Statistical and biological results suggest that the log-normal, rather than the gamma, empirical Bayes model is a highly accurate and precise method for differential methylation microarray analysis. In addition, we presented one of the log-normal models for differential methylation analysis and tested its reproducibility by a simulation study. We believe this research to be the first extensive comparison of statistical modeling for the analysis of differential DNA methylation, an important biological phenomenon that precisely regulates gene transcription.
A Numerical Empirical Bayes Procedure for Finding an Interval Estimate.
Lord, Frederic M.
1985-09-01
A numerical procedure is outlined for obtaining an interval estimate of a parameter in an empirical Bayes estimation problem. The only case considered is that in which each observed value x has a binomial distribution, conditional on a parameter zeta. For each x, the parameter estimated is the expected value of zeta given x. The main purpose is…
Bayes and empirical Bayes estimators of abundance and density from spatial capture-recapture data.
Dorazio, Robert M
2013-01-01
In capture-recapture and mark-resight surveys, movements of individuals both within and between sampling periods can alter the susceptibility of individuals to detection over the region of sampling. In these circumstances spatially explicit capture-recapture (SECR) models, which incorporate the observed locations of individuals, allow population density and abundance to be estimated while accounting for differences in detectability of individuals. In this paper I propose two Bayesian SECR models, one for the analysis of recaptures observed in trapping arrays and another for the analysis of recaptures observed in area searches. In formulating these models I used distinct submodels to specify the distribution of individual home-range centers and the observable recaptures associated with these individuals. This separation of ecological and observational processes allowed me to derive a formal connection between Bayes and empirical Bayes estimators of population abundance that has not been established previously. I showed that this connection applies to every Poisson point-process model of SECR data and provides theoretical support for a previously proposed estimator of abundance based on recaptures in trapping arrays. To illustrate results of both classical and Bayesian methods of analysis, I compared Bayes and empirical Bayes estimates of abundance and density using recaptures from simulated and real populations of animals. Real populations included two iconic datasets: recaptures of tigers detected in camera-trap surveys and recaptures of lizards detected in area-search surveys. In the datasets I analyzed, classical and Bayesian methods provided similar, and often identical, inferences, which is not surprising given the sample sizes and the noninformative priors used in the analyses.
An empirical Bayes approach to network recovery using external knowledge.
Kpogbezan, Gino B; van der Vaart, Aad W; van Wieringen, Wessel N; Leday, Gwenaël G R; van de Wiel, Mark A
2017-09-01
Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks such knowledge may come for instance from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian simultaneous equation model, we develop an appealing empirical Bayes (EB) procedure that automatically assesses the agreement of the used prior knowledge with the data at hand. We use a variational Bayes method to approximate posterior densities and compare its accuracy with that of a Gibbs sampling strategy. Our method is computationally fast and can outperform known competitors. In a simulation study, we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bayesian model reduction and empirical Bayes for group (DCM) studies.
Friston, Karl J; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E; van Wijk, Bernadette C M; Ziegler, Gabriel; Zeidman, Peter
2016-03-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level - e.g., dynamic causal models - and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction.
Lin, Miao-Hsiang; Hsiung, Chao A.
1994-01-01
Two simple empirical approximate Bayes estimators are introduced for estimating domain scores under binomial and hypergeometric distributions, respectively. Criteria are established regarding the use of these estimators over their maximum likelihood counterparts. (SLD)
THE SUPERIORITY OF EMPIRICAL BAYES ESTIMATION OF PARAMETERS IN PARTITIONED NORMAL LINEAR MODEL
Zhang Weiping; Wei Laisheng
2008-01-01
In this article, empirical Bayes (EB) estimators are constructed for the estimable functions of the parameters in the partitioned normal linear model. The superiorities of the EB estimators over the ordinary least-squares (LS) estimator are investigated under the mean square error matrix (MSEM) criterion.
Empirical Bayes Method and Roper Resonance State in Very Light Quark Region
董绍静; 张剑波; 应和平
2003-01-01
Based on Bayes' theorem, an empirical Bayes method is discussed. A program flow chart for mass spectrum fitting is suggested. Using this method to extract the spectrum, we observe the puzzling Roper resonance state on a 16³ × 28 lattice with overlap fermions in the very light quark region.
An Evaluation of Empirical Bayes's Estimation of Value-Added Teacher Performance Measures
Guarino, Cassandra M.; Maxfield, Michelle; Reckase, Mark D.; Thompson, Paul N.; Wooldridge, Jeffrey M.
2015-01-01
Empirical Bayes (EB) estimation has become a popular procedure used to calculate teacher value-added, often as a way to make imprecise estimates more reliable. In this article, we review the theory of EB estimation and use simulated and real student achievement data to study the ability of EB estimators to properly rank teachers. We compare the…
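The core of EB value-added estimation is normal-normal shrinkage of noisy teacher effects toward the grand mean. The sketch below is a textbook illustration with a method-of-moments variance estimate, not the specific estimators evaluated in the article.

```python
import numpy as np

def eb_shrink(est, se):
    """Empirical Bayes (normal-normal) shrinkage of noisy estimates.
    est: raw value-added estimates; se: their standard errors."""
    grand = est.mean()
    # Method-of-moments estimate of the between-teacher variance:
    # total variance of raw estimates minus the average sampling variance
    tau2 = max(est.var(ddof=1) - np.mean(se**2), 0.0)
    shrink = tau2 / (tau2 + se**2)        # reliability weight per teacher
    return grand + shrink * (est - grand)

# Five teachers with equally precise raw estimates
est = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
se = np.full(5, 0.5)
post = eb_shrink(est, se)
```

Because the reliability weight falls as se grows, imprecise estimates are pulled hardest toward the mean, which is exactly what makes EB ranking of teachers differ from ranking the raw estimates when precision varies.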
Guarino, Cassandra M.; Maxfield, Michelle; Reckase, Mark D.; Thompson, Paul; Wooldridge, Jeffrey M.
2014-01-01
Empirical Bayes (EB) estimation is a widely used procedure for calculating teacher value-added. It is primarily viewed as a way to make imprecise estimates more reliable. In this paper we review the theory of EB estimation and use simulated data to study its ability to properly rank teachers. We compare the performance of EB estimators with that of…
Empirical Bayes Test for the Parameter of Rayleigh Distribution with Error of Measurement
HUANG JUAN
2011-01-01
For data with measurement error in historical samples, an empirical Bayes test rule for the parameter of the Rayleigh distribution is constructed, and its asymptotic optimality is obtained. It is shown that the convergence rate of the proposed EB test rule can be arbitrarily close to O(n^(-1/2)) under suitable conditions.
Maydeu-Olivares, Albert
2005-04-01
Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) investigated the fit of Samejima's logistic graded model and Levine's non-parametric MFS model to the scales of two personality questionnaires and found that the graded model did not fit well. We attribute the poor fit of the graded model to small amounts of multidimensionality present in their data. To verify this conjecture, we compare the fit of these models to the Social Problem Solving Inventory-Revised, whose scales were designed to be unidimensional. A calibration and a cross-validation sample of new observations were used. We also included the following parametric models in the comparison: Bock's nominal model, Masters' partial credit model, and Thissen and Steinberg's extension of the latter. All models were estimated using full information maximum likelihood. We also included in the comparison a normal ogive model version of Samejima's model estimated using limited information estimation. We found that for all scales Samejima's model outperformed all other parametric IRT models in both samples, regardless of the estimation method employed. The non-parametric model outperformed all parametric models in the calibration sample. However, the graded model outperformed MFS in the cross-validation sample in some of the scales. We advocate employing the graded model estimated using limited information methods in modeling Likert-type data, as these methods are more versatile than full information methods to capture the multidimensionality that is generally present in personality data.
Ecological Forecasting in Chesapeake Bay: Using a Mechanistic-Empirical Modelling Approach
Brown, C. W.; Hood, Raleigh R.; Long, Wen; Jacobs, John M.; Ramers, D. L.; Wazniak, C.; Wiggert, J. D.; Wood, R.; Xu, J.
2013-09-01
The Chesapeake Bay Ecological Prediction System (CBEPS) automatically generates daily nowcasts and three-day forecasts of several environmental variables, such as sea-surface temperature and salinity, the concentrations of chlorophyll, nitrate, and dissolved oxygen, and the likelihood of encountering several noxious species, including harmful algal blooms and water-borne pathogens, for the purpose of monitoring the Bay's ecosystem. While the physical and biogeochemical variables are forecast mechanistically using the Regional Ocean Modeling System configured for the Chesapeake Bay, the species predictions are generated using a novel mechanistic-empirical approach, whereby real-time output from the coupled physical-biogeochemical model drives multivariate empirical habitat models of the target species. The predictions, in the form of digital images, are available via the World Wide Web to interested groups to guide recreational, management, and research activities. Though full validation of the integrated forecasts for all species is still a work in progress, we argue that the mechanistic-empirical approach can be used to generate a wide variety of short-term ecological forecasts, and that it can be applied in any marine system where sufficient data exist to develop empirical habitat models. This paper provides an overview of this system, its predictions, and the approach taken.
Empirical Bayes Credibility Models for Economic Catastrophic Losses by Regions
Jindrová Pavla
2017-01-01
Catastrophic events affect various regions of the world with increasing frequency and intensity. The number of catastrophic events and the amount of economic losses vary across world regions. Part of these losses is covered by insurance. Catastrophic events in recent years have been associated with increases in premiums for some lines of business. The article focuses on estimating the amount of net premiums that would be needed to cover the total or insured catastrophic losses in different world regions, using Bühlmann and Bühlmann-Straub empirical credibility models based on data from Sigma Swiss Re 2010-2016. The empirical credibility models have been developed to estimate insurance premiums for short-term insurance contracts using two ingredients: past data from the risk itself and collateral data from other sources considered to be relevant. In this article we apply these models to real data on the number of catastrophic events and the total economic and insured catastrophe losses in seven regions of the world over the period 2009-2015. The estimated credible premiums by world region indicate how much money will be needed in the monitored regions to cover total and insured catastrophic losses in the next year.
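The Bühlmann model referred to above admits a compact sketch: each region's premium is a credibility-weighted blend of its own mean loss and the collective mean. The method-of-moments structure-parameter estimates below follow the standard textbook formulation; the toy two-region loss array is an assumption, not the article's Sigma Swiss Re data.

```python
import numpy as np

def buhlmann_premium(claims):
    """Bühlmann credibility premiums for a (regions x years) claims array.
    Premium_i = Z * mean_i + (1 - Z) * overall mean, with Z = n / (n + s2/a)."""
    r, n = claims.shape
    region_means = claims.mean(axis=1)
    overall = claims.mean()
    s2 = claims.var(axis=1, ddof=1).mean()      # expected process variance
    a = region_means.var(ddof=1) - s2 / n        # variance of hypothetical means
    a = max(a, 0.0)
    if a == 0.0:
        return np.full(r, overall)               # no credibility: collective mean only
    Z = n / (n + s2 / a)                         # credibility factor, same for all regions
    return Z * region_means + (1 - Z) * overall

# Two regions, three years of losses each (illustrative numbers)
claims = np.array([[1.0, 2.0, 3.0],
                   [10.0, 11.0, 12.0]])
prem = buhlmann_premium(claims)
```

Each premium lands strictly between the region's own mean and the collective mean, which is the "past data from the risk itself plus collateral data" blend the abstract describes.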
Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.
McCallum, Kenneth J; Ionita-Laza, Iuliana
2015-12-01
Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes.
Kearns, Jack
Empirical Bayes point estimates of true score may be obtained if the distribution of observed score for a fixed examinee is approximated in one of several ways by a well-known compound binomial model. The Bayes estimates of true score may be expressed in terms of the observed score distribution and the distribution of a hypothetical binomial test.…
Engine gearbox fault diagnosis using empirical mode decomposition method and Naïve Bayes algorithm
KIRAN VERNEKAR; HEMANTHA KUMAR; K V GANGADHARAN
2017-07-01
This paper presents engine gearbox fault diagnosis based on empirical mode decomposition (EMD) and the Naïve Bayes algorithm. In this study, vibration signals from a gearbox are acquired under healthy and various simulated faulty conditions of the gear and bearing. The vibration signals are decomposed into a finite number of intrinsic mode functions using the EMD method. A decision tree technique (the J48 algorithm) is used to select important features from the extracted features. The Naïve Bayes algorithm is applied as a fault classifier to determine the condition of the engine. The experimental result (classification accuracy 98.88%) demonstrates that the proposed approach is an effective method for engine fault diagnosis.
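The Naïve Bayes classification step can be sketched independently of the EMD front end. The snippet below is a minimal Gaussian Naïve Bayes in numpy on toy "healthy vs faulty" feature vectors; the Gaussian class-conditional assumption and the synthetic data are illustrative, and the EMD and J48 feature-selection stages of the paper's pipeline are omitted.

```python
import numpy as np

def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: per-class feature means, variances, and priors."""
    classes = np.unique(y)
    return {c: (X[y == c].mean(axis=0),
                X[y == c].var(axis=0) + 1e-9,   # small floor avoids division by zero
                np.mean(y == c))
            for c in classes}

def predict_gnb(stats, x):
    """Pick the class with the highest log prior + per-feature Gaussian log-likelihood."""
    def log_post(c):
        mu, var, prior = stats[c]
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return np.log(prior) + ll
    return max(stats, key=log_post)

# Toy features: class 0 = "healthy" near 0, class 1 = "faulty" near 5
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 3)),
               rng.normal(5.0, 1.0, size=(30, 3))])
y = np.array([0] * 30 + [1] * 30)
stats = fit_gnb(X, y)
pred = predict_gnb(stats, np.array([4.8, 5.1, 5.0]))
```

The "naïve" part is the sum over per-feature log-likelihoods, i.e. conditional independence of features given the class.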
Hill, Steven M; Neve, Richard M; Bayani, Nora; Kuo, Wen-Lin; Ziyad, Safiyyah; Spellman, Paul T; Gray, Joe W; Mukherjee, Sach
2012-05-11
An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.
Sundara, Vinny Yuliani; Sadik, Kusman; Kurnia, Anang
2017-03-01
A survey is a data collection method that samples individual units from a population. However, national surveys provide only limited information, which results in low precision at the small-area level. In fact, when an area is not selected as a sample unit, direct estimation cannot be made. Therefore, small area estimation methods are required to solve this problem. One model-based estimation method is empirical Bayes, which has been widely used to estimate parameters in small areas, even in non-sampled areas. Yet problems occur when this method is used to estimate parameters of non-sampled areas based solely on a synthetic model that ignores area effects. This paper proposes an approach that clusters area effects of auxiliary variables by assuming that particular areas are similar. Direct estimates in several sub-districts in the regency and city of Bogor are zero because no household under poverty appears in the samples selected from these sub-districts. The empirical Bayes method is used to obtain non-zero estimates. Empirical Bayes estimates of the FGT poverty measures, under both the Molina & Rao approach and the cluster-information approach, coincide in the sub-districts selected as samples but differ in non-sampled sub-districts. The empirical Bayes method with cluster information has a smaller coefficient of variation, and is therefore better than the empirical Bayes method without cluster information for non-sampled sub-districts in the regency and city of Bogor.
β-empirical Bayes inference and model diagnosis of microarray data
Hossain Mollah Mohammad
2012-06-01
Abstract Background Microarray data enable the high-throughput survey of mRNA expression profiles at the genomic level; however, the data present a challenging statistical problem because of the large number of transcripts with small sample sizes. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of microarray data, no model can explain the data fully. It is generally difficult to scrutinize the irregular patterns of expression that are not expected by the usual gene-by-gene statistical models. Results As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure, which can be regarded as an 'evidence-based' weighted (quasi-)likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f_β(y_t|θ). Genes with low likelihoods have unexpected expression patterns and low weights. By assigning low weights to outliers, the inference becomes robust. The value of β, which controls the balance between robustness and efficiency, is selected by maximizing the predictive β₀-likelihood by cross-validation. The proposed β-EB approach identified six significant (p < 10⁻⁵) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach. Conclusions The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model
Kuusela, Mikael
2015-01-01
We consider the high energy physics unfolding problem where the goal is to estimate the spectrum of elementary particles given observations distorted by the limited resolution of a particle detector. This important statistical inverse problem arising in data analysis at the Large Hadron Collider at CERN consists in estimating the intensity function of an indirectly observed Poisson point process. Unfolding typically proceeds in two steps: one first produces a regularized point estimate of the unknown intensity and then uses the variability of this estimator to form frequentist confidence intervals that quantify the uncertainty of the solution. In this paper, we propose forming the point estimate using empirical Bayes estimation which enables a data-driven choice of the regularization strength through marginal maximum likelihood estimation. Observing that neither Bayesian credible intervals nor standard bootstrap confidence intervals succeed in achieving good frequentist coverage in this problem due to the inh...
Cloern, J.E.
1978-01-01
An empirical model of Skeletonema costatum photosynthetic rate is developed and fit to measurements of photosynthesis selected from the literature. Because the model acknowledges existence of: 1) a light-temperature interaction (by allowing optimum irradiance to vary with temperature), 2) light inhibition, 3) temperature inhibition, and 4) a salinity effect, it accurately estimates photosynthetic rates measured over a wide range of temperature, light intensity, and salinity. Integration of predicted instantaneous rate of photosynthesis with time and depth yields daily net carbon assimilation (pg C cell⁻¹ day⁻¹) in a mixed layer of specified depth, when salinity, temperature, daily irradiance and extinction coefficient are known. The assumption of constant carbon quota (pg C cell⁻¹) allows for prediction of mean specific growth rate (day⁻¹), which can be used in numerical models of Skeletonema costatum population dynamics. Application of the model to northern San Francisco Bay clearly demonstrates the limitation of growth by low light availability, and suggests that large population densities of S. costatum observed during summer months are not the result of active growth in the central deep channels (where growth rates are consistently predicted to be negative). But predicted growth rates in the lateral shallows are positive during summer and fall, thus offering a testable hypothesis that shoals are the only sites of active population growth by S. costatum (and perhaps other neritic diatoms) in the northern reach of San Francisco Bay. © 1978.
Kim Seongho
2011-10-01
Full Text Available Abstract Background Mass spectrometry (MS) based metabolite profiling has been increasingly popular for scientific and biomedical studies, primarily due to recent technological development such as comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GCxGC/TOF-MS). Nevertheless, the identifications of metabolites from complex samples are subject to errors. Statistical/computational approaches to improve the accuracy of the identifications and the false-positive estimates are in great need. We propose an empirical Bayes model which accounts for a competing score in addition to the similarity score to tackle this problem. The competition score characterizes the propensity of a candidate metabolite of being matched to some spectrum based on the metabolite's similarity score with other spectra in the library searched against. The competition score allows the model to properly assess the evidence on the presence/absence status of a metabolite based on whether or not the metabolite is matched to some sample spectrum. Results With a mixture of metabolite standards, we demonstrated that our method has better identification accuracy than four other existing methods. Moreover, our method has a reliable false discovery rate estimate. We also applied our method to the data collected from the plasma of a rat and identified some metabolites from the plasma under control of the false discovery rate. Conclusions We developed an empirical Bayes model for metabolite identification and validated the method through a mixture of metabolite standards and rat plasma. The results show that our hierarchical model improves identification accuracy as compared with methods that do not structurally model the involved variables. The improvement in identification accuracy is likely to facilitate downstream analysis such as peak alignment and biomarker identification. Raw data and result matrices can be found at http
Shreif, Zeina; Striegel, Deborah A; Periwal, Vipul
2015-09-07
A nucleotide sequence 35 base pairs long can take 1,180,591,620,717,411,303,424 possible values. One example of systems biology datasets, protein binding microarrays, contains activity data from about 40,000 such sequences. The discrepancy between the number of possible configurations and the available activities is enormous. Thus, although systems biology datasets are large in absolute terms, they oftentimes require methods developed for rare events due to the combinatorial increase in the number of possible configurations of biological systems. A plethora of techniques for handling large datasets, such as Empirical Bayes, or rare events, such as importance sampling, have been developed in the literature, but these cannot always be simultaneously utilized. Here we introduce a principled approach to Empirical Bayes based on importance sampling, information theory, and theoretical physics in the general context of sequence phenotype model induction. We present the analytical calculations that underlie our approach. We demonstrate the computational efficiency of the approach on concrete examples, and demonstrate its efficacy by applying the theory to publicly available protein binding microarray transcription factor datasets and to data on synthetic cAMP-regulated enhancer sequences. As further demonstrations, we find transcription factor binding motifs, predict the activity of new sequences and extract the locations of transcription factor binding sites. In summary, we present a novel method that is efficient (requiring minimal computational time and reasonable amounts of memory), has high predictive power that is comparable with that of models with hundreds of parameters, and has a limited number of optimized parameters, proportional to the sequence length.
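The rare-event difficulty described above can be seen in miniature with importance sampling: a naive Monte Carlo estimate of a rare-event probability usually returns zero, while sampling from a shifted proposal and reweighting by the density ratio recovers it. This sketch uses a standard normal tail probability rather than the paper's sequence-model setting; the proposal N(4, 1) is an arbitrary illustrative choice.

```python
import math
import random

random.seed(1)

# Rare-event probability P(X > 4) for X ~ N(0, 1): naive Monte Carlo vs.
# importance sampling with proposal N(4, 1); the weight is the density
# ratio phi(x) / phi(x - 4) between target and proposal.
def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

n = 20000
naive = sum(1 for _ in range(n) if random.gauss(0, 1) > 4) / n
samples = [random.gauss(4, 1) for _ in range(n)]
is_est = sum(phi(x) / phi(x - 4) for x in samples if x > 4) / n
print(naive, is_est)   # true value is about 3.17e-5
```

With 20,000 draws the naive estimator typically sees zero exceedances, while the importance-sampling estimator lands within a few percent of the true tail probability.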
Fidalgo, Angel M.; Hashimoto, Kanako; Bartram, Dave; Muniz, Jose
2007-01-01
In this study, the authors assess several strategies created on the basis of the Mantel-Haenszel (MH) procedure for conducting differential item functioning (DIF) analysis with small samples. One of the analytical strategies is a loss function (LF) that uses empirical Bayes Mantel-Haenszel estimators, whereas the other strategies use the classical…
Zwick, Rebecca; Thayer, Dorothy T.
This study investigated the applicability to computerized adaptive testing (CAT) data of a differential item functioning (DIF) analysis that involves an empirical Bayes (EB) enhancement of the popular Mantel Haenszel (MH) DIF analysis method. The computerized Law School Admission Test (LSAT) assumed for this study was similar to that currently…
Huynh, Huynh; Saunders, Joseph C., III
The Bayesian approach to setting passing scores, as proposed by Swaminathan, Hambleton, and Algina, is compared with the empirical Bayes approach to the same problem that is derived from Huynh's decision-theoretic framework. Comparisons are based on simulated data which follow an approximate beta-binomial distribution and on real test results from…
Stochl Jan
2012-06-01
Full Text Available Abstract Background Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Methods Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. Results and conclusions After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12) – when binary scored – were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech’s “well-being” and “distress” clinical scales). An illustration of ordinal item analysis…
Stochl, Jan; Jones, Peter B; Croudace, Tim J
2012-06-11
Mokken scaling techniques are a useful tool for researchers who wish to construct unidimensional tests or use questionnaires that comprise multiple binary or polytomous items. The stochastic cumulative scaling model offered by this approach is ideally suited when the intention is to score an underlying latent trait by simple addition of the item response values. In our experience, the Mokken model appears to be less well-known than for example the (related) Rasch model, but is seeing increasing use in contemporary clinical research and public health. Mokken's method is a generalisation of Guttman scaling that can assist in the determination of the dimensionality of tests or scales, and enables consideration of reliability, without reliance on Cronbach's alpha. This paper provides a practical guide to the application and interpretation of this non-parametric item response theory method in empirical research with health and well-being questionnaires. Scalability of data from 1) a cross-sectional health survey (the Scottish Health Education Population Survey) and 2) a general population birth cohort study (the National Child Development Study) illustrate the method and modeling steps for dichotomous and polytomous items respectively. The questionnaire data analyzed comprise responses to the 12 item General Health Questionnaire, under the binary recoding recommended for screening applications, and the ordinal/polytomous responses to the Warwick-Edinburgh Mental Well-being Scale. After an initial analysis example in which we select items by phrasing (six positive versus six negatively worded items) we show that all items from the 12-item General Health Questionnaire (GHQ-12)--when binary scored--were scalable according to the double monotonicity model, in two short scales comprising six items each (Bech's "well-being" and "distress" clinical scales). An illustration of ordinal item analysis confirmed that all 14 positively worded items of the Warwick-Edinburgh Mental
Semi- and Nonparametric ARCH Processes
Oliver B. Linton
2011-01-01
Full Text Available ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametric estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationarity, ergodicity and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
Ying Chen; Shao-Jing Dong; Terrence Draper; Ivan Horvath; Keh-Fei Liu; Nilmani Mathur; Sonali Tamhankar; Cidambi Srinivasan; Frank X. Lee; Jianbo Zhang
2004-05-01
We introduce the "Sequential Empirical Bayes Method", an adaptive constrained-curve fitting procedure for extracting reliable priors. These are then used in standard augmented-χ² fits on separate data. This better stabilizes fits to lattice QCD overlap-fermion data at very low quark mass where a priori values are not otherwise known. Lessons learned (including caveats limiting the scope of the method) from studying artificial data are presented. As an illustration, from local-local two-point correlation functions, we obtain masses and spectral weights for ground and first-excited states of the pion, give preliminary fits for the a₀ where ghost states (a quenched artifact) must be dealt with, and elaborate on the details of fits of the Roper resonance and S₁₁(N^(1/2−)) previously presented elsewhere. The data are from overlap fermions on a quenched 16³ × 28 lattice with spatial size La = 3.2 fm and pion mass as low as ≈180 MeV.
Importance of shrinkage in empirical Bayes estimates for diagnostics: problems and solutions.
Savic, Radojka M; Karlsson, Mats O
2009-09-01
Empirical Bayes ("post hoc") estimates (EBEs) of etas provide modelers with diagnostics: the EBEs themselves, individual predictions (IPRED), and residual errors (individual weighted residuals (IWRES)). When data are uninformative at the individual level, the EBE distribution will shrink towards zero (eta-shrinkage, quantified as 1 − SD(eta_EBE)/omega), IPREDs towards the corresponding observations, and IWRES towards zero (epsilon-shrinkage, quantified as 1 − SD(IWRES)). These diagnostics are widely used in pharmacokinetic (PK)-pharmacodynamic (PD) modeling; we investigate here their usefulness in the presence of shrinkage. Datasets were simulated from a range of PK-PD models, EBEs estimated in non-linear mixed effects modeling based on the true or a misspecified model, and the desired diagnostics evaluated both qualitatively and quantitatively. Identified consequences of eta-shrinkage on EBE-based model diagnostics include non-normal and/or asymmetric distributions of EBEs with their mean values ("ETABAR") significantly different from zero, even for a correctly specified model; EBE-EBE correlations and covariate relationships may be masked, falsely induced, or the shape of the true relationship distorted. Consequences of epsilon-shrinkage included low power of IPRED and IWRES to diagnose structural and residual error model misspecification, respectively. EBE-based diagnostics should be interpreted with caution whenever substantial eta- or epsilon-shrinkage exists (usually greater than 20% to 30%). Reporting the magnitude of eta- and epsilon-shrinkage will facilitate the informed use and interpretation of EBE-based diagnostics.
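The shrinkage quantity defined in the abstract, 1 − SD(eta_EBE)/omega, is simple to compute once the EBEs are in hand. A minimal sketch, with hypothetical EBE values and the true omega assumed known:

```python
import statistics

def eta_shrinkage(ebes, omega):
    """Eta-shrinkage as defined in the abstract: 1 - SD(eta_EBE) / omega."""
    return 1.0 - statistics.pstdev(ebes) / omega

# Informative data: EBE spread close to the true omega -> little shrinkage.
print(eta_shrinkage([-0.9, -0.3, 0.0, 0.4, 0.8], omega=0.6))
# Sparse data: EBEs collapse towards zero -> heavy shrinkage.
print(eta_shrinkage([-0.05, 0.0, 0.02, 0.01, -0.01], omega=0.6))
```

The second call exceeds the 20-30% threshold the authors flag, so EBE-based diagnostics from such a dataset would warrant caution.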
Improving polygenic risk prediction from summary statistics by an empirical Bayes approach
So, Hon-Cheong; Sham, Pak C.
2017-01-01
Polygenic risk scores (PRS) from genome-wide association studies (GWAS) are increasingly used to predict disease risks. However, some included variants could be false positives and the raw estimates of effect sizes from them may be subject to selection bias. In addition, the standard PRS approach requires testing over a range of p-value thresholds, which are often chosen arbitrarily. The prediction error estimated from the optimized threshold may also be subject to an optimistic bias. To improve genomic risk prediction, we proposed new empirical Bayes approaches to recover the underlying effect sizes and used them as weights to construct PRS. We applied the new PRS to twelve cardio-metabolic traits in the Northern Finland Birth Cohort and demonstrated improvements in predictive power (in R²) when compared to standard PRS at the best p-value threshold. Importantly, for eleven out of the twelve traits studied, the predictive performance from the entire set of genome-wide markers outperformed the best R² from standard PRS at optimal p-value thresholds. Our proposed methodology essentially enables an automatic PRS weighting scheme without the need of choosing tuning parameters. The new method also performed satisfactorily in simulations. It is computationally simple and does not require assumptions on the effect size distributions. PMID:28145530
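A generic normal-normal empirical Bayes shrinkage, not necessarily the authors' exact estimator, illustrates how raw GWAS effect sizes can be pulled towards zero before being used as PRS weights. The effect sizes and standard errors below are made up for illustration.

```python
import statistics

def eb_shrink(betas, ses):
    """Normal-normal empirical Bayes: shrink each raw GWAS effect towards zero.

    The marginal variance of a raw estimate is tau^2 + se^2, so the prior
    variance tau^2 is estimated by method of moments, and the posterior mean
    beta * tau^2 / (tau^2 + se^2) serves as the PRS weight.
    """
    tau2 = max(0.0, statistics.pvariance(betas) - statistics.mean(s ** 2 for s in ses))
    return [b * tau2 / (tau2 + s ** 2) for b, s in zip(betas, ses)]

betas = [0.30, -0.25, 0.02, 0.01, -0.03]   # hypothetical raw effect estimates
ses = [0.05] * 5                           # hypothetical standard errors
weights = eb_shrink(betas, ses)
print(weights)
```

Every weight is pulled towards zero by the same factor here because the standard errors are equal; with varying standard errors, noisier estimates are shrunk more, which is the point of using posterior means as weights.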
Bootstrap Estimation for Nonparametric Efficiency Estimates
1995-01-01
This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...
Tolkova, Elena; Power, William
2011-06-01
To a tsunami wave, bays and harbors represent oscillatory systems, whose resonance (normal) modes determine the response to tsunami and consequently the potential hazard. The direct way to obtain the resonance modes of a water reservoir is by solving the boundary problem for the eigenfunctions of the linearized shallow-water wave equation. The principal difficulty of posing such a problem for a basin coupled to an ocean is specifying the boundary between the two. The technique developed in this work allows the normal modes of a semi-enclosed water body to be obtained without a priori restricting the resonator area. The technique utilizes complex Empirical Orthogonal Function analysis of modeled tsunami wave fields. On the examples of Poverty Bay in New Zealand and Monterey Bay in California (United States), we demonstrate that the normal modes can be identified and isolated using the EOFs of a data set comprised of the concatenated time-series collected from different tsunami scenarios in a basin. The analysis of the modeled tsunami wave fields for the normal modes can also answer the question of how likely and under which conditions the different modes are excited, due to feasible natural events.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference.-Eugenia Stoimenova, Journal of Applied Statistics, June 2012… one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students.-Biometrics, 67, September 2011. This excellently presented…
Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study
Anestis Antoniadis
2001-06-01
Full Text Available Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electrical consumption is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
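A minimal sketch of the wavelet-shrinkage idea surveyed above: a one-level orthonormal Haar transform with soft thresholding at the universal threshold σ√(2 log n). The comparisons in the paper use multilevel transforms and many wavelet filters; this toy version only conveys the mechanics, and the step-signal data are invented.

```python
import math
import random

def haar_1level(x):
    # One level of the orthonormal Haar transform: approximation and detail coefficients.
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def soft(c, lam):
    # Soft-thresholding rule used by wavelet shrinkage estimators.
    return math.copysign(max(abs(c) - lam, 0.0), c)

def denoise(x, sigma):
    a, d = haar_1level(x)
    lam = sigma * math.sqrt(2 * math.log(len(x)))   # universal threshold
    d = [soft(c, lam) for c in d]
    y = []
    for ai, di in zip(a, d):   # inverse one-level Haar transform
        y.extend([(ai + di) / math.sqrt(2), (ai - di) / math.sqrt(2)])
    return y

random.seed(0)
truth = [0.0] * 8 + [5.0] * 8          # piecewise-constant test signal
noisy = [t + random.gauss(0, 0.5) for t in truth]
smooth = denoise(noisy, sigma=0.5)
```

Because the true signal is constant within each Haar pair, its detail coefficients are exactly zero, and thresholding the noisy details can only reduce the reconstruction error.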
Nonparametric identification of copula structures
Li, Bo
2013-06-01
We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process. We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.
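The empirical copula underlying such tests is easy to construct from ranks. The sketch below builds C_n and a grid-based symmetry (exchangeability) statistic sup|C_n(u, v) − C_n(v, u)|; it assumes no ties and does not reproduce the paper's asymptotic calibration of the test.

```python
def ranks(xs):
    # Rank of each observation (1 = smallest); assumes no ties for simplicity.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def empirical_copula(xs, ys):
    # C_n(u, v) = fraction of points whose normalized ranks are <= (u, v).
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    def C(u, v):
        return sum(1 for i in range(n) if rx[i] <= u * n and ry[i] <= v * n) / n
    return C

def symmetry_stat(C, grid):
    # Grid approximation of sup |C(u, v) - C(v, u)|; zero for exchangeable copulas.
    return max(abs(C(u, v) - C(v, u)) for u in grid for v in grid)

xs = [0.3, 1.2, -0.7, 2.5, 0.9, -1.1]
C = empirical_copula(xs, xs)   # comonotone pair: copula is exchangeable
grid = [i / 10 for i in range(1, 10)]
print(symmetry_stat(C, grid))
```

For a comonotone pair the statistic is exactly zero; under asymmetric dependence it stays bounded away from zero as the sample grows, which is what the formal test exploits.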
Anderson, Clare
2011-01-01
This article explores the lives of two Andamanese women, both of whom the British called “Topsy.” The first part of the article takes an indigenous and gendered perspective on early British colonization of the Andamans in the 1860s, and through the experiences of a woman called Topsy stresses the sexual violence that underpinned colonial settlement as well as the British reliance on women as cultural interlocutors. Second, the article discusses colonial naming practices, and the employment of Andamanese women and men as nursemaids and household servants during the 1890s–1910s. Using an extraordinary murder case in which a woman known as Topsy-ayah was a central witness, it argues that both reveal something of the enduring associations and legacies of slavery, as well as the cultural influence of the Atlantic in the Bay of Bengal. In sum, these women's lives present a kaleidoscope view of colonization, gender, networks of Empire, labor, and domesticity in the Bay of Bengal.
Zwick, Rebecca
1993-01-01
A validity study with 5,219 students examined the degree to which Graduate Management Admission Test (GMAT) scores and undergraduate grade point average (GPA) could predict first-year average and final GPA in doctoral programs in business. The usefulness of the predictions derived from the empirical Bayes regression models is discussed. (SLD)
Casellas, J; Varona, L; Muñoz, G; Ramírez, O; Barragán, C; Tomás, A; Martínez-Giner, M; Ovilo, C; Sánchez, A; Noguera, J L; Rodríguez, M C
2008-02-01
The aim of this study was to investigate chromosomal regions affecting gestation length in sows. An experimental F2 cross between Iberian and Meishan pig breeds was used for this purpose and we genotyped 119 markers covering the 18 porcine autosomal chromosomes. Within this context, we have developed a new empirical Bayes factor (BF) approach to compare between nested models, with and without the quantitative trait loci (QTL) effect, and after including the location of the QTL as an unknown parameter in the model. This empirical BF can be easily calculated from the output of a Markov chain Monte Carlo sampling by averaging conditional densities at the null QTL effects. Linkage analyses were performed in each chromosome using an animal model to account for infinitesimal genetic effects. Initially, three QTL were detected on chromosomes 6, 8 and 11 although, after correcting for multiple testing, only the additive QTL located at 110 cM on chromosome 8 remained. For this QTL, substitution of the Iberian allele increased gestation length by 0.521 days, with a 95% highest posterior density region ranging between 0.121 and 0.972 days. Although future studies are necessary to confirm whether the detected QTL is relevant and segregating in commercial pig populations, a hot-spot in the genetic regulation of gestation length in pigs seems to be located on chromosome 8.
Quantal Response: Nonparametric Modeling
2017-01-01
Nonparametric quantal-response (QR) models relate stimulus level to probability of response; the generalized linear model approach does not make use of the limit distribution but allows an arbitrary functional form. [The remainder of this record is extraction residue from the report: figure labels for spline and logistic-regression fits, a public-release distribution statement, and table-of-contents entries for the conclusions, references, and appendices on the linear and generalized linear models.]
Nonparametric Bayes inference for concave distribution functions
Hansen, Martin Bøgsted; Lauritzen, Steffen Lilholt
2002-01-01
Bayesian inference for concave distribution functions is investigated. This is made by transforming a mixture of Dirichlet processes on the space of distribution functions to the space of concave distribution functions. We give a method for sampling from the posterior distribution using a Pólya urn...
Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times.
Xu, Yanxun; Müller, Peter; Wahed, Abdus S; Thall, Peter F
2016-01-01
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out. To evaluate the regimes, mean overall survival time is expressed as a weighted average of the means of all possible sums of successive transitions times. We assume a Bayesian nonparametric survival regression model for each transition time, with a dependent Dirichlet process prior and Gaussian process base measure (DDP-GP). Posterior simulation is implemented by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. The proposed approach is compared with inverse probability of treatment weighting, including a doubly robust augmented version of this approach, for both single-stage and multi-stage regimes with treatment assignment depending on baseline covariates. The simulations show that the proposed nonparametric Bayesian approach can substantially improve inference compared to existing methods. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/.
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition"This book should be an essential part of the personal library of every practicing statistician."-Technometrics Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Zhao Yuming
2011-05-01
Full Text Available Abstract Background Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcription regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, the exact molecular mechanisms of which are still not well understood. Results We developed a modulated empirical Bayes model, and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon stimulation by 17β-estradiol. In the network, significant TF genomic hubs were identified including ER-alpha and AP-1; significant non-genomic hubs include ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (…) Conclusions We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen-resistant cell lines, possibly because their DNA methylation and histone acetylation patterns have changed.
Simone Vincenzi
2014-09-01
Full Text Available The differences in demographic and life-history processes between organisms living in the same population have important consequences for ecological and evolutionary dynamics. Modern statistical and computational methods allow the investigation of individual and shared (among homogeneous groups) determinants of the observed variation in growth. We use an Empirical Bayes approach to estimate individual and shared variation in somatic growth using a von Bertalanffy growth model with random effects. To illustrate the power and generality of the method, we consider two populations of marble trout Salmo marmoratus living in Slovenian streams, where individually tagged fish have been sampled for more than 15 years. We use year-of-birth cohort, population density during the first year of life, and individual random effects as potential predictors of the von Bertalanffy growth function's parameters k (rate of growth) and L∞ (asymptotic size). Our results showed that size ranks were largely maintained throughout marble trout lifetime in both populations. According to the Akaike Information Criterion (AIC), the best models showed different growth patterns for year-of-birth cohorts as well as the existence of substantial individual variation in growth trajectories after accounting for the cohort effect. For both populations, models including density during the first year of life showed that growth tended to decrease with increasing population density early in life. Model validation showed that predictions of individual growth trajectories using the random-effects model were more accurate than predictions based on mean size-at-age of fish.
Vincenzi, Simone; Mangel, Marc; Crivelli, Alain J.; Munch, Stephan; Skaug, Hans J.
2014-01-01
The differences in demographic and life-history processes between organisms living in the same population have important consequences for ecological and evolutionary dynamics. Modern statistical and computational methods allow the investigation of individual and shared (among homogeneous groups) determinants of the observed variation in growth. We use an Empirical Bayes approach to estimate individual and shared variation in somatic growth using a von Bertalanffy growth model with random effects. To illustrate the power and generality of the method, we consider two populations of marble trout Salmo marmoratus living in Slovenian streams, where individually tagged fish have been sampled for more than 15 years. We use year-of-birth cohort, population density during the first year of life, and individual random effects as potential predictors of the von Bertalanffy growth function's parameters k (rate of growth) and L∞ (asymptotic size). Our results showed that size ranks were largely maintained throughout marble trout lifetime in both populations. According to the Akaike Information Criterion (AIC), the best models showed different growth patterns for year-of-birth cohorts as well as the existence of substantial individual variation in growth trajectories after accounting for the cohort effect. For both populations, models including density during the first year of life showed that growth tended to decrease with increasing population density early in life. Model validation showed that predictions of individual growth trajectories using the random-effects model were more accurate than predictions based on mean size-at-age of fish. PMID:25211603
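As a rough illustration of the growth model named in the abstract above, here is a minimal pure-Python sketch of the von Bertalanffy growth function and a crude grid-search least-squares fit. The parameter grids and the synthetic fish data are invented for illustration; the paper's random-effects and Empirical Bayes machinery is not reproduced here.

```python
import math

def vbgf(t, L_inf, k, t0=0.0):
    """Mean length at age t under the von Bertalanffy growth model."""
    return L_inf * (1.0 - math.exp(-k * (t - t0)))

def fit_vbgf(ages, lengths):
    """Least-squares fit of (L_inf, k) over a coarse grid, with t0 fixed at 0."""
    best = None
    for L_inf in [l / 10.0 for l in range(200, 501, 5)]:   # 20.0 .. 50.0 (cm, assumed)
        for k in [j / 100.0 for j in range(5, 101, 5)]:    # 0.05 .. 1.00 per year
            sse = sum((L - vbgf(t, L_inf, k)) ** 2 for t, L in zip(ages, lengths))
            if best is None or sse < best[0]:
                best = (sse, L_inf, k)
    return best[1], best[2]

# One synthetic fish measured at ages 1..6, generated with L_inf = 30, k = 0.4
ages = [1, 2, 3, 4, 5, 6]
lengths = [vbgf(t, 30.0, 0.4) for t in ages]
L_hat, k_hat = fit_vbgf(ages, lengths)
```

With noise-free data and a grid containing the true values, the fit recovers them exactly; with real tagged-fish data one would fit per-individual random effects instead, as the paper does.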
Contaminant source reconstruction by empirical Bayes and Akaike's Bayesian Information Criterion.
Zanini, Andrea; Woodbury, Allan D
2016-01-01
The objective of this paper is to present an empirical Bayesian method combined with Akaike's Bayesian Information Criterion (ABIC) to estimate the contaminant release history of a groundwater source from a few concentration measurements in space and/or in time. From the Bayesian point of view, the ABIC incorporates prior information on the unknown function, such as the prior distribution (assumed Gaussian) and the covariance function. The unknown statistical quantities, such as the noise variance and the covariance function parameters, are computed through the process; moreover, the method also quantifies the estimation error through confidence intervals. The methodology was successfully tested on three cases: the classic Skaggs and Kabala release function, three sharp releases (both cases concern transport in a one-dimensional homogeneous medium), and data collected from laboratory equipment consisting of a two-dimensional homogeneous unconfined aquifer. The performance of the method was tested with two different covariance functions (Gaussian and exponential) and with large measurement error. The results obtained are discussed and compared to the geostatistical approach of Kitanidis (1995).
Zanini, A.; Woodbury, A. D.
2015-12-01
Contaminant release history identification has received considerable attention in the literature over the past several decades. Our review of this subject suggests that improvements are needed in terms of a reliable procedure, one that is easy to implement, has only a few hyperparameters to estimate, and is able to evaluate confidence intervals. The purpose of this work is to propose an empirical Bayesian approach combined with Akaike's Bayesian Information Criterion (ABIC) to estimate the contaminant release history from concentration observations in time or space. From the Bayesian point of view, the ABIC incorporates prior information on the unknown function, such as the prior distribution (assumed Gaussian) and the covariance function. The unknown statistical quantities, such as the noise variance and the covariance function parameters, are computed through the process; moreover, the method also quantifies the estimation error through confidence intervals. We successfully tested the method on three cases: the classic Skaggs and Kabala (1994) source, a "midnight dump" example consisting of three delta-like sources, and lastly a laboratory dataset with two measurement points in space but synoptic observations. This experiment reproduces the response of a 2-D unconfined aquifer. The performance of the inverse method was tested with two different covariance functions (Gaussian and exponential) and with large measurement error. Results show an excellent recovery of all sources used in the examples. Lastly, the results obtained are discussed and compared to the geostatistical approach of Kitanidis (1995).
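The Bayesian core shared by the source-reconstruction abstracts above can be sketched in miniature: for a linear forward model d = G m with Gaussian noise and a Gaussian prior on the source m, the posterior mean is m_hat = (GᵀG + λI)⁻¹ Gᵀd, with λ the noise-to-prior variance ratio. The 3×2 transfer matrix and two-parameter "release history" below are toy stand-ins, not the papers' transport model or the ABIC hyperparameter selection.

```python
def posterior_mean(G, d, lam):
    """Posterior mean for 2 unknowns: solve (G^T G + lam*I) m = G^T d by hand."""
    a = sum(g[0] * g[0] for g in G) + lam
    b = sum(g[0] * g[1] for g in G)
    e = sum(g[1] * g[1] for g in G) + lam
    r0 = sum(g[0] * di for g, di in zip(G, d))
    r1 = sum(g[1] * di for g, di in zip(G, d))
    det = a * e - b * b
    return ((e * r0 - b * r1) / det, (a * r1 - b * r0) / det)

G = [[1.0, 0.0], [0.6, 0.4], [0.2, 0.8]]              # toy transfer matrix
m_true = (2.0, 1.0)                                    # source to recover
d = [g[0] * m_true[0] + g[1] * m_true[1] for g in G]   # noise-free observations
m_hat = posterior_mean(G, d, lam=1e-8)                 # lam = sigma_noise^2 / sigma_prior^2
```

With noise-free data and a near-zero λ the posterior mean reproduces the true source; larger λ shrinks the estimate toward the (zero-mean) prior.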
Coverage Accuracy of Confidence Intervals in Nonparametric Regression
Song-xi Chen; Yong-song Qin
2003-01-01
Point-wise confidence intervals for a nonparametric regression function with random design points are considered. The confidence intervals are those based on the traditional normal approximation and the empirical likelihood. Their coverage accuracy is assessed by developing the Edgeworth expansions for the coverage probabilities. It is shown that the empirical likelihood confidence intervals are Bartlett correctable.
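A hedged sketch of the normal-approximation interval discussed above, using a Nadaraya-Watson estimator with a Gaussian kernel. The difference-based noise-variance estimate and the 1.96 critical value are standard textbook choices, not the paper's construction (which also covers empirical-likelihood intervals and their Bartlett correction).

```python
import math

def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson kernel regression at x0 (Gaussian kernel, bandwidth h),
    with a crude pointwise normal-approximation confidence interval."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    s = sum(w)
    w = [wi / s for wi in w]
    fit = sum(wi * yi for wi, yi in zip(w, ys))
    # Difference-based noise variance (assumes xs sorted and a smooth mean)
    n = len(ys)
    sigma2 = sum((ys[i + 1] - ys[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))
    se = math.sqrt(sigma2 * sum(wi * wi for wi in w))  # Var(fit) ~ sigma^2 * sum w_i^2
    return fit, (fit - 1.96 * se, fit + 1.96 * se)

xs = [0.1 * i for i in range(10)]
fit, (lo, hi) = nw_estimate(0.5, xs, [5.0] * 10, h=0.2)
```

On noiseless flat data the interval collapses onto the fitted value, as it should.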
Lemey, Philippe; Suchard, Marc A.
2012-01-01
Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (ω) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in ω across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific ω estimates. Results: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific ω values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific ω estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples. Availability: Renaissance counting is implemented in the development branch of BEAST, freely available at http://code.google.com/p/beast-mcmc/. The method will be made available in the next public release of the package, including support to set up analyses in BEAUti. Contact: philippe.lemey@rega.kuleuven.be or msuchard@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23064000
Nonparametric Predictive Regression
Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.
2012-01-01
A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
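As a small taste of the exact-test machinery such a text covers, here is the classic two-sided sign test for paired differences in plain Python. The doubling of the smaller tail is the common textbook convention and slightly over-counts when the positive and negative counts are equal; this is a sketch, not a replacement for the book's tables or software.

```python
from math import comb

def sign_test_p(diffs):
    """Two-sided exact sign test: are paired differences centered at zero?"""
    n = sum(1 for d in diffs if d != 0)   # ties (zero differences) are dropped
    k = sum(1 for d in diffs if d > 0)    # number of positive signs
    tail = min(k, n - k)
    p = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n  # Binomial(n, 1/2) tail
    return min(1.0, 2 * p)

# Eight paired differences, all positive: strong evidence of a location shift
p = sign_test_p([0.3, 1.1, 0.4, 2.0, 0.8, 0.5, 1.7, 0.9])
```

Here the p-value is 2 × (1/2)⁸ = 0.0078125, i.e. the chance of eight heads or eight tails in eight fair coin flips.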
Asymptotic Optimality of Empirical Bayes Estimation Under Random Censorship
王立春
2006-01-01
Using the empirical Bayes (EB) method, under the condition that both the historical samples and the current sample are randomly right-censored by another variable with an unknown distribution, this paper constructs an empirical Bayes estimator of the parameter of an exponential distribution and establishes its asymptotic optimality. An example and simulation results are given at the end.
Johno, Hisashi; Nakamoto, Kazunori; Saigo, Tatsuhiko
2015-01-01
Kernel Bayes' rule has been proposed as a nonparametric kernel-based method to realize Bayesian inference in reproducing kernel Hilbert spaces. However, we demonstrate both theoretically and experimentally that the prediction result by kernel Bayes' rule is in some cases unnatural. We consider that this phenomenon is in part due to the fact that the assumptions in kernel Bayes' rule do not hold in general.
2005-01-01
Empirical relationships to estimate the vertical attenuation coefficient of photosynthetically available radiation (KPAR) using Secchi disk, vertical black disk, and horizontal sighting ranges were developed for San Quintín Bay, Baja California. Radiometric PAR profiles were used to calculate KPAR. Vertical (ZD) and horizontal (HS) sighting ranges were measured with white (Secchi depth or ZSD, HSW) and black (ZBD, HSB) targets. The empirical power models KPAR = 1.48 ZSD^-1.16, KPAR = 0.87 ZBD^-1.52, KPAR = 0.54 HSW^-0.65 and KPAR = 0.53 HSB^-0.92 were developed for the corresponding relationships. The parameters of these models are not significantly different from those of models developed for Punta Banda Estuary, another Baja California lagoon, with the exception of the one for the KPAR-HSW relationship. Also, parameters of the KPAR-ZSD model for San Quintín Bay and Punta Banda Estuary are not significantly different from those developed for coastal waters near Santa Barbara, California. A set of general models is proposed that may apply to coastal water bodies of northwestern Baja California and southern California (KPAR = 1.45 ZSD^-1.10, KPAR = 0.92 ZBD^-1.45, and KPAR = 0.70 HSB^-1.10). While this approach may be universal, more data are needed to explore the variability of the parameters between different water bodies.
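The abstract's fitted power models can be written out directly; the coefficients below are those reported for San Quintín Bay, with metres assumed as the unit of the sighting ranges.

```python
def kpar_from_secchi(z_sd):
    """K_PAR from Secchi depth: K_PAR = 1.48 * Z_SD^-1.16 (San Quintin Bay fit)."""
    return 1.48 * z_sd ** -1.16

def kpar_from_black_disk(z_bd):
    """K_PAR from vertical black-disk range: K_PAR = 0.87 * Z_BD^-1.52."""
    return 0.87 * z_bd ** -1.52
```

Both models are decreasing in the sighting range, as expected: clearer water (a deeper Secchi reading) attenuates light more slowly.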
Nonparametric tests for censored data
Bagdonavicus, Vilijandas; Nikulin, Mikhail
2013-01-01
This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved, and real applications are illustrated using examples. Theory and exercises are provided. The incorrect use of many tests when applying most statistical software is highlighted and discussed.
A nonparametric and diversified portfolio model
Shirazi, Yasaman Izadparast; Sabiruzzaman, Md.; Hamzah, Nor Aishah
2014-07-01
Traditional portfolio models, such as mean-variance (MV), suffer from estimation error and lack of diversity. Alternatives, such as mean-entropy (ME) or mean-variance-entropy (MVE) portfolio models, focus independently on the issue of either a proper risk measure or diversity. In this paper, we propose an asset allocation model that compromises between the risk of historical data and future uncertainty. In the new model, entropy is presented as a nonparametric risk measure as well as an index of diversity. Our empirical evaluation with a variety of performance measures shows that this model has better out-of-sample performance and lower portfolio turnover than its competitors.
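The entropy-as-diversity idea in the abstract can be illustrated in a few lines: Shannon entropy of the weight vector is maximal (log n) for the equally weighted portfolio and drops as the allocation concentrates. The two example portfolios are invented; this is the diversity index only, not the paper's full mean-entropy optimization.

```python
import math

def weight_entropy(weights):
    """Shannon entropy of portfolio weights, used as a diversity index;
    maximal (= log n) for equal weights, small for concentrated portfolios."""
    return -sum(w * math.log(w) for w in weights if w > 0)

equal = [0.25] * 4                       # fully diversified over 4 assets
concentrated = [0.97, 0.01, 0.01, 0.01]  # nearly all weight on one asset
```

A mean-entropy allocator would trade expected return against this index rather than against the sample covariance matrix.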
Comparing parametric and nonparametric regression methods for panel data
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs. The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
CURRENT STATUS OF NONPARAMETRIC STATISTICS
Orlov A. I.
2015-02-01
Nonparametric statistics is one of the five growth points of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into areas based on the existing practice of scientific activity in nonparametric statistics and to classify investigations on nonparametric statistical methods. Nonparametric statistics allows one to make statistical inferences, in particular to estimate characteristics of the distribution and to test statistical hypotheses, without relying on weakly substantiated assumptions that the distribution function of the sample belongs to a particular parametric family. One example is the widespread belief that statistical data often follow the normal distribution. Meanwhile, analysis of observational results, in particular of measurement errors, always leads to the same conclusion: in most cases the actual distribution differs significantly from the normal one. Uncritical use of the hypothesis of normality often leads to significant errors, for example in the rejection of outlying observations (outliers), in statistical quality control, and in other cases. Therefore, it is advisable to use nonparametric methods, in which only weak requirements are imposed on the distribution functions of the observations; usually only their continuity is assumed. On the basis of a generalization of numerous studies, it can be stated that, to date, nonparametric methods can solve almost the same range of tasks as parametric methods previously did. Certain statements in the literature, that nonparametric methods have less power or require larger sample sizes than parametric methods, are incorrect. Note that in nonparametric statistics, as in mathematical statistics in general, a number of problems remain unresolved.
Comparing parametric and nonparametric regression methods for panel data
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs. A nonparametric specification test rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c...
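The book's rank-based procedures are implemented in R (Rfit, npsm); as a language-neutral illustration of the flavor of such estimators, here is the classic Hodges-Lehmann location estimate in plain Python. It is a sketch of one textbook estimator, not of the book's R workflow.

```python
import statistics

def hodges_lehmann(xs):
    """Hodges-Lehmann location estimate: the median of all pairwise
    Walsh averages (x_i + x_j) / 2 for i <= j. A rank-based,
    outlier-resistant alternative to the sample mean."""
    n = len(xs)
    walsh = [(xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)
```

Unlike the mean, the estimate barely moves when a gross outlier is appended to a small sample, which is exactly the robustness rank-based methods trade a little efficiency for.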
饶贤清
2012-01-01
This paper discusses the Bayes estimation of the parameter of the Pareto distribution under squared loss. The empirical Bayes (EB) estimator is constructed by a kernel estimation method using identically distributed, negatively associated (NA) samples, and the convergence rate of the EB estimator is obtained under given conditions. Finally, an empirical analysis of the wealth distribution of high-income earners in China shows that this distribution can be described by the Pareto distribution.
Robert Kavet
The Trans Bay Cable (TBC) is a ±200-kilovolt (kV), 400 MW, 85-km long High Voltage Direct Current (DC) buried transmission line linking Pittsburg, CA with San Francisco, CA (SF) beneath the San Francisco Estuary. The TBC runs parallel to the migratory route of various marine species, including green sturgeon, Chinook salmon, and steelhead trout. In July and August 2014, an extensive series of magnetic field measurements were taken using a pair of submerged Geometrics magnetometers towed behind a survey vessel in four locations in the San Francisco estuary along profiles that cross the cable's path; these included the San Francisco-Oakland Bay Bridge (BB), the Richmond-San Rafael Bridge (RSR), the Benicia-Martinez Bridge (Ben) and an area in San Pablo Bay (SP) in which a bridge is not present. In this paper, we apply basic formulas that ideally describe the magnetic field from a DC cable summed vectorially with the background geomagnetic field (in the absence of other sources that would perturb the ambient field) to derive characteristics of the cable that are otherwise not immediately observable. Magnetic field profiles from measurements taken along 170 survey lines were inspected visually for evidence of a distinct pattern representing the presence of the cable. Many profiles were dominated by field distortions unrelated to the cable caused by bridge structures or other submerged objects, and the cable's contribution to the field was not detectable. BB, with 40 of the survey lines, did not yield usable data for these reasons. The unrelated anomalies could be up to 100 times greater than those from the cable. In total, discernible magnetic field profiles measured from 76 survey lines were regressed against the equations, representing eight days of measurement. The modeled field anomalies due to the cable (the difference between the maximum and minimum field along the survey line at the cable crossing) were virtually identical to the measured values. The...
Kavet, Robert; Wyman, Megan T; Klimley, A Peter
2016-01-01
The Trans Bay Cable (TBC) is a ±200-kilovolt (kV), 400 MW 85-km long High Voltage Direct Current (DC) buried transmission line linking Pittsburg, CA with San Francisco, CA (SF) beneath the San Francisco Estuary. The TBC runs parallel to the migratory route of various marine species, including green sturgeon, Chinook salmon, and steelhead trout. In July and August 2014, an extensive series of magnetic field measurements were taken using a pair of submerged Geometrics magnetometers towed behind a survey vessel in four locations in the San Francisco estuary along profiles that cross the cable's path; these included the San Francisco-Oakland Bay Bridge (BB), the Richmond-San Rafael Bridge (RSR), the Benicia-Martinez Bridge (Ben) and an area in San Pablo Bay (SP) in which a bridge is not present. In this paper, we apply basic formulas that ideally describe the magnetic field from a DC cable summed vectorially with the background geomagnetic field (in the absence of other sources that would perturb the ambient field) to derive characteristics of the cable that are otherwise not immediately observable. Magnetic field profiles from measurements taken along 170 survey lines were inspected visually for evidence of a distinct pattern representing the presence of the cable. Many profiles were dominated by field distortions unrelated to the cable caused by bridge structures or other submerged objects, and the cable's contribution to the field was not detectable. BB, with 40 of the survey lines, did not yield usable data for these reasons. The unrelated anomalies could be up to 100 times greater than those from the cable. In total, discernible magnetic field profiles measured from 76 survey lines were regressed against the equations, representing eight days of measurement. The modeled field anomalies due to the cable (the difference between the maximum and minimum field along the survey line at the cable crossing) were virtually identical to the measured values. The modeling
Multi-locus genome-wide association studies have become the state-of-the-art procedure to identify quantitative trait loci (QTL) associated with traits simultaneously. However, implementation of multi-locus models is still difficult. In this study, we integrated least angle regression with empirical B...
Jens Clausen
2010-06-01
This paper discusses the sustainability impact (contribution to sustainability, reduction of adverse environmental impacts) of online second-hand trading. A survey of eBay users shows that a relationship between the trading of used goods and the protection of natural resources is hardly recognized. Secondly, the environmental motivation and the willingness to act in a sustainable manner differ widely between groups of consumers. Given these results from a user perspective, the paper tries to find some objective evidence of online second-hand trading's environmental impact. The greenhouse gas emissions resulting from the energy used for the trading transactions seem to be considerably lower than the emissions due to the (avoided) production of new goods. The paper concludes with a set of recommendations for second-hand trade and consumer policy. Information about the sustainability benefits of purchasing second-hand goods should be included in general consumer information, and arguments for changes in behavior should be targeted to different groups of consumers.
赵文芝; 夏志明; 贺飞跃
2016-01-01
Two-step estimators for a change point in nonparametric regression are proposed. In the first step, an initial estimator is obtained by the local linear smoothing method. In the second step, the final estimator is obtained by the CUSUM method on a closed neighborhood of the initial estimator. A simulation study shows that the proposed estimator is efficient. An estimator for the jump size is also obtained. Furthermore, experimental results using historical data on Nile river discharges, USD/RMB exchange rate data, and mean monthly temperature data for the Northern Hemisphere show that the proposed method is also practical in applications.
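The CUSUM ingredient of the second step can be sketched in a few lines: locate the index where the partial sums deviate most from their no-change trend. This is the bare mean-shift version on a toy series, without the paper's local linear first step or neighborhood restriction.

```python
def cusum_changepoint(ys):
    """CUSUM-type change-point estimate: the index k maximizing
    |S_k - (k/n) * S_n|, where S_k is the partial sum of the series."""
    n = len(ys)
    total = sum(ys)
    s, best_k, best_val = 0.0, 0, -1.0
    for k in range(1, n):
        s += ys[k - 1]
        val = abs(s - total * k / n)
        if val > best_val:
            best_val, best_k = val, k
    return best_k  # change located between ys[best_k - 1] and ys[best_k]

# Toy series: the mean shifts from 0 to 3 after the first ten observations
series = [0.0] * 10 + [3.0] * 10
```

On this clean mean shift the statistic peaks exactly at the shift; in the paper's setting the smooth trend is first removed by local linear fitting so that only the jump remains.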
Nonparametric estimation of ultrasound pulses
Jensen, Jørgen Arendt; Leeman, Sidney
1994-01-01
An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100).
Poverty and life cycle effects: A nonparametric analysis for Germany
Stich, Andreas
1996-01-01
Most empirical studies on poverty consider the extent of poverty either for the entire society or for separate groups like elderly people. However, these papers do not show what the situation looks like for persons of a certain age. In this paper poverty measures depending on age are derived using the joint density of income and age. The density is nonparametrically estimated by weighted Gaussian kernel density estimation. Applying the conditional density of income to several poverty measures ...
Nonparametric test for detecting change in distribution with panel data
Pommeret, Denys; Ghattas, Badih
2011-01-01
This paper considers the problem of comparing two processes with panel data. A nonparametric test is proposed for detecting a monotone change in the link between the two process distributions. The test statistic is of CUSUM type, based on the empirical distribution functions. The asymptotic distribution of the proposed statistic is derived, and its finite-sample properties are examined by bootstrap procedures through Monte Carlo simulations.
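The paper's statistic is CUSUM-type and built on empirical distribution functions; the sketch below shows only a simpler related ingredient, the sup-distance between two ECDFs (a Kolmogorov-Smirnov-type quantity), as a stand-in for the actual test statistic.

```python
def ecdf(sample):
    """Empirical distribution function of a sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: sum(1 for x in xs if x <= t) / n

def max_ecdf_gap(a, b):
    """Largest pointwise gap between two empirical CDFs, evaluated on the
    pooled sample values (sufficient, since ECDFs are step functions)."""
    Fa, Fb = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))
    return max(abs(Fa(t) - Fb(t)) for t in grid)
```

Identical samples give a gap of zero; fully separated samples give the maximal gap of one. A panel-data CUSUM test would instead accumulate such discrepancies across time points.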
Conover, W.J. [Texas Tech Univ., Lubbock, TX (United States); Cox, D.D. [Rice Univ., Houston, TX (United States); Martz, H.F. [Los Alamos National Lab., NM (United States)
1997-12-01
When using parametric empirical Bayes estimation methods for estimating the binomial or Poisson parameter, the validity of the assumed beta or gamma conjugate prior distribution is an important diagnostic consideration. Chi-square goodness-of-fit tests of the beta or gamma prior hypothesis are developed for use when the binomial sample sizes or Poisson exposure times vary. Nine examples illustrate the application of the methods, using real data from such diverse applications as the loss of feedwater flow rates in nuclear power plants, the probability of failure to run on demand and the failure rates of the high pressure coolant injection systems at US commercial boiling water reactors, the probability of failure to run on demand of emergency diesel generators in US commercial nuclear power plants, the rate of failure of aircraft air conditioners, baseball batting averages, the probability of testing positive for toxoplasmosis, and the probability of tumors in rats. The tests are easily applied in practice by means of corresponding Mathematica® computer programs which are provided.
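The gamma-Poisson empirical Bayes setup being diagnosed above can be sketched as follows: fit a gamma prior to the observed rates by a naive method of moments, then shrink each unit's raw rate toward the prior mean via the posterior mean. The moment fit below ignores the Poisson sampling variance that inflates the raw-rate spread (a real analysis would correct for it), and the counts and exposures are invented.

```python
import statistics

def gamma_prior_moments(counts, exposures):
    """Naive method-of-moments gamma prior (shape a, rate b) for Poisson rates;
    under the prior, E[rate] = a/b and Var[rate] = a/b^2."""
    rates = [c / t for c, t in zip(counts, exposures)]
    m = statistics.mean(rates)
    v = statistics.variance(rates)
    b = m / v
    return m * b, b

def eb_rate(c, t, a, b):
    """Gamma-Poisson posterior mean: shrinks the raw rate c/t toward a/b."""
    return (a + c) / (b + t)

a, b = gamma_prior_moments([1, 2, 3], [1.0, 1.0, 1.0])
```

A unit with zero events and exposure 10 then gets rate (a + 0)/(b + 10) rather than an implausible raw estimate of zero, which is exactly the kind of estimate whose prior assumptions the paper's goodness-of-fit tests are meant to check.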
Measuring the Influence of Networks on Transaction Costs Using a Nonparametric Regression Technique
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H.C.A.
...We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Measuring the influence of networks on transaction costs using a non-parametric regression technique
Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.
...We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Non-Parametric Inference in Astrophysics
Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; the PICA group
2001-01-01
We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
Xu, Xu Steven; Yuan, Min; Yang, Haitao; Feng, Yan; Xu, Jinfeng; Pinheiro, Jose
2017-01-01
Covariate analysis based on population pharmacokinetics (PPK) is used to identify clinically relevant factors. The likelihood ratio test (LRT) based on nonlinear mixed effect model fits is currently recommended for covariate identification, whereas individual empirical Bayesian estimates (EBEs) are considered unreliable due to the presence of shrinkage. The objectives of this research were to investigate the type I error for LRT and EBE approaches, to confirm the similarity of power between the LRT and EBE approaches from a previous report and to explore the influence of shrinkage on LRT and EBE inferences. Using an oral one-compartment PK model with a single covariate impacting on clearance, we conducted a wide range of simulations according to a two-way factorial design. The results revealed that the EBE-based regression not only provided almost identical power for detecting a covariate effect, but also controlled the false positive rate better than the LRT approach. Shrinkage of EBEs is likely not the root cause for decrease in power or inflated false positive rate although the size of the covariate effect tends to be underestimated at high shrinkage. In summary, contrary to the current recommendations, EBEs may be a better choice for statistical tests in PPK covariate analysis compared to LRT. We proposed a three-step covariate modeling approach for population PK analysis to utilize the advantages of EBEs while overcoming their shortcomings, which allows not only markedly reducing the run time for population PK analysis, but also providing more accurate covariate tests.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper-reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Nonparametric Inference for Periodic Sequences
Sun, Ying
2012-02-01
This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation. We also propose a nonparametric test of the null hypothesis that the data have constant mean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
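The "leave-out-one-cycle" cross-validation idea can be sketched as follows: for each candidate integer period p, predict every observation by the mean of the observations at the same phase in the other cycles, then pick the p with the smallest prediction error (a hypothetical minimal implementation, not the author's code):

```python
# "Leave-out-one-cycle" cross-validation for integer period estimation.
# Hypothetical minimal sketch; not the author's implementation.

def cv_score(y, p):
    """MSE when each value is predicted by the mean of the other cycles
    at the same phase, for candidate period p."""
    n = len(y)
    err, count = 0.0, 0
    for phase in range(p):
        vals = [y[i] for i in range(phase, n, p)]
        m = len(vals)
        if m < 2:
            continue
        s = sum(vals)
        for v in vals:
            pred = (s - v) / (m - 1)  # leave-one-cycle-out mean
            err += (v - pred) ** 2
            count += 1
    return err / count if count else float("inf")

def estimate_period(y, max_p):
    # ties (multiples of the true period in noiseless data) resolve
    # to the smallest candidate
    return min(range(2, max_p + 1), key=lambda p: cv_score(y, p))

y = [1.0, 5.0, 2.0] * 20  # noiseless sequence with period 3
print(estimate_period(y, 10))  # → 3
```

With noisy data the multiples of the true period no longer tie exactly, which is where the implicit penalization described in the abstract comes in.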
Nonparametric Econometrics: The np Package
Tristen Hayfield
2008-07-01
Full Text Available We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of significance and consistent model specification tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
Astronomical Methods for Nonparametric Regression
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determines their function and serves as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
Nonparametric instrumental regression with non-convex constraints
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Nonparametric regression with filtered data
Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid (doi:10.3150/10-BEJ260)
2011-01-01
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
A contingency table approach to nonparametric testing
Rayner, JCW
2000-01-01
Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables. This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp
Nonparametric statistics for social and behavioral sciences
Kraska-Miller, M
2013-01-01
Introduction to Research in Social and Behavioral Sciences; Basic Principles of Research; Planning for Research; Types of Research Designs; Sampling Procedures; Validity and Reliability of Measurement Instruments; Steps of the Research Process; Introduction to Nonparametric Statistics; Data Analysis; Overview of Nonparametric Statistics and Parametric Statistics; Overview of Parametric Statistics; Overview of Nonparametric Statistics; Importance of Nonparametric Methods; Measurement Instruments; Analysis of Data to Determine Association and Agreement; Pearson Chi-Square Test of Association and Independence; Contingency
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
Nonparametric Bayesian approaches (BNP) play an ever-expanding role in biostatistical inference, from use in proteomics to clinical trials. As chapters in this book demonstrate, BNP has important uses in the clinical sciences and in inference for issues like unknown partitions in genomics. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like the arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Nonparametric Regression with Common Shocks
Eduardo A. Souza-Rodrigues
2016-09-01
This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small) number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
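For reference, the Nadaraya-Watson estimator discussed above is a kernel-weighted average of the responses. A minimal sketch in the standard i.i.d. setting (not the common-shock setting of the paper; the Gaussian kernel and bandwidth are assumptions):

```python
# Nadaraya-Watson kernel regression: a locally weighted average of y values.
# Sketch under i.i.d. assumptions; Gaussian kernel and bandwidth h are choices.
import math

def nadaraya_watson(x_train, y_train, x0, h=0.5):
    w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x_train]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)

# recover a smooth function from noiseless samples
xs = [i / 10 for i in range(50)]
ys = [math.sin(x) for x in xs]
print(round(nadaraya_watson(xs, ys, 2.0, h=0.2), 2))  # close to sin(2) ≈ 0.91
```

The paper's contribution is precisely about when such conditional-density-based estimates remain meaningful once common shocks break the i.i.d. assumption behind this sketch.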
Nonparametric Bayesian Modeling of Complex Networks
Schmidt, Mikkel Nørgaard; Mørup, Morten
2013-01-01
Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....
An asymptotically optimal nonparametric adaptive controller
Guo, Lei; Xie, Liangliang
2000-01-01
For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.
Testing for a constant coefficient of variation in nonparametric regression
Dette, Holger; Marchlewski, Mareen; Wagener, Jens
2010-01-01
In the common nonparametric regression model Y_i=m(X_i)+sigma(X_i)epsilon_i we consider the problem of testing the hypothesis that the coefficient of variation, i.e. the ratio of the scale and location functions, is constant. The test is based on a comparison of the observations Y_i/\hat{sigma}(X_i) with their mean by a smoothed empirical process, where \hat{sigma} denotes the local linear estimate of the scale function. We show weak convergence of a centered version of this process to a Gaussian process under the null ...
Froeschke, John T.; Stunz, Gregory W.; Sterba-Boatwright, Blair; Wildhaber, Mark L.
2010-01-01
Using a long-term fisheries-independent data set, we tested the 'shark nursery area concept' proposed by Heupel et al. (2007) with the suggested working assumptions that a shark nursery habitat would: (1) have an abundance of immature sharks greater than the mean abundance across all habitats where they occur; (2) be used by sharks repeatedly through time (years); and (3) see immature sharks remaining within the habitat for extended periods of time. We tested this concept using young-of-the-year (age 0) and juvenile (age 1+ yr) bull sharks Carcharhinus leucas from gill-net surveys conducted in Texas bays from 1976 to 2006 to estimate the potential nursery function of 9 coastal bays. Of the 9 bay systems considered as potential nursery habitat, only Matagorda Bay satisfied all 3 criteria for young-of-the-year bull sharks. Both Matagorda and San Antonio Bays met the criteria for juvenile bull sharks. Through these analyses we examined the utility of this approach for characterizing nursery areas and we also describe some practical considerations, such as the influence of the temporal or spatial scales considered when applying the nursery role concept to shark populations.
Nonparametric estimation of the stationary M/G/1 workload distribution function
Hansen, Martin Bøgsted
2005-01-01
In this paper it is demonstrated how a nonparametric estimator of the stationary workload distribution function of the M/G/1 queue can be obtained by systematically sampling the workload process. Weak convergence results and bootstrap methods for empirical distribution functions for stationary associ...
Non-parametric production analysis of pesticides use in the Netherlands
Oude Lansink, A.G.J.M.; Silva, E.
2004-01-01
Many previous empirical studies on the productivity of pesticides suggest that pesticides are under-utilized in agriculture, despite the generally held belief that these inputs are substantially over-utilized. This paper uses data envelopment analysis (DEA) to calculate non-parametric measures of the
Jiang, Yansong; Liu, Yu; Su, Baoku
2011-01-01
In order to eliminate the multicollinearity caused by the g2 model of dual orthogonal accelerometers and increase the accuracy of coefficient identification in high-precision accelerometer calibration tests in a gravity field, this paper proposes empirical Bayes ridge estimation and applies it to the model of dual orthogonal accelerometers. Simulation and experiment show that, compared with conventional least squares estimation, empirical Bayes ridge estimation not only eliminates the influence of the multicollinearity and separates the quadratic coefficient K2, but also achieves higher identification accuracy.
Otsubo, Koichi; Yamaoka, Kazue; Yokoyama, Tetsuji; Takahashi, Kunihiko; Nishikawa, Masako; Tango, Toshiro
2009-02-01
The standardized mortality ratio (SMR) is frequently used to compare health status among different populations. However, SMR could be biased when based upon communities with small population size such as towns and wards and comparison of SMRs in such cases is not appropriate. The "empirical Bayes estimate of standardized mortality ratio" (EBSMR) is a useful alternative index for comparing mortalities among small populations. The objective of the present study was to use the EBSMR to clarify the relationships between health care resources and mortalities in 3,360 municipalities in Japan. Health care resource data (number of physicians, number of general clinics, number of general sickbeds, and number of emergency hospitals) and socioeconomic factors (population, birth rate, aged households, marital rate, divorce rate, taxable income per taxpayer, unemployment, secondary and tertiary industrial employment, and prefecture) were obtained from officially published reports. EBSMRs for all causes, cerebrovascular disease, heart disease, acute myocardial infarction, and malignant neoplasms were calculated from the 1997-2001 vital statistics records. Multiple regression analysis was used to examine the relationships between EBSMRs and the variables representing health care resources and socioeconomic factors as covariates. Some of the variables were log-transformed to normalize the distribution of variables. The correlation between number of physicians and general sickbeds was very high (Pearson's r = 0.776). So, we excluded the number of general sickbeds. Some of the EBSMRs were inversely associated with the number of physicians per person (all causes in males (beta = -0.042, P = 0.024) and females (beta = -0.150, P < 0.001), cerebrovascular disease in females (beta = -0.074, P < 0.001), heart disease in males (beta = -0.066, P < 0.001) and females (beta = - 0.087, P < 0.001), acute myocardial infarction in females (beta = -0.061, P = 0.003), and malignant
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.
Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
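As a point of comparison for the calibration task described above, histogram binning, one of the simple baselines such methods are measured against, can be sketched as follows (the bin count and toy data are assumptions; this is not the paper's Bayesian method):

```python
# Histogram-binning calibration: map a raw score to the empirical positive
# rate of its score bin. A simple baseline sketch, not the paper's method;
# the bin count and toy data are assumptions.
def fit_binning(scores, labels, n_bins=5):
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # calibrated probability per bin = empirical positive rate
    return [sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]

def calibrate(score, bin_probs):
    b = min(int(score * len(bin_probs)), len(bin_probs) - 1)
    return bin_probs[b]

scores = [0.1, 0.15, 0.3, 0.35, 0.55, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
probs = fit_binning(scores, labels)
print(calibrate(0.12, probs))  # → 0.0, the positive rate in the lowest bin
```

The Bayesian approaches in the paper can be read as averaging over many such binnings rather than committing to one fixed grid.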
Parametric and Non-Parametric System Modelling
Nielsen, Henrik Aalborg
1999-01-01
The focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric terms are added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients ... considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well-known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how ... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural ...
Bayesian nonparametric duration model with censorship
Joseph Hakizamungu
2007-10-01
This paper is concerned with nonparametric i.i.d. duration models with censored observations, and we establish by a simple and unified approach the general structure of a Bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.
Zhang, Lingxia; Shi, Yimin
2000-01-01
Consider the two-sided truncation distribution families written in the form f(x|θ) = ω(θ1, θ2) h(x) I_[θ1,θ2](x), where θ = (θ1, θ2) and T(x) = (t1(x), t2(x)) = (min(x1, …, xm), max(x1, …, xm)) is a sufficient statistic whose marginal density is denoted by f(t). In this paper, by estimating f(t), we construct the empirical Bayes estimator (EBE) of the parameter function Q(θ) under squared-error loss, and prove that the EBE is asymptotically optimal.
Monotone Empirical Bayes Test for Scale Parameter under Random Censorship
Wang, Lichun
2007-01-01
We study the two-action problem in the scale-exponential family via the empirical Bayes (EB) approach and present a monotone EB test possessing a rate of convergence which can be arbitrarily close to O(n^-1) under the condition that the past samples are randomly censored from the right.
A Multivariate Downscaling Model for Nonparametric Simulation of Daily Flows
Molina, J. M.; Ramirez, J. A.; Raff, D. A.
2011-12-01
A multivariate, stochastic nonparametric framework for stepwise disaggregation of seasonal runoff volumes to daily streamflow is presented. The downscaling process is conditional on volumes of spring runoff and large-scale ocean-atmosphere teleconnections and includes a two-level cascade scheme: seasonal-to-monthly disaggregation first, followed by monthly-to-daily disaggregation. The non-parametric and assumption-free character of the framework allows consideration of the random nature and nonlinearities of daily flows, which parametric models are unable to account for adequately. This paper examines statistical links between decadal/interannual climatic variations in the Pacific Ocean and hydrologic variability in the US northwest region, and includes a periodicity analysis of climate patterns to detect coherences of their cyclic behavior in the frequency domain. We explore the use of such relationships and selected signals (e.g., north Pacific gyre oscillation, southern oscillation, and Pacific decadal oscillation indices, NPGO, SOI and PDO, respectively) in the proposed data-driven framework by means of a combinatorial approach with the aim of simulating improved streamflow sequences when compared with disaggregated series generated from flows alone. A nearest neighbor time series bootstrapping approach is integrated with principal component analysis to resample from the empirical multivariate distribution. A volume-dependent scaling transformation is implemented to guarantee the summability condition. In addition, we present a new and simple algorithm, based on nonparametric resampling, that overcomes the common limitation of lack of preservation of historical correlation between daily flows across months. The downscaling framework presented here is parsimonious in parameters and model assumptions, does not generate negative values, and produces synthetic series that are statistically indistinguishable from the observations. We present evidence showing that both
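The nearest-neighbor bootstrap underlying such resampling frameworks can be sketched in a scalar toy setting: sample the successor of one of the k historical states closest to the current state (k, the toy series, and all names are assumptions, not the authors' multivariate scheme):

```python
# k-nearest-neighbor time-series bootstrap: resample the successor of one of
# the k closest historical states. Scalar toy sketch; k and the series are
# illustrative assumptions, not the authors' multivariate algorithm.
import random

def knn_bootstrap(series, length, k=3, seed=0):
    rng = random.Random(seed)
    out = [rng.choice(series[:-1])]
    for _ in range(length - 1):
        cur = out[-1]
        # indices of the k historical states closest to the current value
        idx = sorted(range(len(series) - 1), key=lambda i: abs(series[i] - cur))[:k]
        out.append(series[rng.choice(idx) + 1])  # append a sampled successor
    return out

hist = [3.0, 4.5, 6.0, 5.0, 4.0, 3.5, 5.5, 6.5, 4.8, 3.9]
sim = knn_bootstrap(hist, 8)
print(len(sim), min(sim) >= min(hist), max(sim) <= max(hist))  # → 8 True True
```

Because the simulated values are resampled historical values, the method never generates negative flows, which is one of the properties the abstract highlights.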
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems, which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the parametric model fitted to the data. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
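The theta-logistic model used as the running example has the update N_{t+1} = N_t exp(r (1 - (N_t/K)^theta) + eps_t). A minimal simulator sketch (the parameter values below are illustrative assumptions):

```python
# Theta-logistic population model: N_{t+1} = N_t * exp(r*(1 - (N_t/K)**theta) + eps).
# Minimal sketch; parameter values are illustrative assumptions.
import math
import random

def theta_logistic(n0, r, K, theta, steps, sigma=0.0, seed=1):
    rng = random.Random(seed)
    n = n0
    traj = [n]
    for _ in range(steps):
        eps = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        n = n * math.exp(r * (1.0 - (n / K) ** theta) + eps)
        traj.append(n)
    return traj

traj = theta_logistic(10.0, 0.5, 100.0, 1.0, 50)
print(round(traj[-1]))  # → 100, the noise-free run settles at carrying capacity
```

Setting sigma > 0 gives the noisy trajectories on which the parametric-versus-nonparametric forecasting comparison is run.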
Estimation of Response Functions Based on Variational Bayes Algorithm in Dynamic Images Sequences
Bowei Shan
2016-01-01
We propose a nonparametric Bayesian model based on a variational Bayes algorithm to estimate the response functions in dynamic medical imaging. In dynamic renal scintigraphy, the impulse response or retention functions are rather complicated and finding a suitable parametric form is problematic. In this paper, we estimate the response functions using nonparametric Bayesian priors. These priors are designed to favor desirable properties of the functions, such as sparsity or smoothness. These assumptions are used within hierarchical priors of the variational Bayes algorithm. We test our algorithm on a real online dataset of dynamic renal scintigraphy. The results demonstrate that the algorithm improves the estimation of response functions with nonparametric priors.
Non-parametric Morphologies of Mergers in the Illustris Simulation
Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G
2016-01-01
We study non-parametric morphologies of merger events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available idealized mock images of the Illustris simulation at $z=0$. We then measure non-parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* >10^{10}$ M$_\odot$. We correlate these automatic statistics with the recent merger history of galaxies and with the presence of close companions. Our main contribution is to assess, in a cosmological framework, the empirically derived non-parametric demarcation line and average time-scales used to determine the merger rate observationally. We found that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the elapsed time since the last merger event. We also find that the a...
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
Lee, Kyungbook; Song, Seok Goo
2016-10-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events (M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRIs, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Nonparametric correlation models for portfolio allocation
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulation results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis.
Empirical likelihood method in survival analysis
Zhou, Mai
2015-01-01
Add the Empirical Likelihood to Your Nonparametric Toolbox: Empirical Likelihood Method in Survival Analysis explains how to use the empirical likelihood method for right-censored survival data. The author uses R for calculating empirical likelihood and includes many worked-out examples with the associated R code. The datasets and code are available for download on his website and CRAN. The book focuses on all the standard survival analysis topics treated with empirical likelihood, including hazard functions, cumulative distribution functions, analysis of the Cox model, and computation of empiric
Recent Advances and Trends in Nonparametric Statistics
Akritas, MG
2003-01-01
The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o
Correlated Non-Parametric Latent Feature Models
Doshi-Velez, Finale
2012-01-01
We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.
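The IBP prior described above can be simulated directly from its culinary metaphor: customer i takes each existing dish with probability proportional to its popularity, then tries a Poisson(alpha/i) number of new dishes, so the number of features grows with the data. A minimal Python sketch (illustrative only; `ibp_draw` and `poisson` are names chosen here, not the authors' code):

```python
import math, random

def poisson(lam):
    """Knuth's Poisson sampler (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def ibp_draw(n_customers, alpha):
    """One binary feature matrix drawn from the IBP: the number of
    columns (features) is determined by the draw, not fixed in advance."""
    counts = []                     # counts[k] = customers using feature k
    rows = []
    for i in range(1, n_customers + 1):
        # take each existing feature with probability m_k / i
        row = [1 if random.random() < m / i else 0 for m in counts]
        # then sample Poisson(alpha / i) brand-new features
        row += [1] * poisson(alpha / i)
        for k, z in enumerate(row):
            if k < len(counts):
                counts[k] += z
            else:
                counts.append(z)
        rows.append(row)
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in rows]

random.seed(9)
Z = ibp_draw(10, alpha=2.0)
print(len(Z), len(Z[0]))  # 10 rows; the column count varies by draw
```

The correlated extensions discussed in the abstract replace the independent per-feature take probabilities with a dependent mechanism, which this sketch does not attempt.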
A Censored Nonparametric Software Reliability Model
[No author listed]
2006-01-01
This paper analyses the effect of censoring on the estimation of failure rate, and presents a framework for a censored nonparametric software reliability model. The model is based on nonparametric testing of a monotonically decreasing failure rate and on weighted kernel failure rate estimation under the constraint that the failure rate is monotonically decreasing. Not only does the model make few assumptions and impose weak constraints, but the number of residual defects in the software system can also be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P
2016-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal s...
Thirty years of nonparametric item response theory
Molenaar, W.
2001-01-01
Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo
A Bayesian Nonparametric Approach to Test Equating
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea
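The restricted (monotone) MLE underlying such likelihood-ratio intervals is typically computed with the pool-adjacent-violators algorithm. A minimal least-squares isotonic regression sketch in Python (an illustration of the general idea, not the authors' construction):

```python
def pava(y, w=None):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    blocks = []  # each block: [mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit

print(pava([1, 3, 2, 4]))  # → [1, 2.5, 2.5, 4]: the 3,2 violation is pooled
```

The current-status setting replaces least squares with a binomial likelihood, but the pooling mechanics are the same.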
Decompounding random sums: A nonparametric approach
Hansen, Martin Bøgsted; Pitts, Susan M.
review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...
A Nonparametric Analogy of Analysis of Covariance
Burnett, Thomas D.; Barr, Donald R.
1977-01-01
A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)
Panel data specifications in nonparametric kernel regression
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
Asymmetry Effects in Chinese Stock Markets Volatility: A Generalized Additive Nonparametric Approach
Hou, Ai Jun
2007-01-01
The unique characteristics of the Chinese stock markets make it difficult to assume a particular distribution for innovations in returns and the specification form of the volatility process when modeling return volatility with the parametric GARCH family models. This paper therefore applies a generalized additive nonparametric smoothing technique to examine the volatility of the Chinese stock markets. The empirical results indicate that an asymmetric effect of negative news exists in the Chin...
Study on headland-bay sandy coast stability in South China coasts
Yu, Ji-Tao; Chen, Zi-Shen
2011-03-01
Headland-bay beach equilibrium planform has long been a crucial problem in long-term sandy beach evolution and stabilization, extensively applied to forecast long-term coastal erosion, the influence of coastal engineering, and long-term coastal management and protection. However, little attention has been paid to it in China. The parabolic relationship is the most widely used empirical relationship for determining the static equilibrium shape of headland-bay beaches. This paper uses the relation to predict and classify 31 headland-bay beaches and concludes that these bays cannot achieve the ultimate static equilibrium planform in South China. The empirical bay equation can morphologically estimate the state of beach stabilization, but it is only a referential predictive means and cannot easily evaluate headland-bay shoreline movements over years and decades. By using the Digital Shoreline Analysis System suggested by the USGS, the rates of shoreline recession and accretion of these different headland-bay beaches are quantitatively calculated from 1990 to 2000. The conclusions of this paper include that (a) most of these 31 bays remained relatively stable, although the rates of erosion and accretion were relatively large owing to the impact of man-made constructions on estuaries within these bays from 1990 to 2000; (b) two bays, Haimen Bay and Hailingshan Bay, originally in the quasi-static equilibrium planform determined by the parabolic bay shape equation, have become unstable under the influence of coastal engineering; and (c) these 31 bays show different recession and accretion characteristics in different bays and segments. On the one hand, some bays exhibit accretion overall, while others show erosion on the whole. Shanwei Bay, Houmen Bay, Pinghai Bay and Yazhou Bay have similar planforms, characterized by less accretion on the sheltering segment and greater accretion on the transitional and tangential segments. On the other hand, different segments of some
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Alejandro Jara
2011-04-01
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Nonparametric tests for pathwise properties of semimartingales
Cont, Rama (doi:10.3150/10-BEJ293)
2011-01-01
We propose two nonparametric tests for investigating the pathwise properties of a signal modeled as the sum of a Lévy process and a Brownian semimartingale. Using a nonparametric threshold estimator for the continuous component of the quadratic variation, we design a test for the presence of a continuous martingale component in the process and a test for establishing whether the jumps have finite or infinite variation, based on observations on a discrete-time grid. We evaluate the performance of our tests using simulations of various stochastic models and use the tests to investigate the fine structure of the DM/USD exchange rate fluctuations and SPX futures prices. In both cases, our tests reveal the presence of a non-zero Brownian component and a finite variation jump component.
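The threshold idea is easy to illustrate: on a discrete grid with step Δ, increments larger than a threshold of order Δ^0.49 are attributed to jumps, and the remaining squared increments estimate the continuous part of the quadratic variation. A hedged numpy sketch on synthetic data (the constants and the threshold multiplier are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 10_000, 1.0 / 10_000
sigma = 0.2

# Brownian increments plus a few large jumps (synthetic path)
dx = sigma * np.sqrt(dt) * rng.standard_normal(n)
jumps = rng.choice(n, size=5, replace=False)
dx[jumps] += rng.choice([-1.0, 1.0], size=5) * 0.5

# threshold of order dt**0.49: larger increments are attributed to jumps
u = 4 * sigma * dt ** 0.49
iv_hat = np.sum(dx[np.abs(dx) < u] ** 2)   # truncated realized variance
print(iv_hat)  # close to the integrated variance sigma**2 = 0.04
```

The tests in the paper build statistics on top of this truncated estimator; the sketch only shows the separation of the continuous component from the jumps.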
Nonparametric Transient Classification using Adaptive Wavelets
Varughese, Melvin M; Stephanou, Michael; Bassett, Bruce A
2015-01-01
Classifying transients based on multi-band light curves is a challenging but crucial problem in the era of GAIA and LSST since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology - a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data and the potential non-representativity of the training set. The ranked classifier is simple and quick to implement while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind ...
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Bayesian nonparametric estimation for Quantum Homodyne Tomography
Naulet, Zacharie; Barat, Eric
2016-01-01
We estimate the quantum state of a light beam from results of quantum homodyne tomography noisy measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around ...
NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS
Orlov A. I.
2015-10-01
The article is devoted to nonparametric point and interval estimation of the characteristics of a probability distribution (the expectation, median, variance, standard deviation, and coefficient of variation) from sample results. Sample values are regarded as realizations of independent and identically distributed random variables with an arbitrary distribution function possessing the required number of moments. The nonparametric procedures are compared with parametric procedures based on the assumption that the sample values have a normal distribution. Point estimators are constructed in the obvious way, using sample analogs of the theoretical characteristics. Interval estimators are based on the asymptotic normality of sample moments and of functions of them. The nonparametric asymptotic confidence intervals are obtained through a general technique for deriving asymptotic relations in applied statistics. In the first step, this technique applies the multidimensional central limit theorem to sums of vectors whose coordinates are powers of the initial random variables. In the second step, the limiting multivariate normal vector is transformed into the vector of interest, using linearization and discarding infinitesimal quantities. The third step is a rigorous justification of the results at the standard level of mathematical-statistical reasoning, which typically requires necessary and sufficient conditions for the inheritance of convergence. The article contains 10 numerical examples; the initial data are the operating times of 50 cutting tools to their limit state. Using methods developed under the assumption of a normal distribution can lead to noticeably distorted conclusions when the normality hypothesis fails. The practical recommendation is to use nonparametric confidence limits for the analysis of real data.
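The simplest instance of the moment-based construction is the distribution-free confidence interval for the expectation, which uses only the asymptotic normality of the sample mean. A minimal Python sketch on skewed synthetic data (illustrative, not the article's worked examples):

```python
import math, random

random.seed(1)
x = [random.expovariate(1.0) for _ in range(2000)]   # skewed, non-normal sample

n = len(x)
mean = sum(x) / n
s2 = sum((v - mean) ** 2 for v in x) / (n - 1)        # sample variance
half = 1.96 * math.sqrt(s2 / n)                       # CLT-based half-width
print((round(mean - half, 3), round(mean + half, 3))) # 95% CI for E[X] = 1
```

No normality of the data is assumed; only the existence of the second moment is needed for the interval to be asymptotically valid.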
The Utility of Nonparametric Transformations for Imputation of Survey Data
Robbins Michael W.
2014-12-01
Missing values present a prevalent problem in the analysis of establishment survey data. Multivariate imputation algorithms (which are used to fill in missing observations) tend to have the common limitation that imputations for continuous variables are sampled from Gaussian distributions. This limitation is addressed here through the use of robust marginal transformations. Specifically, kernel-density and empirical distribution-type transformations are discussed and are shown to have favorable properties when used for imputation of complex survey data. Although such techniques have wide applicability (i.e., they may be easily applied in conjunction with a wide array of imputation techniques), the proposed methodology is applied here with an algorithm for imputation in the USDA's Agricultural Resource Management Survey. Data analysis and simulation results are used to illustrate the specific advantages of the robust methods when compared to the fully parametric techniques and to other relevant techniques such as predictive mean matching. To summarize, transformations based upon parametric densities are shown to distort several data characteristics in circumstances where the parametric model is ill fit; however, no circumstances are found in which the transformations based upon parametric models outperform the nonparametric transformations. As a result, the transformation based upon the empirical distribution (which is the most computationally efficient) is recommended over the other transformation procedures in practice.
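The empirical-distribution transformation can be sketched as a normal-scores map: ranks are converted to plotting positions in (0, 1) and pushed through the inverse normal CDF, making a skewed margin approximately Gaussian before a Gaussian-based imputation step. A hedged Python illustration (a generic version of the idea, not the ARMS algorithm itself):

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()
rng = np.random.default_rng(2)
x = rng.exponential(size=500)               # skewed survey-style variable

# ranks -> plotting positions in (0, 1) -> standard normal scores
ranks = x.argsort().argsort() + 1
u = ranks / (len(x) + 1)
z = np.array([nd.inv_cdf(p) for p in u])

def skew(v):
    v = np.asarray(v, float)
    return np.mean((v - v.mean()) ** 3) / np.std(v) ** 3

print(skew(x), skew(z))  # strong positive skew vs. approximately 0
```

Imputed values on the normal-scores scale are mapped back through the empirical quantile function, so the imputations respect the observed margin rather than a fitted parametric one.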
portfolio optimization based on nonparametric estimation methods
Ghandehari, Mahsa
2017-03-01
One of the major issues investors face in capital markets is deciding on an appropriate stock exchange for investing and selecting an optimal portfolio. This process is done through risk and expected return assessment. In the portfolio selection problem, if asset returns are normally distributed, variance and standard deviation are used as risk measures. However, expected returns on assets are not necessarily normal and sometimes differ dramatically from the normal distribution. This paper introduces conditional value at risk (CVaR) as a risk measure in a nonparametric framework and, for a given expected return, derives the optimal portfolio; the method is compared with the linear programming method. The data used in this study consist of monthly returns of 15 companies selected from the top 50 companies in the Tehran Stock Exchange during the winter of 1392, with returns considered from April of 1388 to June of 1393 (Iranian calendar). The results show the superiority of the nonparametric method over the linear programming method; the nonparametric method is also much faster.
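The empirical CVaR used as the risk measure has a very short nonparametric form: average the losses beyond the empirical VaR quantile, with no distributional assumption. A Python sketch (illustrative; the paper's portfolio optimizer is not reproduced here):

```python
import numpy as np

def cvar(returns, alpha=0.95):
    """Empirical CVaR: mean loss in the worst (1 - alpha) tail."""
    losses = -np.asarray(returns, float)
    var = np.quantile(losses, alpha)        # empirical VaR
    return losses[losses >= var].mean()

rng = np.random.default_rng(3)
r = rng.standard_t(4, size=10_000) * 0.01   # heavy-tailed monthly returns
print(cvar(r))                              # average loss beyond the 95% VaR
```

Portfolio weights can then be chosen to minimize the empirical CVaR of the weighted return series subject to a target expected return, which is the comparison the paper draws against the linear programming formulation.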
Introduction to nonparametric statistics for the biological sciences using R
MacFarland, Thomas W
2016-01-01
This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: To introduce when nonparametric approaches to data analysis are appropriate To introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test To introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
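The RKHS regression that performs well in this comparison is, in its simplest form, kernel ridge regression: predictions are kernel-weighted combinations of training points with a ridge penalty. A small numpy sketch on synthetic one-dimensional data (the kernel, bandwidth, and penalty are illustrative choices, not those of the study):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two 1-D sample vectors."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(4)
x = rng.uniform(-3.0, 3.0, 80)
y = np.sin(x) + 0.1 * rng.standard_normal(80)      # non-linear "trait" signal

lam = 0.1                                          # ridge penalty
alpha = np.linalg.solve(rbf(x, x) + lam * np.eye(80), y)

x_new = np.array([0.0, 1.5])
y_hat = rbf(x_new, x) @ alpha                      # RKHS prediction
print(y_hat)  # roughly sin(0) = 0 and sin(1.5) ≈ 1
```

In genomic prediction the scalar inputs are replaced by marker vectors and the kernel compares genotypes, but the solve-and-predict structure is the same.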
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...
Non-Parametric Estimation of Correlation Functions
Brincker, Rune; Rytter, Anders; Krenk, Steen
In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise.
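Two of the reviewed estimators are easy to contrast in code: the direct lag-sum estimate and the FFT route, which must zero-pad to avoid circular wrap-around. A numpy sketch (illustrative, not the paper's implementation; the Random Decrement technique is omitted):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(256)

# direct method: explicit lag sums, normalized by the record length
R_direct = np.array(
    [np.sum(x[: len(x) - k] * x[k:]) for k in range(64)]
) / len(x)

# FFT method: zero-pad to twice the length so the circular correlation
# computed by the FFT equals the linear one on the retained lags
nfft = 2 * len(x)
X = np.fft.rfft(x, nfft)
R_fft = np.fft.irfft(X * np.conj(X))[:64] / len(x)

print(np.max(np.abs(R_direct - R_fft)))  # agreement up to float rounding
```

The speed advantage of the FFT route grows with record length, which is one of the comparisons the paper makes.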
Lottery spending: a non-parametric analysis.
Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody
2015-01-01
We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
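A standard nonparametric tool for such group comparisons is the permutation test: reshuffle the group labels and recompute the mean difference to obtain a distribution-free p-value. A Python sketch on hypothetical spending figures (the numbers are invented for illustration, not taken from the survey data):

```python
import random

random.seed(6)
group_a = [12.0, 0.0, 5.0, 30.0, 2.0, 8.0, 0.0, 20.0]  # hypothetical monthly spend
group_b = [1.0, 0.0, 3.0, 2.0, 6.0, 0.0, 4.0, 1.0]
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
count = 0
n_perm = 10_000
for _ in range(n_perm):
    random.shuffle(pooled)                 # relabel under the null
    diff = sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8
    if diff >= observed:
        count += 1
p_value = count / n_perm                   # one-sided nonparametric p-value
print(p_value)
```

No distributional form is assumed for spending, which is essential given how skewed lottery expenditure is.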
Nonparametric inferences for kurtosis and conditional kurtosis
XIE Xiao-heng; HE You-hua
2009-01-01
Under the assumption of a strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis of risk time series. We apply this method to the daily returns of the S&P500 index and the Shanghai Composite Index, and simulate GARCH data to verify the efficiency of the presented model. Our results indicate that the risk series distribution is heavy-tailed, but that historical information can make its future distribution light-tailed. However, the far future distribution's tails are little affected by the historical data.
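The unconditional sample kurtosis the test builds on is straightforward to compute; a numpy sketch contrasting a normal with a heavy-tailed sample (illustrative only, not the paper's conditional estimator):

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis: 3 for a normal distribution, larger for heavy tails."""
    x = np.asarray(x, float)
    c = x - x.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2

rng = np.random.default_rng(7)
print(kurtosis(rng.standard_normal(100_000)))     # close to 3
print(kurtosis(rng.standard_t(5, size=100_000)))  # well above 3: heavy tails
```

The conditional version replaces the plain moments with kernel-smoothed conditional moments given past returns, which is the nonparametric step the paper develops.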
Parametric versus non-parametric simulation
Dupeux, Bérénice; Buysse, Jeroen
2014-01-01
Most ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non-parametric efficiency analysis for determining the farm's technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real-world situation where non-linear relationships often exist. Results s...
Preliminary results on nonparametric facial occlusion detection
Daniel LÓPEZ SÁNCHEZ
2016-10-01
The problem of face recognition has been extensively studied in the available literature; however, some aspects of this field require further research. The design and implementation of face recognition systems that can efficiently handle unconstrained conditions (e.g. pose variations, illumination, partial occlusion...) is still an area under active research. This work focuses on the design of a new nonparametric occlusion detection technique. In addition, we present some preliminary results that indicate that the proposed technique might be useful to face recognition systems, allowing them to dynamically discard occluded face parts.
Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, but rather belong to a curved Riemannian manifold, existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
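The log-determinant divergence at the heart of the model has a compact closed form, tr(XY⁻¹) − log det(XY⁻¹) − d for d×d SPD matrices X and Y. A numpy sketch of just the dissimilarity measure (the full Wishart DP mixture is beyond a snippet):

```python
import numpy as np

def logdet_div(X, Y):
    """Log-determinant (Burg) divergence between SPD matrices X and Y."""
    M = X @ np.linalg.inv(Y)
    d = X.shape[0]
    return np.trace(M) - np.log(np.linalg.det(M)) - d

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.eye(2)
print(logdet_div(A, A))  # 0: the divergence of a matrix with itself
print(logdet_div(A, B))  # positive for distinct SPD matrices
```

Note the divergence is asymmetric in its arguments, which is consistent with its role as a likelihood-derived dissimilarity rather than a metric.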
Biological parametric mapping with robust and non-parametric statistics.
Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A
2011-07-15
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
Pivotal Estimation of Nonparametric Functions via Square-root Lasso
Belloni, Alexandre; Wang, Lie
2011-01-01
In a nonparametric linear regression model we study a variant of LASSO, called square-root LASSO, which does not require knowledge of the scaling parameter $\sigma$ of the noise or bounds for it. This work derives new finite sample upper bounds for the prediction norm rate of convergence, $\ell_1$-rate of convergence, $\ell_\infty$-rate of convergence, and sparsity of the square-root LASSO estimator. A lower bound for the prediction norm rate of convergence is also established. In many non-Gaussian noise cases, we rely on moderate deviation theory for self-normalized sums and on new data-dependent empirical process inequalities to achieve Gaussian-like results provided log p = o(n^{1/3}), improving upon results derived in the parametric case that required log p = O(log n). In addition, we derive finite sample bounds on the performance of ordinary least squares (OLS) applied to the model selected by square-root LASSO, accounting for possible misspecification of the selected model. In particular, we provide mild con...
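A scaled-lasso-style iteration gives a quick sketch of the square-root LASSO idea: alternate a lasso solve with re-estimating the noise scale, so no knowledge of $\sigma$ is needed. This is an illustrative implementation under simplifying assumptions (roughly standardized columns, plain coordinate descent), not the authors' estimator:

```python
import numpy as np

def soft(z, t):
    # soft-thresholding operator
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=200):
    # coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1
    n, p = X.shape
    b = np.zeros(p)
    col_ms = (X ** 2).sum(axis=0) / n              # per-column mean square
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]       # partial residual excluding j
            rho = X[:, j] @ r_j / n
            b[j] = soft(rho, alpha) / col_ms[j]
    return b

def sqrt_lasso(X, y, lam, n_outer=20):
    # alternate: the penalty level adapts to the current noise-scale
    # estimate, so sigma need not be known in advance
    n = len(y)
    sigma = np.std(y)
    b = np.zeros(X.shape[1])
    for _ in range(n_outer):
        b = lasso_cd(X, y, lam * sigma)
        sigma = max(np.linalg.norm(y - X @ b) / np.sqrt(n), 1e-8)
    return b, sigma
```

The pivotal penalty level lam = sqrt(2 log p / n) used below mirrors the scale-free choice the theory allows.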
Nonparametric dark energy reconstruction from supernova data.
Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David
2010-12-10
Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.
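The core ingredient above, Gaussian process regression, reduces to a kernel solve once hyperparameters are fixed; a numpy-only sketch of recovering a smooth curve from noisy points (hyperparameters are arbitrary placeholders here, and the actual analysis couples the GP to the cosmological inverse problem through MCMC):

```python
import numpy as np

def gp_posterior_mean(x_train, y_train, x_test, ell=0.5, sf=1.0, noise=0.1):
    """Posterior mean of a GP regression with an RBF (squared-exponential)
    kernel; ell is the length scale, sf the signal scale, noise the
    observation noise standard deviation."""
    def k(a, b):
        return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(x_train, x_train) + noise**2 * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return k(x_test, x_train) @ alpha
```

The same machinery yields posterior variances (and hence controlled errors) from the diagonal of the posterior covariance.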
Nonparametric k-nearest-neighbor entropy estimator.
Lombardi, Damiano; Pant, Sanjay
2016-01-01
A nonparametric k-nearest-neighbor-based entropy estimator is proposed. It improves on the classical Kozachenko-Leonenko estimator by considering nonuniform probability densities in the region of k-nearest neighbors around each sample point. It aims to improve the classical estimators in three situations: first, when the dimensionality of the random variable is large; second, when near-functional relationships leading to high correlation between components of the random variable are present; and third, when the marginal variances of random variable components vary significantly with respect to each other. Heuristics on the error of the proposed and classical estimators are presented. Finally, the proposed estimator is tested for a variety of distributions in successively increasing dimensions and in the presence of a near-functional relationship. Its performance is compared with a classical estimator, and a significant improvement is demonstrated.
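The classical Kozachenko-Leonenko estimator that the paper improves upon can be written compactly with a k-d tree; a sketch assuming scipy is available (the proposed nonuniform-density corrections are omitted):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate (nats):
    H = psi(N) - psi(k) + log V_d + (d/N) * sum_i log r_ik,
    where r_ik is the distance to the k-th neighbour of point i and
    V_d the volume of the d-dimensional unit ball."""
    x = np.asarray(x, float).reshape(len(x), -1)
    n, d = x.shape
    tree = cKDTree(x)
    # query k+1 neighbours: the nearest hit is the point itself
    eps = tree.query(x, k=k + 1)[0][:, -1]
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps + 1e-300))
```

For a uniform density on [0, 1] the true differential entropy is 0, which the estimator approaches as the sample grows.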
Nonparametric estimation of location and scale parameters
Potgieter, C.J.
2012-12-01
Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
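A minimal distribution-free illustration of the location-scale idea: σ can be estimated from an interquartile-range ratio and μ from medians, with no parametric form assumed. This is a far simpler estimator than the asymptotic-likelihood one developed in the paper, shown only to fix ideas:

```python
import numpy as np

def location_scale_fit(x, y):
    """Estimate (mu, sigma) in the model Y ~ mu + sigma * X by quantile
    matching: quantiles are affine-equivariant, so the IQR ratio recovers
    the scale and the medians then recover the location."""
    iqr = lambda v: np.percentile(v, 75) - np.percentile(v, 25)
    sigma = iqr(y) / iqr(x)
    mu = np.median(y) - sigma * np.median(x)
    return mu, sigma
```

For an exact affine transform of the same sample the recovery is exact; for independent samples from the two distributions it is consistent but noisier.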
Nonparametric Maximum Entropy Estimation on Information Diagrams
Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn
2016-01-01
Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...
Nonparametric estimation of employee stock options
FU Qiang; LIU Li-an; LIU Qian
2006-01-01
We propose a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods with market data. It incorporates a kernel estimator and employs a three-step method to modify the Black-Scholes formula. The model overcomes the limits of the Black-Scholes formula in handling option prices with varying volatility. It accounts for the effects of ESO-specific characteristics, such as non-tradability, long time to expiration, the early-exercise feature, the restriction on short selling, and the employee's risk aversion, on the risk-neutral pricing condition, and can be applied to ESO valuation whether the explanatory variable is deterministic or random.
On Parametric (and Non-Parametric) Variation
Neil Smith
2009-11-01
This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.
A nonparametric dynamic additive regression model for longitudinal data
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models
Nonparametric Bayesian inference for multidimensional compound Poisson processes
S. Gugushvili; F. van der Meulen; P. Spreij
2015-01-01
Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density r0 and intensity λ0. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context, which...
Nonparametric bootstrap analysis with applications to demographic effects in demand functions.
Gozalo, P L
1997-12-01
"A new bootstrap proposal, labeled smooth conditional moment (SCM) bootstrap, is introduced for independent but not necessarily identically distributed data, where the classical bootstrap procedure fails.... A good example of the benefits of using nonparametric and bootstrap methods is the area of empirical demand analysis. In particular, we will be concerned with their application to the study of two important topics: what are the most relevant effects of household demographic variables on demand behavior, and to what extent present parametric specifications capture these effects." excerpt
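A plain nonparametric pairs bootstrap for a regression coefficient illustrates the general mechanics (the SCM bootstrap proposed in the paper resamples differently, via smoothed conditional moments):

```python
import numpy as np

def bootstrap_slope_ci(x, y, n_boot=2000, seed=0):
    """Nonparametric pairs bootstrap: resample (x_i, y_i) pairs with
    replacement and return a 95% percentile interval for the slope of a
    simple linear regression."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)           # resample row indices
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    return np.percentile(slopes, [2.5, 97.5])
```

Resampling pairs (rather than residuals) keeps the procedure valid under heteroscedasticity, i.e. for independent but not identically distributed errors.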
Kerschbamer, Rudolf
2015-05-01
This paper proposes a geometric delineation of distributional preference types and a non-parametric approach for their identification in a two-person context. It starts with a small set of assumptions on preferences and shows that this set (i) naturally results in a taxonomy of distributional archetypes that nests all empirically relevant types considered in previous work; and (ii) gives rise to a clean experimental identification procedure - the Equality Equivalence Test - that discriminates between archetypes according to core features of preferences rather than properties of specific modeling variants. As a by-product the test yields a two-dimensional index of preference intensity.
The properties and mechanism of long-term memory in nonparametric volatility
Li, Handong; Cao, Shi-Nan; Wang, Yan
2010-08-01
Recent empirical literature documents the presence of long-term memory in return volatility, but the mechanism behind it is still unclear. In this paper, we investigate the origin and properties of long-term memory with nonparametric volatility, using high-frequency time series data of the Chinese Shanghai Composite Stock Price Index. We perform Detrended Fluctuation Analysis (DFA) on three different nonparametric volatility estimators with different sampling frequencies. For the same volatility series, the Hurst exponents decrease as the sampling interval increases, but they remain larger than 1/2, meaning that no change of interval removes the long memory. RRV exhibits relatively stable long-term memory and is less influenced by sampling frequency. RV and RBV show trends that depend on the time interval, indicating that the jump component has no significant impact on the long-term memory property. This suggests that the presence of long-term memory in nonparametric volatility can be attributed to the integrated variance component. Considering the impact of microstructure noise, RBV and RRV still present long-term memory under various time intervals. We can infer that the presence of long-term memory in realized volatility is not affected by market microstructure noise. Our findings imply that the long-term memory phenomenon is an inherent characteristic of the data-generating process, not a result of microstructure noise or volatility clustering.
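Detrended Fluctuation Analysis itself is short to implement; a generic numpy sketch (window sizes are arbitrary choices, and the paper applies this to volatility estimators such as RV, RBV, and RRV rather than raw returns):

```python
import numpy as np

def dfa(x, scales=(16, 32, 64, 128, 256)):
    """Detrended fluctuation analysis with linear detrending.
    Returns the scaling exponent (~0.5 for white noise, >0.5 for
    long-term memory)."""
    y = np.cumsum(x - np.mean(x))                 # integrated profile
    flucts = []
    for s in scales:
        n_seg = len(y) // s
        f2 = []
        for i in range(n_seg):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)
            f2.append(np.mean((seg - trend) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))       # RMS fluctuation at scale s
    # scaling exponent = slope of log F(s) against log s
    return np.polyfit(np.log(scales), np.log(flucts), 1)[0]
```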
Asymptotic theory of nonparametric regression estimates with censored data
施沛德; 王海燕; 张利华
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right-censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtain the asymptotic optimality of AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Empirical Bayes methods in road safety research.
Vogelesang, R.A.W.
1997-01-01
Road safety research is a wonderful combination of counting fatal accidents and using a toolkit containing prior, posterior, overdispersed Poisson, negative binomial and Gamma distributions, together with positive and negative regression effects, shrinkage estimators and fierce debates concerning th...
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Nonparametric methods in actigraphy: An update
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-01-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson׳s when using the ISM variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
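The two variables discussed, interdaily stability (IS) and intradaily variability (IV), have standard nonparametric formulas; a sketch at hourly resolution (the paper's ISm/IVm variants additionally average these over several interval lengths):

```python
import numpy as np

def interdaily_stability(x, p=24):
    """IS: variance of the average p-sample (e.g. 24-hour) profile over the
    total variance. Approaches 1 for a perfectly day-synchronized rhythm."""
    x = np.asarray(x, float)
    n = len(x)
    profile = np.array([x[i::p].mean() for i in range(p)])
    num = n * np.sum((profile - x.mean()) ** 2) / p
    return num / np.sum((x - x.mean()) ** 2)

def intradaily_variability(x):
    """IV: mean squared successive difference over the variance.
    Approximately 2 for uncorrelated noise, near 0 for a smooth rhythm."""
    x = np.asarray(x, float)
    n = len(x)
    num = n * np.sum(np.diff(x) ** 2)
    return num / ((n - 1) * np.sum((x - x.mean()) ** 2))
```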
Nonparametric Detection of Geometric Structures Over Networks
Zou, Shaofeng; Liang, Yingbin; Poor, H. Vincent
2017-10-01
Nonparametric detection of existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests with probability of errors converging to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on maximum mean discrepancy that measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is first studied. Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent. It is also shown that certain necessary conditions must hold to guarantee any test to be universally consistent. Comparison of sufficient and necessary conditions yields that the proposed test is order-level optimal and nearly optimal respectively in terms of minimum and maximum sizes of candidate anomalous intervals. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
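The kernel statistic underlying these tests, maximum mean discrepancy with a Gaussian kernel, can be estimated directly from samples; a biased-estimator sketch in numpy (one-dimensional samples, bandwidth fixed arbitrarily rather than tuned):

```python
import numpy as np

def mmd2_biased(x, y, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy between the
    samples x and y, with Gaussian RBF kernel k(a, b) = exp(-gamma (a-b)^2).
    Near 0 when both samples come from the same distribution."""
    def k(a, b):
        return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

In the network setting of the paper, this statistic is evaluated on the samples inside each candidate anomalous interval against those outside it.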
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Decision Bayes Criteria for Optimal Classifier Based on Probabilistic Measures
Wissal Drira; Faouzi Ghorbel
2014-01-01
This paper addresses the high-dimension sample problem in discriminant analysis under nonparametric and supervised assumptions. Since there is a kind of equivalence between the probabilistic dependence measure and the Bayes classification error probability, we propose to use an iterative algorithm to optimize the dimension reduction for classification with a probabilistic approach to achieve the Bayes classifier. The probabilities of the different errors encountered along the different phases of the system are estimated with a kernel estimator, tuned by means of its smoothing parameter. Experimental results suggest that the proposed approach performs well.
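A kernel-estimate Bayes classifier of the kind described reduces, in its simplest one-dimensional form, to comparing prior-weighted kernel density estimates per class; a sketch (the bandwidth h is fixed by hand here rather than optimized as in the paper):

```python
import numpy as np

def kde_bayes_predict(x_train, y_train, x_test, h=0.3):
    """Approximate Bayes rule: assign each test point to the class with the
    largest prior * Gaussian-KDE density."""
    classes = np.unique(y_train)
    preds = []
    for xt in x_test:
        scores = []
        for c in classes:
            xc = x_train[y_train == c]
            # Gaussian kernel density estimate at xt for class c
            dens = np.mean(np.exp(-0.5 * ((xt - xc) / h) ** 2))
            dens /= h * np.sqrt(2 * np.pi)
            scores.append(len(xc) / len(x_train) * dens)
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)
```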
Nonparametric Bayesian drift estimation for multidimensional stochastic differential equations
Gugushvili, S.; Spreij, P.
2014-01-01
We consider nonparametric Bayesian estimation of the drift coefficient of a multidimensional stochastic differential equation from discrete-time observations on the solution of this equation. Under suitable regularity conditions, we establish posterior consistency in this context.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfy testable conditions. To overcome this we provide a way to estimate homothetic efficiency of...
A non-parametric approach to investigating fish population dynamics
Cook, R.M; Fryer, R.J
2001-01-01
.... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...
Non-parametric approach to the study of phenotypic stability.
Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P
2016-02-19
The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods are extensions of different parametric methods used for stability analyses, and the results were compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Ferragut, Erik M [ORNL; Laska, Jason A [ORNL
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
PV power forecast using a nonparametric PV model
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
Forecasting the AC power output of a PV plant accurately is important both for plant owners and electric system operators. Two main categories of PV modeling are available: the parametric and the nonparametric. In this paper, a methodology using a nonparametric PV model is proposed, using as inputs several forecasts of meteorological variables from a Numerical Weather Forecast model, and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quant...
Handley, Lawrence R.; Spear, Kathryn A.; Eleonor Taylor,; Thatcher, Cindy
2011-01-01
The Galveston Bay estuary is located on the upper Texas Gulf coast (Lester and Gonzalez, 2002). It is composed of four major sub-bays—Galveston, Trinity, East, and West Bays. It is Texas’ largest estuary on the Gulf Coast, with a total area of 155,399 hectares (384,000 acres) and 1,885 km (1,171 miles) of shoreline (Burgan and Engle, 2006). The volume of the bay has increased over the past 50 years due to subsidence, dredging, and sea level rise. Outside of ship channels, the maximum depth is only 3.7 m (12 ft), with the average depth ranging from 1.2 m (4 ft) to 2.4 m (8 ft)—even shallower in areas with widespread oyster reefs (Lester and Gonzalez, 2002). The tidal range is less than 0.9 m (3 ft), but water levels and circulation are highly influenced by wind. The estuary was formed in a drowned river delta, and its bayous were once channels of the Brazos and Trinity Rivers. Today, the watersheds surrounding the Trinity and San Jacinto Rivers, along with many other smaller bayous, feed into the bay. The entire Galveston Bay watershed covers 85,470 km² (33,000 mi²) (Figure 1). Galveston Island, a 5,000-year-old sand bar that lies at the western edge of the bay’s opening into the Gulf of Mexico, impedes the freshwater flow of the Trinity and San Jacinto Rivers into the Gulf, the majority of which comes from the Trinity. The Bolivar Peninsula lies at the eastern edge of the bay’s opening into the Gulf. Water flows into the Gulf at Bolivar Roads (Galveston Pass), between Galveston Island and Bolivar Peninsula, and at San Luis Pass, between the western side of Galveston Island and Follets Island.
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
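The basic Good-Turing discovery probability that this work starts from is a one-liner: the estimated probability that the next observation is a previously unseen species is N1/n, where N1 is the number of species observed exactly once:

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the 'missing mass': the probability that the
    next draw is a species not yet seen, P0 = N1 / n."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)
```

The Bayesian nonparametric estimators discussed in the paper recover smoothed versions of this quantity under a two-parameter Poisson-Dirichlet prior.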
Study on Headland-Bay Sandy Coast Stability in South China Coasts
YU Ji-tao; CHEN Zi-shen
2011-01-01
Headland-bay beach equilibrium planform has been a crucial problem abroad in long-term sandy beach evolution and stabilization, extensively applied to forecast long-term coastal erosion, the influence of coastal engineering, and long-term coastal management and protection. However, little attention has been paid to this in China. The parabolic relationship is the most widely used empirical relationship for determining the static equilibrium shape of headland-bay beaches. This paper utilizes the relation to predict and classify 31 headland-bay beaches and concludes that these bays cannot achieve the ultimate static equilibrium planform in South China. The empirical bay equation can morphologically estimate the beach stabilization state, but it is only a referential predictive means and it is difficult to evaluate headland-bay shoreline movements over years and decades. By using the Digital Shoreline Analysis System suggested by USGS, the rates of shoreline recession and accretion of these different headland-bay beaches are quantitatively calculated from 1990 to 2000. The conclusions of this paper include that (a) most of these 31 bays remained relatively stable, although the rates of erosion and accretion were relatively large where man-made constructions affected estuaries within these bays from 1990 to 2000; (b) two bays, Haimen Bay and Hailingshan Bay, originally in the quasi-static equilibrium planform determined by the parabolic bay shape equation, have become unstable under the influence of coastal engineering; and (c) these 31 bays show different recession and accretion characteristics across bays and segments. On the one hand, some bays exhibit accretion overall, while others show erosion on the whole. Shanwei Bay, Houmen Bay, Pinghai Bay and Yazhou Bay have similar planforms, characterized by less accretion on the sheltered segment and greater accretion on the transitional and tangential segments. On the other hand, different segments of some bays have two dissimilar...
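The parabolic relationship referred to is, in its usual formulation (attributed to Hsu and Evans, 1989; the coefficients C0, C1, C2 are empirical functions of the reference wave obliquity β):

```latex
\frac{R}{R_\beta} \;=\; C_0 \;+\; C_1\!\left(\frac{\beta}{\theta}\right) \;+\; C_2\!\left(\frac{\beta}{\theta}\right)^{2}, \qquad \theta \ge \beta,
```

where R is the radius from the wave diffraction point to a point on the beach at angle θ from the wave crest line, and R_β is the control-line length at the reference angle β. A bay whose measured shoreline matches this curve is taken to be in static equilibrium.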
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural networks, Gaussian processes, and support vector regression, to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market via the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary costs directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance.
Nonparametric estimation of a convex bathtub-shaped hazard function.
Jankowski, Hanna K; Wellner, Jon A
2009-11-01
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n^{2/5} at points x_0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Comparison of non-parametric methods for ungrouping coarsely aggregated data
Rizzi, Silvia; Thinggaard, Mikael; Engholm, Gerda
2016-01-01
Background: Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in the health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of disease grouped in 5-year age classes with an open-ended age … methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model, first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate … composite link model performs the best. Conclusion: We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized …
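One of the simpler ungrouping strategies compared above, spline interpolation, can be sketched as follows: interpolate the cumulative counts with a monotone spline and take first differences on a finer grid. The counts below are illustrative, not NORDCAN data:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Illustrative death counts in 5-year age groups [0,5), [5,10), ..., [20,25)
edges = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
counts = np.array([10.0, 30.0, 80.0, 60.0, 20.0])

# Monotone (shape-preserving) spline through the cumulative counts;
# first differences on a 1-year grid then give single-year counts.
spline = PchipInterpolator(edges, np.concatenate([[0.0], np.cumsum(counts)]))
ages = np.arange(0.0, 26.0)
single_year = np.diff(spline(ages))

print(single_year.sum())  # 200.0 -- the grouped total is preserved
```

Because the interpolant is monotone, the ungrouped counts stay non-negative; the penalized composite link model the paper recommends additionally smooths the resulting distribution.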
Nonparametric Bayes approach for a semi-mechanistic pharmacokinetic and pharmacodynamic model
Dong, Yan
Both frequentist and Bayesian approaches have been used to characterize population pharmacokinetic and pharmacodynamic (PK/PD) models. These methods focus on estimating the population parameters and assessing the association between the characteristics of PK/PD and the subject covariates. In this work, we propose a Dirichlet process mixture model to classify patients based on their individualized pharmacokinetic and pharmacodynamic profiles. We can then predict a new patient's dose-response curve given their concentration-time profile. Additionally, we implement a modern Markov chain Monte Carlo algorithm for sampling-based inference of the parameters. The detailed sampling procedures as well as the results are discussed in a simulated-data example and a real-data example. We also evaluate an approximate solution of a system of nonlinear differential equations obtained from Euler's method and compare the results with a general numerical solver, ode from the R package deSolve.
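The Euler-versus-solver comparison can be illustrated on a one-compartment elimination model dC/dt = -kC; this is a hypothetical toy model (the paper's PK/PD system is more involved), shown here in Python rather than R:

```python
import numpy as np
from scipy.integrate import solve_ivp

k, C0, T = 0.5, 10.0, 4.0   # illustrative rate constant, dose, time horizon

# Forward Euler with a small fixed step
h = 0.001
C = C0
for _ in range(int(T / h)):
    C += h * (-k * C)

# General-purpose numerical solver as the reference
sol = solve_ivp(lambda t, y: -k * y, (0.0, T), [C0], rtol=1e-10, atol=1e-12)
C_ref = sol.y[0, -1]

print(C, C_ref, C0 * np.exp(-k * T))  # Euler agrees with the exact value to ~1e-3
```

For this linear model the exact solution C0·exp(-kT) is available, which makes the discretization error of Euler's method directly visible.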
Hampf, Benjamin
2011-08-15
In this paper we present a new approach to evaluate the environmental efficiency of decision making units. We propose a model that describes a two-stage process consisting of a production and an end-of-pipe abatement stage with the environmental efficiency being determined by the efficiency of both stages. Taking the dependencies between the two stages into account, we show how nonparametric methods can be used to measure environmental efficiency and to decompose it into production and abatement efficiency. For an empirical illustration we apply our model to an analysis of U.S. power plants.
Nonparametric Cointegration Analysis of Fractional Systems With Unknown Integration Orders
Nielsen, Morten Ørregaard
2009-01-01
In this paper a nonparametric variance ratio testing approach is proposed for determining the number of cointegrating relations in fractionally integrated systems. The test statistic is easily calculated without prior knowledge of the integration order of the data, the strength of the cointegrating...
Non-parametric analysis of rating transition and default data
Fledelius, Peter; Lando, David; Perch Nielsen, Jens
2004-01-01
We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move, but that this dependence vanishes after 2-3 years.
A non-parametric model for the cosmic velocity field
Branchini, E; Teodoro, L; Frenk, CS; Schmoldt; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S
1999-01-01
We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.
Influence of test and person characteristics on nonparametric appropriateness measurement
Meijer, Rob R.; Molenaar, Ivo W.; Sijtsma, Klaas
1994-01-01
Appropriateness measurement in nonparametric item response theory modeling is affected by the reliability of the items, the test length, the type of aberrant response behavior, and the percentage of aberrant persons in the group. The percentage of simulees defined a priori as aberrant responders that…
Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects
Qian, Minghui; Hu, Ridong; Chen, Jianwei
2016-01-01
Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes nonlinear factors into account based on the spatial dynamic panel…
Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series
Gao, Jiti; Kanaya, Shin; Li, Degui
2015-01-01
This paper establishes uniform consistency results for nonparametric kernel density and regression estimators when the time series regressors concerned are nonstationary null recurrent Markov chains. Under suitable regularity conditions, we derive uniform convergence rates of the estimators. Our results can be viewed as a nonstationary extension of some well-known uniform consistency results for stationary time series.
Non-parametric Bayesian inference for inhomogeneous Markov point processes
Berthelsen, Kasper Klitgaard; Møller, Jesper
With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...
Investigating the cultural patterns of corruption: A nonparametric analysis
Halkos, George; Tzeremes, Nickolaos
2011-01-01
By using a sample of 77 countries our analysis applies several nonparametric techniques in order to reveal the link between national culture and corruption. Based on Hofstede’s cultural dimensions and the corruption perception index, the results reveal that countries with higher levels of corruption tend to have higher power distance and collectivism values in their society.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction, but data rarely satisfy the testable conditions. To overcome this, we provide a way to estimate the homothetic efficiency of consump…
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
Akhtar R. Siddique
2000-03-01
This paper develops a filtering-based framework for non-parametric estimation of the parameters of a diffusion process from the conditional moments of discrete observations of the process. The method is implemented for interest rate data in the Eurodollar and long-term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and to compute prices for short-term Eurodollar interest rate futures options and long-term discount bonds. The bivariate model produces prices substantially closer to the market prices.
Comparison of Rank Analysis of Covariance and Nonparametric Randomized Blocks Analysis.
Porter, Andrew C.; McSweeney, Maryellen
The relative power of three possible experimental designs under the condition that data is to be analyzed by nonparametric techniques; the comparison of the power of each nonparametric technique to its parametric analogue; and the comparison of relative powers using nonparametric and parametric techniques are discussed. The three nonparametric…
Owen, Art B
2001-01-01
Empirical likelihood provides inferences whose validity does not depend on specifying a parametric model for the data. Because it uses a likelihood, the method has certain inherent advantages over resampling methods: it uses the data to determine the shape of the confidence regions, and it makes it easy to combine data from multiple sources. It also facilitates incorporating side information, and it simplifies accounting for censored, truncated, or biased sampling. One of the first books published on the subject, Empirical Likelihood offers an in-depth treatment of this method for constructing confidence regions and testing hypotheses. The author applies empirical likelihood to a range of problems, from those as simple as setting a confidence region for a univariate mean under IID sampling, to problems defined through smooth functions of means, regression models, generalized linear models, estimating equations, or kernel smooths, and to sampling with non-identically distributed data. Abundant figures offer vi…
Nonparametric Single-Trial EEG Feature Extraction and Classification of Driver's Cognitive Responses
Lin, Chin-Teng; Lin, Ken-Li; Ko, Li-Wei; Liang, Sheng-Fu; Kuo, Bor-Chen; Chung, I.-Fang
2008-12-01
We proposed an electroencephalographic (EEG) signal analysis approach to investigate the driver's cognitive response to traffic-light experiments in a virtual-reality (VR) based simulated driving environment. EEG signals are digitally sampled and then transformed by three different feature extraction methods, including nonparametric weighted feature extraction (NWFE), principal component analysis (PCA), and linear discriminant analysis (LDA), which were also used to reduce the feature dimension and project the measured EEG signals to a feature space spanned by their eigenvectors. After that, the mapped data could be classified with fewer features and their classification results were compared by utilizing two different classifiers, including k-nearest-neighbor classification (KNNC) and the naive Bayes classifier (NBC). Experimental data were collected from 6 subjects and the results show that NWFE+NBC gives the best classification accuracy, ranging from 71% to 77%, which is 10% to 24% higher than LDA+KNN1. It also demonstrates the feasibility of detecting and analyzing single-trial EEG signals that represent operators' cognitive states and responses to task events.
Nonparametric inference procedures for multistate life table analysis.
Dow, M M
1985-01-01
Recent generalizations of the classical single-state life table procedures to the multistate case provide the means to analyze simultaneously the mobility and mortality experience of one or more cohorts. This paper examines fairly general nonparametric combinatorial matrix procedures, known as quadratic assignment, as a technique for analyzing various transitional patterns commonly generated by cohorts over the life course. To some degree, the output from a multistate life table analysis suggests inference procedures. In his discussion of multistate life table construction features, the author focuses on the matrix formulation of the problem. He then presents several examples of the proposed nonparametric procedures. Data for the mobility and life-expectancy-at-birth matrices come from the 458-member Cayo Santiago rhesus monkey colony. The author's matrix combinatorial approach to hypothesis testing may prove to be a useful inferential strategy in several multidimensional demographic areas.
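A bare-bones version of the quadratic assignment procedure: correlate the off-diagonal entries of two square matrices and build a null distribution by permuting rows and columns of one matrix simultaneously. The matrices here are synthetic, not the Cayo Santiago data:

```python
import numpy as np

def qap_correlation(A, B):
    """Correlation between the off-diagonal entries of two square matrices."""
    mask = ~np.eye(A.shape[0], dtype=bool)
    return np.corrcoef(A[mask], B[mask])[0, 1]

def qap_test(A, B, n_perm=999, seed=0):
    """Quadratic assignment test: permute rows AND columns of B together,
    preserving B's internal structure, to obtain a null distribution."""
    rng = np.random.default_rng(seed)
    obs = qap_correlation(A, B)
    n = A.shape[0]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if qap_correlation(A, B[np.ix_(p, p)]) >= obs:
            count += 1
    return obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
A = rng.random((8, 8))
B = A + 0.1 * rng.random((8, 8))   # strongly related to A by construction
obs, p = qap_test(A, B)
print(obs, p)   # high observed correlation, small permutation p-value
```

Permuting rows and columns with the same permutation is what distinguishes QAP from naively shuffling matrix entries: it respects the dependence among entries sharing a row or column.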
Non-parametric estimation of Fisher information from real data
Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A
2015-01-01
The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Combined parametric-nonparametric identification of block-oriented systems
Mzyk, Grzegorz
2014-01-01
This book considers a problem of block-oriented nonlinear dynamic system identification in the presence of random disturbances. This class of systems includes various interconnections of linear dynamic blocks and static nonlinear elements, e.g., Hammerstein system, Wiener system, Wiener-Hammerstein ("sandwich") system and additive NARMAX systems with feedback. Interconnecting signals are not accessible for measurement. The combined parametric-nonparametric algorithms, proposed in the book, can be selected dependently on the prior knowledge of the system and signals. Most of them are based on the decomposition of the complex system identification task into simpler local sub-problems by using non-parametric (kernel or orthogonal) regression estimation. In the parametric stage, the generalized least squares or the instrumental variables technique is commonly applied to cope with correlated excitations. Limit properties of the algorithms have been shown analytically and illustrated in simple experiments.
Estimation of Stochastic Volatility Models by Nonparametric Filtering
Kanaya, Shin; Kristensen, Dennis
2016-01-01
A two-step estimation method of stochastic volatility models is proposed: In the first step, we nonparametrically estimate the (unobserved) instantaneous volatility process. In the second step, standard estimation methods for fully observed diffusion processes are employed, but with the filtered/estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties…
Nonparametric Regression Estimation for Multivariate Null Recurrent Processes
Biqing Cai
2015-04-01
This paper discusses nonparametric kernel regression with the regressor being a d-dimensional β-null recurrent process in the presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate √(n(T)h^d), where n(T) is the number of regenerations for a β-null recurrent process, and that the limiting distribution (with proper normalization) is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite-sample performance of the estimate is quite reasonable when the leave-one-out cross-validation method is used for bandwidth selection. We apply the proposed method to study the relationship of the Federal funds rate with the 3-month and 5-year T-bill rates and discover the existence of nonlinearity in the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than that of the linear model.
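The leave-one-out cross-validation used above for bandwidth selection works the same way in the simplest kernel regression setting; a one-dimensional Nadaraya-Watson sketch on synthetic data (not the T-bill series):

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    """Nadaraya-Watson kernel regression with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def loo_cv_bandwidth(x, y, grid):
    """Pick the bandwidth minimizing the leave-one-out prediction error."""
    best_h, best_err = None, np.inf
    for h in grid:
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(w, 0.0)           # leave each point out of its own fit
        pred = (w @ y) / w.sum(axis=1)
        err = np.mean((y - pred) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, 300)
y = np.sin(x) + 0.2 * rng.normal(size=300)
h = loo_cv_bandwidth(x, y, np.linspace(0.05, 1.0, 20))
fit = nw_estimate(np.array([np.pi / 2]), x, y, h)
print(h, fit[0])  # fitted value near sin(pi/2) = 1
```

The null recurrent setting of the paper changes the asymptotics (rates in n(T) rather than n) but not the mechanics of the estimator or of the cross-validation.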
Using non-parametric methods in econometric production analysis
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb… …neither the Cobb-Douglas function nor the Translog function is consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric… …results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used…
Right-Censored Nonparametric Regression: A Comparative Simulation Study
Dursun Aydın
2016-11-01
This paper introduces the operation of selection criteria for right-censored nonparametric regression using smoothing splines. In order to transform the response variable into a variable that incorporates the right-censoring, we used the Kaplan-Meier weights proposed by [1] and [2]. The major problem in the smoothing spline method is to determine a smoothing parameter in order to obtain nonparametric estimates of the regression function. In this study, this parameter is chosen, based on the censored data, by means of criteria such as the improved Akaike information criterion (AICc), the Bayesian (or Schwarz) information criterion (BIC) and generalized cross-validation (GCV). For this purpose, a Monte Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.
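The Kaplan-Meier weights referred to above are built from the ordinary Kaplan-Meier survival estimator; a minimal version assuming distinct observation times (synthetic data; [1] and [2] give the exact weight construction):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates S(t_i) at the sorted observation
    times, assuming distinct times. events: 1 = observed, 0 = right-censored."""
    order = np.argsort(times)
    t, d = times[order], events[order]
    n = len(t)
    surv, curve = 1.0, []
    for i in range(n):
        if d[i] == 1:                 # survival drops only at observed events
            surv *= (n - i - 1) / (n - i)
        curve.append((t[i], surv))
    return curve

times = np.array([2.0, 3.0, 4.0, 5.0, 7.0])
events = np.array([1, 0, 1, 1, 0])
print(kaplan_meier(times, events))
# S drops to 0.8 after t=2, to 8/15 after t=4, to 4/15 after t=5
```

The Kaplan-Meier weight attached to the i-th ordered observation is then the jump of the estimated distribution function at that observation, which is zero for censored cases.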
Using non-parametric methods in econometric production analysis
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function, of which the Cobb… …parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used… …by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371 specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function is consistent with the “true”…
Nonparametric estimation of Fisher information from real data
Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.
2016-02-01
The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δθ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δθ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f-divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
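The finite-difference idea can be sketched in a few lines: sample at θ ± Δθ/2, estimate both densities with a Gaussian KDE, and convert the squared Hellinger distance between them into a Fisher information estimate via I(θ) ≈ 8H²/Δθ². This is a simplified sketch of the general approach, not the authors' optimized procedure with its large-deviations choice of Δθ:

```python
import numpy as np
from scipy.stats import gaussian_kde

def fisher_info_fd(theta, sampler, delta=0.5, n=5000, seed=0):
    """Finite-difference, KDE-based Fisher information estimate for a scalar
    parameter: I(theta) ~ 8 * H^2(f_{theta-d/2}, f_{theta+d/2}) / delta^2."""
    rng = np.random.default_rng(seed)
    x1 = sampler(theta - delta / 2, n, rng)
    x2 = sampler(theta + delta / 2, n, rng)
    f1, f2 = gaussian_kde(x1), gaussian_kde(x2)
    grid = np.linspace(min(x1.min(), x2.min()), max(x1.max(), x2.max()), 2000)
    dx = grid[1] - grid[0]
    # Squared Hellinger distance by a simple quadrature on the grid
    hell_sq = 0.5 * np.sum((np.sqrt(f1(grid)) - np.sqrt(f2(grid))) ** 2) * dx
    return 8 * hell_sq / delta ** 2

# Validate on the normal location family, where I(theta) = 1 exactly.
normal = lambda th, n, rng: rng.normal(th, 1.0, n)
print(fisher_info_fd(0.0, normal))  # close to 1; KDE smoothing biases it slightly low
```

The bias visible here comes from the KDE bandwidth and the finite Δθ, which is exactly the trade-off the paper's optimal-Δθ analysis addresses.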
ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH
I. C. A. OYEKA
2012-02-01
This paper presents a non-parametric statistical method for analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided, and the method is shown to compare favourably with its competitor, the Mann-Whitney test, being more powerful than the latter when there are ties.
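For comparison, the standard Mann-Whitney test handles ties by mid-rank assignment and a tie-corrected variance in its normal approximation; with SciPy:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Two illustrative samples containing ties within and across groups.
x = np.array([3, 4, 4, 5, 6, 7, 7, 8])
y = np.array([1, 2, 2, 3, 4, 4, 5, 5])

# method="asymptotic" uses the normal approximation with a tie correction
stat, p = mannwhitneyu(x, y, alternative="two-sided", method="asymptotic")
print(stat, p)  # U statistic (52.5 here, ties contribute half-counts) and p-value
```

Tied pairs contribute 1/2 to the U statistic, which is why it is non-integer here.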
A Bayesian nonparametric method for prediction in EST analysis
Prünster Igor
2007-09-01
Background: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed with sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results: In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: (a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; (b) the number of new unique genes to be observed in a future sample; (c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries previously studied with frequentist methods are analyzed in detail. Conclusion: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
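For the coverage in (a), the classical frequentist baseline is the Good-Turing estimate 1 − n₁/n, where n₁ is the number of genes seen exactly once among n reads; the Bayesian nonparametric estimators above refine this idea. A toy sketch with made-up gene labels:

```python
from collections import Counter

def good_turing_coverage(reads):
    """Estimated fraction of the library's genes represented in the sample:
    1 - (number of singleton genes) / (number of reads)."""
    freq = Counter(reads)
    n1 = sum(1 for c in freq.values() if c == 1)
    return 1 - n1 / len(reads)

# Illustrative EST sample: gene labels for 10 sequenced reads.
reads = ["g1", "g1", "g2", "g3", "g3", "g3", "g4", "g5", "g5", "g6"]
print(good_turing_coverage(reads))  # 3 singletons among 10 reads -> 0.7
```

The instability the abstract mentions arises when such frequency-count estimators are extrapolated far beyond the observed sample size, which is where the Bayesian nonparametric machinery pays off.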
Fusion of Hard and Soft Information in Nonparametric Density Estimation
2015-06-10
…estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample… Fusion of Hard and Soft Information in Nonparametric Density Estimation, Johannes O. Royset and Roger J-B Wets, Department of Operations Research… univariate density estimation in situations when the sample (hard information) is supplemented by "soft" information about the random phenomenon. These…
Nonparametric estimation for hazard rate monotonously decreasing system
Han Fengyan; Li Weisong
2005-01-01
Estimation of density and hazard rate is very important to the reliability analysis of a system. In order to estimate the density and hazard rate of a hazard rate monotonously decreasing system, a new nonparametric estimator is put forward. The estimator is based on the kernel function method and optimum algorithm. Numerical experiment shows that the method is accurate enough and can be used in many cases.
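One common kernel-based construction of the kind the abstract describes is to estimate the density with a Gaussian kernel and divide by the empirical survival function to obtain a hazard estimate. A hedged sketch (illustrative only; the paper's optimum algorithm and monotonicity constraint are not reproduced):

```python
import numpy as np

def kernel_hazard(t_grid, data, bandwidth):
    """h(t) = f_hat(t) / S_hat(t): Gaussian kernel density estimate divided
    by the empirical survival function. Assumes complete (uncensored) data."""
    t_grid = np.asarray(t_grid, float)
    data = np.asarray(data, float)
    u = (t_grid[:, None] - data[None, :]) / bandwidth
    # Gaussian kernel density estimate on the grid
    f = np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * bandwidth * np.sqrt(2 * np.pi))
    # empirical survival function S(t) = fraction of observations exceeding t
    S = (data[None, :] > t_grid[:, None]).mean(axis=1)
    return f / np.maximum(S, 1.0 / len(data))  # guard against division by zero
```

On exponential data the true hazard is constant, so the estimate should hover near that constant away from the boundary.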
Non-parametric versus parametric methods in environmental sciences
Muhammad Riaz
2016-01-01
Full Text Available This report highlights the importance of considering the background assumptions required for the analysis of real datasets in different disciplines. We provide a comparative discussion of parametric methods (which depend on distributional assumptions such as normality) relative to non-parametric methods (which are free from many distributional assumptions). We have chosen a real dataset from environmental sciences (one of the application areas). The findings may be extended to other disciplines in the same spirit.
Urban Noise Modelling in Boka Kotorska Bay
Aleksandar Nikolić
2014-04-01
Full Text Available Traffic is the most significant noise source in urban areas. The village of Kamenari in Boka Kotorska Bay is a site where, in a relatively small area, road traffic and sea (ferry traffic take place at the same time. Due to the specificity of the location, i.e. very rare synergy of sound effects of road and sea traffic in the urban area, as well as the expressed need for assessment of noise level in a simple and quick way, a research was conducted, using empirical methods and statistical analysis methods, which led to the creation of acoustic model for the assessment of equivalent noise level (Leq. The developed model for noise assessment in the Village of Kamenari in Boka Kotorska Bay quite realistically provides data on possible noise levels at the observed site, with very little deviations in relation to empirically obtained values.
Panel data nonparametric estimation of production risk and risk preferences
Czekaj, Tomasz Gerard; Henningsen, Arne
We apply nonparametric panel data kernel regression to investigate production risk, output price uncertainty, and risk attitudes of Polish dairy farms based on a firm-level unbalanced panel data set that covers the period 2004–2010. We compare different model specifications and different...... approaches for obtaining firm-specific measures of risk attitudes. We found that Polish dairy farmers are risk averse regarding production risk and price uncertainty. According to our results, Polish dairy farmers perceive the production risk as being more significant than the risk related to output price...
Digital spectral analysis parametric, non-parametric and advanced methods
Castanié, Francis
2013-01-01
Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature. The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models. An entire chapter is devoted to the non-parametric methods most widely used in industry. High resolution methods a
Nonparametric statistics a step-by-step approach
Corder, Gregory W
2014-01-01
"…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory. It also deserves a place in libraries of all institutions where introductory statistics courses are taught." -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: new coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical power; SPSS® (Version 21) software and updated screen ca
Categorical and nonparametric data analysis choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
Nonparametric statistical structuring of knowledge systems using binary feature matches
Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue
2014-01-01
statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary......Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have...
Generative Temporal Modelling of Neuroimaging - Decomposition and Nonparametric Testing
Hald, Ditte Høvenhoff
The goal of this thesis is to explore two improvements for functional magnetic resonance imaging (fMRI) analysis; namely our proposed decomposition method and an extension to the non-parametric testing framework. Analysis of fMRI allows researchers to investigate the functional processes...... of the brain, and provides insight into neuronal coupling during mental processes or tasks. The decomposition method is a Gaussian process-based independent components analysis (GPICA), which incorporates a temporal dependency in the sources. A hierarchical model specification is used, featuring both...
Using Mathematica to build Non-parametric Statistical Tables
Gloria Perez Sainz de Rozas
2003-01-01
Full Text Available In this paper, I present computational procedures to obtain statistical tables: the tables of the asymptotic distribution and the exact distribution of the Kolmogorov-Smirnov statistic Dn for one population, the table of the distribution of the runs R, the table of the distribution of the Wilcoxon signed-rank statistic W+ and the table of the distribution of the Mann-Whitney statistic Ux, using Mathematica, Version 3.9 under Windows 98. I think this is an interesting question because many statistical packages give only the asymptotic significance level in statistical tests, and with these procedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities of non-parametric distributions. I have used Mathematica to make these calculations because one can use symbolic language to solve recursion relations. It is very easy to generate the format of the tables, and it is possible to obtain any table of the mentioned non-parametric distributions with any precision, not only with the standard parameters most used in statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, χ² (Chi-Square), T-Student, F-Snedecor, Geometric, Gamma and Beta.
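The recursion-based table building described here is easy to mimic: the exact null distribution of the Wilcoxon signed-rank statistic W+ follows from convolving the generating polynomial prod_{r=1..n}(1 + z^r) / 2^n, since each rank r is either included in W+ or not. A small numpy sketch (ours, not the paper's Mathematica code):

```python
import numpy as np

def wilcoxon_signed_rank_pmf(n):
    """Exact null pmf of W+ for sample size n, via polynomial convolution."""
    max_w = n * (n + 1) // 2
    counts = np.zeros(max_w + 1)
    counts[0] = 1.0
    for r in range(1, n + 1):
        # multiply the count polynomial by (1 + z^r)
        shifted = np.zeros_like(counts)
        shifted[r:] = counts[:max_w + 1 - r]
        counts += shifted
    return counts / 2.0**n
```

For n = 3 this reproduces the textbook table: P(W+ = w) = 1/8 for w in {0, 1, 2, 4, 5, 6} and 2/8 for w = 3, from which exact tail probabilities follow by summation.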
1st Conference of the International Society for Nonparametric Statistics
Lahiri, S; Politis, Dimitris
2014-01-01
This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R.Beran, P.Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. SubbaRao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world. The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...
Genomic breeding value estimation using nonparametric additive regression models
Solberg Trygve
2009-01-01
Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. A determination of a predictor-specific degree of smoothing increased the accuracy.
Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes
Zhou, Wei-Xing; Sornette, Didier
We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.
A non-parametric framework for estimating threshold limit values
Ulm Kurt
2005-11-01
Full Text Available Abstract Background To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested in flexible non-parametric alternatives. Methods We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing data from a study conducted by the German Research Foundation aiming to set a threshold limit value for exposure to total dust at the workplace, as a causal agent for developing chronic bronchitis. Results In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion The estimation of thresholds based on the isotonic framework is conceptually simple and sufficiently powerful. Given that in the threshold value estimation context there is no gold standard method, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
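The isotonic step-function fit at the core of such a framework is typically computed with the pool-adjacent-violators algorithm (PAVA): scan the responses in dose order and merge adjacent blocks whenever monotonicity is violated. The jump locations of the fitted step function are the candidate thresholds. A minimal sketch (ours; the reduced isotonic regression and closed-testing selection steps are not shown):

```python
import numpy as np

def pava(y, w=None):
    """Pool Adjacent Violators: weighted least-squares nondecreasing fit."""
    y = np.asarray(y, float)
    w = np.ones_like(y) if w is None else np.asarray(w, float)
    out = []  # stack of blocks: [mean, weight, size]
    for val, wt in zip(y, w):
        out.append([val, wt, 1])
        # merge backwards while the monotonicity constraint is violated
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, w2, s2 = out.pop()
            m1, w1, s1 = out.pop()
            total = w1 + w2
            out.append([(m1 * w1 + m2 * w2) / total, total, s1 + s2])
    return np.concatenate([[m] * s for m, _, s in out])
```

For example, `pava([1, 3, 2, 4])` pools the violating pair (3, 2) into 2.5, giving the step function [1, 2.5, 2.5, 4] with jumps at the first and last positions as threshold candidates.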
Using non-parametric methods in econometric production analysis
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb-Douglas a......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...... to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
Computing Economies of Scope Using Robust Partial Frontier Nonparametric Methods
Pedro Carvalho
2016-03-01
Full Text Available This paper proposes a methodology to examine economies of scope using the recent order-α nonparametric method. It allows us to investigate economies of scope by comparing the efficient order-α frontiers of firms that produce two or more goods with the efficient order-α frontiers of firms that produce only one good. To accomplish this, and because the order-α frontiers are irregular, we suggest linearizing them with the DEA estimator. The proposed methodology uses partial frontier nonparametric methods that are more robust than the traditional full frontier methods. By using a sample of 67 Portuguese water utilities for the period 2002–2008 and, also, a simulated sample, we prove the usefulness of the approach adopted and show that if only the full frontier methods were used, they would lead to different results. We found evidence of economies of scope in the provision of water supply and wastewater services simultaneously by water utilities in Portugal.
Empirical Likelihood Confidence Intervals for the Differences of Quantiles with Missing Data
Yong-song Qin; Yong-jiang Qian
2009-01-01
Suppose that there are two nonparametric populations x and y with missing data on both of them. We are interested in constructing confidence intervals on the quantile differences of x and y. Random imputation is used. Empirical likelihood confidence intervals on the differences are constructed.
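A minimal sketch of the random-imputation step and the resulting quantile-difference point estimate (the empirical likelihood interval construction itself is not reproduced; names are ours):

```python
import numpy as np

def quantile_difference(x, y, q, rng=None):
    """q-quantile difference of two samples after random imputation:
    each missing value (NaN) is replaced by a random draw from the
    observed values of its own sample."""
    rng = np.random.default_rng(0) if rng is None else rng

    def impute(z):
        z = np.asarray(z, float).copy()
        miss = np.isnan(z)
        z[miss] = rng.choice(z[~miss], size=miss.sum())
        return z

    return np.quantile(impute(x), q) - np.quantile(impute(y), q)
```

With complete data the imputation step is a no-op and the estimate reduces to the ordinary difference of sample quantiles.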
Effraimidis, Georgios; Dahl, Christian Møller
In this paper, we develop a fully nonparametric approach for the estimation of the cumulative incidence function with Missing At Random right-censored competing risks data. We obtain results on the pointwise asymptotic normality as well as the uniform convergence rate of the proposed nonparametric...... estimator. A simulation study that serves two purposes is provided. First, it illustrates in details how to implement our proposed nonparametric estimator. Secondly, it facilitates a comparison of the nonparametric estimator to a parametric counterpart based on the estimator of Lu and Liang (2008...
Mercau, Ezequiel
The Falklands War was shrouded in symbolism and permeated with imperial rhetoric, imagery and memories, bringing to the fore divergent conceptions of Britishness, kinship and belonging. The current dispute, moreover, is frequently framed in terms of national identity, and accusations of imperialism...... and neo-colonialism persist. Thus in dealing with the conflict, historians and commentators alike have often made references to imperial legacies, yet this has rarely been afforded much careful analysis. Views on this matter continue to be entrenched, either presenting the war as a throwback to nineteenth...... from the legacies of empire. Taking decolonization as a starting point, this thesis demonstrates how the idea of a ‘British world’ gained a new lease of life vis-à-vis the Falklands, as the Islanders adopted the rhetorical mantle of ‘abandoned Britons’. Yet this new momentum was partial and fractured...
Variational estimation of the drift for stochastic differential equations from the empirical density
Batz, Philipp; Ruttor, Andreas; Opper, Manfred
2016-08-01
We present a method for the nonparametric estimation of the drift function of certain types of stochastic differential equations from the empirical density. It is based on a variational formulation of the Fokker-Planck equation. The minimization of an empirical estimate of the variational functional using kernel based regularization can be performed in closed form. We demonstrate the performance of the method on second order, Langevin-type equations and show how the method can be generalized to other noise models.
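In the stationary one-dimensional case the idea behind such density-based drift estimation has a closed form: the stationary Fokker-Planck equation gives f(x) = (σ²/2) d/dx log p(x), so an estimate of the density p yields an estimate of the drift f. A hedged numpy sketch using a plain Gaussian KDE in place of the paper's variational, kernel-regularized formulation:

```python
import numpy as np

def drift_from_density(x_grid, samples, sigma, bandwidth):
    """Drift estimate f(x) = (sigma^2 / 2) * d/dx log p(x) for the SDE
    dX = f(X) dt + sigma dW, with p estimated by a Gaussian KDE."""
    x_grid = np.asarray(x_grid, float)
    samples = np.asarray(samples, float)
    u = (x_grid[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * u**2)
    p = k.sum(axis=1)                       # unnormalized density estimate
    dp = (-u / bandwidth * k).sum(axis=1)   # its derivative in x
    return 0.5 * sigma**2 * dp / np.maximum(p, 1e-300)
```

As a check, samples from a standard normal are the stationary density of the Ornstein-Uhlenbeck drift f(x) = -x with σ = √2, and the estimate recovers that restoring drift up to smoothing bias.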
Xu, Yonghong; Gao, Xiaohuan; Wang, Zhengxi
2014-04-01
Missing data represent a general problem in many scientific fields, especially in medical survival analysis. When dealing with censored data, interpolation is one of the important methods. However, most interpolation methods replace the censored data with exact data, which distorts the real distribution of the censored data and reduces the probability of the real data falling into the interpolation data. In order to solve this problem, we in this paper propose a nonparametric method of estimating the survival function of right-censored and interval-censored data and compare its performance to the SC (self-consistent) algorithm. Compared to the average interpolation and the nearest neighbor interpolation methods, the proposed method replaces the right-censored data with interval-censored data, and greatly improves the probability of the real data falling into the imputation interval. It then uses empirical distribution theory to estimate the survival function of right-censored and interval-censored data. The results of numerical examples and a real breast cancer data set demonstrated that the proposed method had higher accuracy and better robustness for different proportions of censored data. This paper provides a good method to compare the performance of clinical treatments through estimation from the survival data of patients, and thus offers some help to medical survival data analysis.
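For purely right-censored data, the standard nonparametric baseline such methods are measured against is the product-limit (Kaplan-Meier) survival estimate: at each observed event time, multiply the running survival probability by one minus the fraction of at-risk subjects who fail. A minimal sketch assuming distinct event times (ours, not the paper's interval-imputation estimator):

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate.
    events=1 for an observed failure, 0 for right-censoring.
    Simplified: assumes no tied times."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    order = times.argsort()
    times, events = times[order], events[order]
    n = len(times)
    S = 1.0
    surv = []
    for i in range(n):
        at_risk = n - i
        if events[i] == 1:          # survival drops only at failures
            S *= 1.0 - 1.0 / at_risk
        surv.append(S)              # censored points leave S unchanged
    return times, np.array(surv)
```

Note how a censored observation reduces the risk set without dropping the curve, which is exactly the information that naive exact-value imputation destroys.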
Comparison of non-parametric methods for ungrouping coarsely aggregated data
Silvia Rizzi
2016-05-01
Full Text Available Abstract Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-year age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions; can handle unequal interval length; and allow stretches of 0 counts. Results The methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-year age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
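The spline-interpolation idea can be sketched in a few lines: interpolate the cumulative counts at the bin edges with a monotone spline, evaluate it on a finer grid, and difference (illustrative; the penalized composite link model the authors recommend is more involved):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def ungroup_counts(edges, counts):
    """Split grouped counts into single-unit bins by interpolating the
    cumulative count with a monotone (PCHIP) spline and differencing."""
    edges = np.asarray(edges, float)
    cum = np.concatenate([[0.0], np.cumsum(counts)])
    spline = PchipInterpolator(edges, cum)   # monotone => nonnegative bins
    fine = np.arange(edges[0], edges[-1] + 1)
    return np.diff(spline(fine))
```

Because the cumulative counts are nondecreasing and PCHIP preserves monotonicity, the ungrouped counts are nonnegative and sum exactly to the original total.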
A Non-parametric Approach to the Overall Estimate of Cognitive Load Using NIRS Time Series.
Keshmiri, Soheil; Sumioka, Hidenobu; Yamazaki, Ryuji; Ishiguro, Hiroshi
2017-01-01
We present a non-parametric approach to prediction of the n-back n ∈ {1, 2} task as a proxy measure of mental workload using Near Infrared Spectroscopy (NIRS) data. In particular, we focus on measuring the mental workload through hemodynamic responses in the brain induced by these tasks, thereby realizing the potential that they can offer for their detection in real world scenarios (e.g., difficulty of a conversation). Our approach takes advantage of intrinsic linearity that is inherent in the components of the NIRS time series to adopt a one-step regression strategy. We demonstrate the correctness of our approach through its mathematical analysis. Furthermore, we study the performance of our model in an inter-subject setting in contrast with state-of-the-art techniques in the literature to show a significant improvement on prediction of these tasks (82.50 and 86.40% for female and male participants, respectively). Moreover, our empirical analysis suggests a gender difference effect on the performance of the classifiers (with male data exhibiting a higher non-linearity) along with the left-lateralized activation in both genders with higher specificity in females.
Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines
Kuter, S.; Akyürek, Z.; Weber, G.-W.
2016-10-01
Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imagery. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to the latest machine-learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th
Casco Bay lies at the heart of Maine's most populated area. The health of its waters, wetlands, and wildlife depend in large part on the activities of the quarter-million residents who live in its watershed. Less than 30 years ago, portions of Casco Bay were off-limits to recr...
Engholm, Ida
2014-01-01
Celebrated as one of the leading and most valuable brands in the world, eBay has acquired iconic status on par with century-old brands such as Coca-Cola and Disney. The eBay logo is now synonymous with the world’s leading online auction website, and its design is associated with the company...
Vale, A
2007-01-01
We revisit the longstanding question of whether first brightest cluster galaxies are statistically drawn from the same distribution as other cluster galaxies or are "special", using the new non-parametric, empirically based model presented in Vale & Ostriker (2006) for associating galaxy luminosity with halo/subhalo masses. We introduce scatter in galaxy luminosity at fixed halo mass into this model, building a conditional luminosity function (CLF) by considering two possible models: a simple lognormal and a model based on the distribution of concentration in haloes of a given mass. We show that this model naturally allows an identification of halo/subhalo systems with groups and clusters of galaxies, giving rise to a clear central/satellite galaxy distinction. We then use these results to build up the dependence of brightest cluster galaxy (BCG) magnitudes on cluster luminosity, focusing on two statistical indicators, the dispersion in BCG magnitude and the magnitude difference between first and second bri...
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Lu LIN
2005-01-01
In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are constructed as weighted sums of the data, where the weights depend only on the distance between the design points and the estimation points. As a result, these estimators are not robust to perturbations in the data. To avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and the depth-weighted wavelet estimator is defined. The new estimator is robust to perturbations in the data, attaining a very high breakdown value close to 1/2. On the other hand, asymptotic behaviour such as asymptotic normality is obtained. Simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
A Non-Parametric Spatial Independence Test Using Symbolic Entropy
López Hernández, Fernando
2008-01-01
Full Text Available In the present paper, we construct a new, simple, consistent and powerful test for spatial independence, called the SG test, by using symbolic dynamics and symbolic entropy as a measure of spatial dependence. We also give a standard asymptotic distribution of an affine transformation of the symbolic entropy under the null hypothesis of independence in the spatial process. The test statistic and its standard limit distribution, with the proposed symbolization, are invariant to any monotonous transformation of the data. The test applies to discrete or continuous distributions. Given that the test is based on entropy measures, it avoids smoothed nonparametric estimation. We include a Monte Carlo study of our test, together with the well-known Moran's I, the SBDS (de Graaff et al., 2001) and (Brett and Pinkse, 1997) non-parametric tests, in order to illustrate our approach.
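The symbolization step can be illustrated in the serial (time-series) setting: map each window of m observations to its ordinal pattern and take the Shannon entropy of the pattern frequencies. The spatial test instead symbolizes values over neighbouring locations; this sketch (ours) shows only the entropy computation:

```python
import numpy as np
from math import log

def symbolic_entropy(series, m=3):
    """Shannon entropy of order-m ordinal (permutation) symbols."""
    series = np.asarray(series, float)
    counts = {}
    for i in range(len(series) - m + 1):
        # the symbol is the ordering pattern of the window
        sym = tuple(np.argsort(series[i:i + m]))
        counts[sym] = counts.get(sym, 0) + 1
    total = sum(counts.values())
    return -sum(c / total * log(c / total) for c in counts.values())
```

A strictly monotone series produces a single pattern and hence zero entropy, while an independent series spreads mass over many patterns and pushes the entropy toward its maximum, which is the contrast the test exploits.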
Analyzing single-molecule time series via nonparametric Bayesian inference.
Hines, Keegan E; Bankston, John R; Aldrich, Richard W
2015-02-03
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data.
Analyzing multiple spike trains with nonparametric Granger causality.
Nedungadi, Aatira G; Rangarajan, Govindan; Jain, Neeraj; Ding, Mingzhou
2009-08-01
Simultaneous recordings of spike trains from multiple single neurons are becoming commonplace. Understanding the interaction patterns among these spike trains remains a key research area. A question of interest is the evaluation of information flow between neurons through the analysis of whether one spike train exerts causal influence on another. For continuous-valued time series data, Granger causality has proven an effective method for this purpose. However, the basis for Granger causality estimation is autoregressive data modeling, which is not directly applicable to spike trains. Various filtering options distort the properties of spike trains as point processes. Here we propose a new nonparametric approach to estimate Granger causality directly from the Fourier transforms of spike train data. We validate the method on synthetic spike trains generated by model networks of neurons with known connectivity patterns and then apply it to neurons simultaneously recorded from the thalamus and the primary somatosensory cortex of a squirrel monkey undergoing tactile stimulation.
Prior processes and their applications nonparametric Bayesian estimation
Phadia, Eswar G
2016-01-01
This book presents a systematic and comprehensive treatment of various prior processes that have been developed over the past four decades for dealing with Bayesian approach to solving selected nonparametric inference problems. This revised edition has been substantially expanded to reflect the current interest in this area. After an overview of different prior processes, it examines the now pre-eminent Dirichlet process and its variants including hierarchical processes, then addresses new processes such as dependent Dirichlet, local Dirichlet, time-varying and spatial processes, all of which exploit the countable mixture representation of the Dirichlet process. It subsequently discusses various neutral to right type processes, including gamma and extended gamma, beta and beta-Stacy processes, and then describes the Chinese Restaurant, Indian Buffet and infinite gamma-Poisson processes, which prove to be very useful in areas such as machine learning, information retrieval and featural modeling. Tailfree and P...
Using non-parametric methods in econometric production analysis
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the "true" relationship between the inputs and the output. This misspecification might result in biased estimation results, including measures that are of interest to applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Nonparametric Estimation of Distributions in Random Effects Models
Hart, Jeffrey D.
2011-01-01
We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online. © 2011 American Statistical Association.
Curve registration by nonparametric goodness-of-fit testing
Dalalyan, Arnak
2011-01-01
The problem of curve registration appears in many different areas of applications ranging from neuroscience to road traffic modeling. In the present work, we propose a nonparametric testing framework in which we develop a generalized likelihood ratio test to perform curve registration. We first prove that, under the null hypothesis, the resulting test statistic is asymptotically distributed as a chi-squared random variable. This result, often referred to as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic significance level and a natural measure of lack-of-fit in terms of the p-value of the chi squared test. We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to 1. Some numerical experiments on synthetic datasets are reported as well.
Nonparametric forecasting of low-dimensional dynamical systems.
Berry, Tyrus; Giannakis, Dimitrios; Harlim, John
2015-03-01
This paper presents a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. The key idea is to represent the discrete shift maps on a smooth basis which can be obtained by the diffusion maps algorithm. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution to the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to quantify uncertainties (in fact, evolve the probability distribution) for nontrivial dynamical systems with equation-free modeling. We verify our approach on various examples, ranging from an inhomogeneous anisotropic stochastic differential equation on a torus, the chaotic Lorenz three-dimensional model, and the Niño-3.4 data set which is used as a proxy of the El Niño Southern Oscillation.
Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.
Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z
2017-03-01
A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.
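A minimal discrete second-order Volterra response helps show why a second-order model can capture the burst-versus-single-pulse nonlinearity that a linear model cannot. The kernels here are hypothetical toy values; the paper estimates Laguerre-expanded kernels from experimental data.

```python
def volterra2(x, k1, k2):
    # Second-order discrete Volterra model:
    # y(n) = sum_i k1[i] x[n-i] + sum_{i,j} k2[i][j] x[n-i] x[n-j]
    M = len(k1)
    y = []
    for n in range(len(x)):
        total = 0.0
        for i in range(min(M, n + 1)):
            total += k1[i] * x[n - i]
            for j in range(min(M, n + 1)):
                total += k2[i][j] * x[n - i] * x[n - j]
        y.append(total)
    return y
```

Because the second-order kernel multiplies pairs of input samples, closely spaced pulses (a burst) interact and can produce a disproportionately large response, which a purely linear first-order term cannot reproduce.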
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes
García, Constantino A.; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G.
2017-08-01
The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
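The target quantity for the drift term can be illustrated with a much simpler nonparametric stand-in: a kernel-weighted average of observed increments, drift(x0) ≈ E[X_{t+dt} − X_t | X_t = x0] / dt. The paper itself uses sparse Gaussian process priors over function space; this sketch (with an invented function name) only shows what is being estimated.

```python
import math

def estimate_drift(path, dt, x0, bandwidth):
    # Kernel regression of increments on the current state:
    # drift(x0) ~ sum_n w_n (x_{n+1} - x_n) / (dt * sum_n w_n),
    # with Gaussian weights w_n centered at x0.
    num, den = 0.0, 0.0
    for xa, xb in zip(path[:-1], path[1:]):
        w = math.exp(-0.5 * ((xa - x0) / bandwidth) ** 2)
        num += w * (xb - xa)
        den += w
    return num / (den * dt)
```

On a noiseless exponentially decaying path (the deterministic part of dX = −X dt), the estimator recovers drift(x0) ≈ −x0.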
Indoor Positioning Using Nonparametric Belief Propagation Based on Spanning Trees
Savic Vladimir
2010-01-01
Full Text Available Nonparametric belief propagation (NBP is one of the best-known methods for cooperative localization in sensor networks. It is capable of providing information about location estimation with appropriate uncertainty and to accommodate non-Gaussian distance measurement errors. However, the accuracy of NBP is questionable in loopy networks. Therefore, in this paper, we propose a novel approach, NBP based on spanning trees (NBP-ST created by breadth first search (BFS method. In addition, we propose a reliable indoor model based on obtained measurements in our lab. According to our simulation results, NBP-ST performs better than NBP in terms of accuracy and communication cost in the networks with high connectivity (i.e., highly loopy networks. Furthermore, the computational and communication costs are nearly constant with respect to the transmission radius. However, the drawbacks of proposed method are a little bit higher computational cost and poor performance in low-connected networks.
Revealing components of the galaxy population through nonparametric techniques
Bamford, Steven P; Nichol, Robert C; Miller, Christopher J; Wasserman, Larry; Genovese, Christopher R; Freeman, Peter E
2008-01-01
The distributions of galaxy properties vary with environment, and are often multimodal, suggesting that the galaxy population may be a combination of multiple components. The behaviour of these components versus environment holds details about the processes of galaxy development. To release this information we apply a novel, nonparametric statistical technique, identifying four components present in the distribution of galaxy H$\\alpha$ emission-line equivalent-widths. We interpret these components as passive, star-forming, and two varieties of active galactic nuclei. Independent of this interpretation, the properties of each component are remarkably constant as a function of environment. Only their relative proportions display substantial variation. The galaxy population thus appears to comprise distinct components which are individually independent of environment, with galaxies rapidly transitioning between components as they move into denser environments.
Multi-Directional Non-Parametric Analysis of Agricultural Efficiency
Balezentis, Tomas
This thesis seeks to develop methodologies for assessment of agricultural efficiency and employ them to Lithuanian family farms. In particular, we focus on three particular objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...... of stochasticity associated with Lithuanian family farm performance. The former technique showed that the farms differed in terms of the mean values and variance of the efficiency scores over time with some clear patterns prevailing throughout the whole research period. The fuzzy Free Disposal Hull showed...
Parametric or nonparametric? A parametricness index for model selection
Liu, Wei; 10.1214/11-AOS899
2012-01-01
In the model selection literature, two classes of criteria perform well asymptotically in different situations: the Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (the parametric scenario), while Akaike's information criterion (AIC) is asymptotically efficient when the true model is infinite dimensional (the nonparametric scenario). But there is little work that addresses whether it is possible, and how, to detect which situation a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, the parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is si...
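The two criteria being switched between are standard and easy to state; a minimal sketch (the PI itself is the paper's contribution and is not reproduced here):

```python
import math

def aic(loglik, k):
    # Akaike information criterion: 2k - 2 * log-likelihood.
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    # Bayesian information criterion: k * log(n) - 2 * log-likelihood.
    return k * math.log(n) - 2 * loglik
```

BIC's per-parameter penalty log(n) exceeds AIC's penalty of 2 as soon as n > e^2 ≈ 7.4, so for realistic sample sizes the two criteria can select different models, which is what makes a data-driven switching rule useful.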
Nonparametric reconstruction of the Om diagnostic to test LCDM
Escamilla-Rivera, Celia
2015-01-01
Cosmic acceleration is usually attributed to an unknown dark energy, whose equation of state, w(z), is constrained and numerically confronted with independent astrophysical data. In order to diagnose w(z), a null test of dark energy can be performed using a diagnostic function of redshift, Om. In this work we present a nonparametric reconstruction of this diagnostic using the so-called Loess-Simex factory to test the concordance model, with the advantage that this approach offers an alternative way to relax the use of priors and find a possible w that reliably describes the data with no previous knowledge of a cosmological model. Our results demonstrate that the method applied to the dynamical Om diagnostic finds a preference for a dark energy model with equation of state w = -2/3, which corresponds to a static domain wall network.
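The Om diagnostic itself has a simple closed form, which makes it a convenient null test: under flat LCDM it is constant in redshift. A minimal sketch, assuming the standard definition Om(z) = (E²(z) − 1)/((1+z)³ − 1) with E = H/H₀:

```python
def om_diagnostic(z, E_squared):
    # Om(z) = (E^2(z) - 1) / ((1 + z)^3 - 1), with E = H / H0.
    # Under flat LCDM, E^2 = Om_m (1+z)^3 + (1 - Om_m), so Om(z) = Om_m
    # at every redshift; any z-dependence signals a departure from LCDM.
    return (E_squared - 1.0) / ((1.0 + z) ** 3 - 1.0)
```

Feeding in a reconstructed, data-driven E²(z) (e.g. from the Loess-Simex smoothing the abstract describes) and checking whether the result is flat is exactly the null test.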
Evaluation of Nonparametric Probabilistic Forecasts of Wind Power
Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg
Predictions of wind power production for horizons up to 48-72 hours ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind...
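The workhorse score for evaluating a single quantile forecast is the pinball (quantile) loss; a minimal sketch, assuming the standard definition (the paper's full evaluation framework covers reliability and sharpness as well):

```python
def pinball_loss(observation, quantile_forecast, tau):
    # Quantile (pinball) score for a tau-quantile forecast; lower is better.
    # Under-forecasts are penalized with weight tau, over-forecasts with 1 - tau.
    diff = observation - quantile_forecast
    return tau * diff if diff >= 0 else (tau - 1.0) * diff
```

Averaging this loss over many forecast-observation pairs, for each nominal proportion tau, yields a proper evaluation of a set of quantile forecasts.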
Equity and efficiency in private and public education: a nonparametric comparison
L. Cherchye; K. de Witte; E. Ooghe; I. Nicaise
2007-01-01
We present a nonparametric approach for the equity and efficiency evaluation of (private and public) primary schools in Flanders. First, we use a nonparametric (Data Envelopment Analysis) model that is specially tailored to assess educational efficiency at the pupil level. The model accounts for the
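In the simplest single-input, single-output case, DEA efficiency reduces to a productivity ratio against the best observed unit; a minimal sketch of that special case (the pupil-level model in the paper handles multiple inputs and outputs via linear programming, which this deliberately omits):

```python
def ccr_efficiency(inputs, outputs):
    # Single-input, single-output CCR efficiency under constant returns to scale:
    # each unit's output/input ratio divided by the best observed ratio
    # (the DEA frontier). Efficient units score 1.0.
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]
```

A unit scoring 0.5 produces half as much output per unit of input as the best-practice frontier, which is the interpretation DEA generalizes to many dimensions.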
Song, Dong; Wang, Zhuo; Marmarelis, Vasilis Z; Berger, Theodore W
2009-02-01
This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson-Volterra models are obtained herein from broadband experimental input-output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single facilitation/depression (FD) process. The parametric model is then validated in terms of its input-output transformational properties using the non-parametric model, since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional FD process in the parametric model makes it replicate more closely the characteristics of the experimentally derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.
Non-parametric tests of productive efficiency with errors-in-variables
Kuosmanen, T.K.; Post, T.; Scholtes, S.
2007-01-01
We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans
Semi-parametric regression: Efficiency gains from modeling the nonparametric part
Yu, Kyusang; Park, Byeong U; 10.3150/10-BEJ296
2011-01-01
It is widely acknowledged that structured nonparametric modeling that circumvents the curse of dimensionality is important in nonparametric estimation. In this paper we show that the same holds for semi-parametric estimation. We argue that estimation of the parametric component of a semi-parametric model can be improved substantially when more structure is put into the nonparametric part of the model. We illustrate this for the partially linear model, and investigate efficiency gains when the nonparametric part of the model has an additive structure. We present the semi-parametric Fisher information bound for estimating the parametric part of the partially linear additive model and provide semi-parametric efficient estimators, for which we use a smooth backfitting technique to deal with the additive nonparametric part. We also present the finite-sample performance of the proposed estimators and analyze Boston housing data as an illustration.
Begum, Mehmuna; Sahu, Biraja Kumar; Das, Apurba Kumar; Vinithkumar, Nambali Valsalan; Kirubagaran, Ramalingam
2015-05-01
Blooming of diatom species Chaetoceros curvisetus (Cleve, 1889) was observed in Junglighat Bay and Haddo Harbour of Port Blair Bay of Andaman and Nicobar Islands during June 2010. Physico-chemical parameters, nutrient concentrations and phytoplankton composition data collected from five stations during 2010 were classified as bloom area (BA) and non-bloom area (NBA) and compared. Elevated values of dissolved oxygen were recorded in the BA, and it significantly varied (p NBA. Among the nutrient parameters studied, nitrate concentration indicated significant variation in BA and NBA (p NBA, indicating its utilization. In Junglighat Bay, the C. curvisetus species constituted 93.4 and 69.2% composition of total phytoplankton population during day 1 and day 2, respectively. The bloom forming stations separated out from the non-bloom forming station in non-parametric multidimensional scaling (nMDS) ordinations; cluster analysis powered by SIMPROF test also grouped the stations as BA and NBA.
Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.
Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:
Empirical likelihood for balanced ranked-set sampled data
2009-01-01
Ranked-set sampling (RSS) often provides more efficient inference than simple random sampling (SRS). In this article, we propose a systematic nonparametric technique, RSS-EL, for hypothesis testing and interval estimation with balanced RSS data using empirical likelihood (EL). We detail the approach for interval estimation and hypothesis testing in one-sample and two-sample problems and general estimating equations. In all three cases, RSS is shown to provide more efficient inference than SRS of the same size. Moreover, the RSS-EL method does not require any easily violated assumptions needed by existing rank-based nonparametric methods for RSS data, such as perfect ranking, identical ranking scheme in two groups, and location shift between two population distributions. The merit of the RSS-EL method is also demonstrated through simulation studies.
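The balanced RSS sampling scheme itself can be sketched in a few lines; this is an illustrative simulation (with invented function names) of the design the abstract assumes, using perfect ranking for simplicity:

```python
import random

def ranked_set_sample(population, set_size, cycles, seed=0):
    # Balanced ranked-set sampling: in each cycle, draw set_size independent
    # groups of set_size units; from the r-th group, measure only the r-th
    # order statistic. Total sample size is set_size * cycles.
    rng = random.Random(seed)
    sample = []
    for _ in range(cycles):
        for r in range(set_size):
            group = rng.sample(population, set_size)
            sample.append(sorted(group)[r])
    return sample
```

Because each measured unit is a known order statistic, the sample carries rank information that an SRS of the same size lacks, which is the source of the efficiency gain.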
Biscayne Bay Alongshore Epifauna
National Oceanic and Atmospheric Administration, Department of Commerce — Field studies to characterize the alongshore epifauna (shrimp, crabs, echinoderms, and small fishes) along the western shore of southern Biscayne Bay were started in...
National Oceanic and Atmospheric Administration, Department of Commerce — This image represents a 4x4 meter resolution bathymetric surface for Jobos Bay, Puerto Rico (in NAD83 UTM 19 North). The depth values are in meters referenced to the...
Hammond Bay Biological Station
Federal Laboratory Consortium — Hammond Bay Biological Station (HBBS), located near Millersburg, Michigan, is a field station of the USGS Great Lakes Science Center (GLSC). HBBS was established by...
Chesapeake Bay Tributary Strategies
Chesapeake Bay Tributary Strategies were developed by the seven watershed jurisdictions and outlined the river basin-specific implementation activities to reduce nutrient and sediment pollutant loads from point and nonpoint sources.
National Oceanic and Atmospheric Administration, Department of Commerce — This data set consists of 0.5-meter pixel resolution, four band orthoimages covering the Humboldt Bay area. An orthoimage is remotely sensed image data in which...
Engholm, Ida
2014-01-01
Celebrated as one of the leading and most valuable brands in the world, eBay has acquired iconic status on par with century-old brands such as Coca-Cola and Disney. The eBay logo is now synonymous with the world’s leading online auction website, and its design is associated with the company......’s purpose: selling millions of goods, some of which are ‘designer’ items and some of which are considered design icons....
Nonparametric predictive inference for combining diagnostic tests with parametric copula
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas, including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests, and the area under the ROC curve (AUC) is often used as a measure of the overall performance of a diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results, modeling the dependence structure with a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations; it uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. A copula is a joint distribution function whose marginals are all uniformly distributed; it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely maximum likelihood estimation (MLE). We investigate the performance of the proposed method via data sets from the literature and discuss the results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
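The empirical AUC that such methods aim to improve has a simple Mann-Whitney form; a minimal sketch (the NPI lower/upper-probability machinery and the copula estimation are not shown):

```python
def empirical_auc(nondiseased, diseased):
    # Mann-Whitney form of the AUC: the probability that a randomly chosen
    # diseased subject scores higher than a randomly chosen nondiseased
    # subject, with ties counting one half.
    wins = 0.0
    for d in diseased:
        for h in nondiseased:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(nondiseased))
```

An AUC of 1.0 means perfect separation of the two groups; 0.5 means the test is uninformative. Combining two tests seeks a combined score whose AUC exceeds either individual one.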
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza.
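The standard Chinese restaurant process that the paper extends can be sketched as a sequential seating simulation; this is the plain CRP only (the phylogenetic extension is the paper's contribution), and the function name is invented:

```python
import random

def crp_partition(n_customers, alpha, seed=0):
    # Chinese restaurant process: customer i joins existing table t with
    # probability n_t / (i + alpha), or opens a new table with probability
    # alpha / (i + alpha), inducing a random partition into clusters.
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for i in range(n_customers):
        r = rng.random() * (i + alpha)
        acc = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for t, size in enumerate(table_sizes):
            acc += size
            if r < acc:
                chosen = t
                break
        if chosen == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes
```

Larger tables attract more customers (the "rich get richer" property), so the number of clusters grows slowly with the data, and alpha controls how readily new clusters appear.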
Nonparametric identification of structural modifications in Laplace domain
Suwała, G.; Jankowski, Ł.
2017-02-01
This paper proposes and experimentally verifies a Laplace-domain method for identification of structural modifications, which (1) unlike time-domain formulations, allows the identification to be focused on these parts of the frequency spectrum that have a high signal-to-noise ratio, and (2) unlike frequency-domain formulations, decreases the influence of numerical artifacts related to the particular choice of the FFT exponential window decay. In comparison to the time-domain approach proposed earlier, advantages of the proposed method are smaller computational cost and higher accuracy, which leads to reliable performance in more difficult identification cases. Analytical formulas for the first- and second-order sensitivity analysis are derived. The approach is based on a reduced nonparametric model, which has the form of a set of selected structural impulse responses. Such a model can be collected purely experimentally, which obviates the need for design and laborious updating of a parametric model, such as a finite element model. The approach is verified experimentally using a 26-node lab 3D truss structure and 30 identification cases of a single mass modification or two concurrent mass modifications.
A New Non-Parametric Approach to Galaxy Morphological Classification
Lotz, J M; Madau, P; Lotz, Jennifer M.; Primack, Joel; Madau, Piero
2003-01-01
We present two new non-parametric methods for quantifying galaxy morphology: the relative distribution of the galaxy pixel flux values (the Gini coefficient or G) and the second-order moment of the brightest 20% of the galaxy's flux (M20). We test the robustness of G and M20 to decreasing signal-to-noise and spatial resolution, and find that both measures are reliable to within 10% at average signal-to-noise per pixel greater than 3 and resolutions better than 1000 pc and 500 pc, respectively. We have measured G and M20, as well as concentration (C), asymmetry (A), and clumpiness (S) in the rest-frame near-ultraviolet/optical wavelengths for 150 bright local "normal" Hubble type galaxies (E-Sd) galaxies and 104 0.05 < z < 0.25 ultra-luminous infrared galaxies (ULIRGs).We find that most local galaxies follow a tight sequence in G-M20-C, where early-types have high G and C and low M20 and late-type spirals have lower G and C and higher M20. The majority of ULIRGs lie above the normal galaxy G-M20 sequence...
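The Gini coefficient of the pixel flux distribution is straightforward to compute from sorted fluxes; a minimal sketch, assuming the discrete convention used in this line of work (G = 0 when flux is spread evenly over pixels, G = 1 when concentrated in one pixel):

```python
def gini_coefficient(fluxes):
    # Gini of the pixel flux distribution:
    # G = sum_i (2i - n - 1) |x_(i)| / (mean(|x|) * n * (n - 1)),
    # with x_(i) the i-th smallest absolute flux, i = 1..n.
    x = sorted(abs(f) for f in fluxes)
    n = len(x)
    mean = sum(x) / n
    total = sum((2 * (i + 1) - n - 1) * xi for i, xi in enumerate(x))
    return total / (mean * n * (n - 1))
```

Unlike concentration, G does not assume the bright pixels lie at the galaxy's center, which is why it remains informative for irregular systems such as ULIRGs.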
Adaptive Neural Network Nonparametric Identifier With Normalized Learning Laws.
Chairez, Isaac
2016-04-05
This paper addresses the design of a normalized convergent learning law for neural networks (NNs) with continuous dynamics. The NN is used here to obtain a nonparametric model for uncertain systems described by a set of ordinary differential equations. The sources of uncertainty are external perturbations and poor knowledge of the nonlinear function describing the system dynamics. A new adaptive algorithm based on normalized algorithms was used to adjust the weights of the NN. The adaptive algorithm was derived by means of a nonstandard logarithmic Lyapunov function (LLF). Two identifiers were designed using two variations of LLFs, leading to a normalized learning law for the first identifier and a variable-gain normalized learning law for the second. In the case of the second identifier, the inclusion of normalized learning laws reduces the size of the convergence region obtained as the solution of the practical stability analysis. On the other hand, the velocity of convergence of the learning laws is inversely related to the norm of the errors; this avoids peaking transient behavior in the time evolution of the weights and accelerates the convergence of the identification error. A numerical example demonstrates the improvements achieved by the algorithm introduced in this paper compared with classical schemes based on non-normalized continuous learning methods. A comparison of the identification performance achieved by the non-normalized identifier and the ones developed in this paper shows the benefits of the proposed learning law.
Nonparametric estimation of quantum states, processes and measurements
Lougovski, Pavel; Bennink, Ryan
Quantum state, process, and measurement estimation methods traditionally use parametric models, in which the number and role of relevant parameters is assumed to be known. When such an assumption cannot be justified, a common approach in many disciplines is to fit the experimental data to multiple models with different sets of parameters and utilize an information criterion to select the best fitting model. However, it is not always possible to assume a model with a finite (countable) number of parameters. This typically happens when there are unobserved variables that stem from hidden correlations that can only be unveiled after collecting experimental data. How does one perform quantum characterization in this situation? We present a novel nonparametric method of experimental quantum system characterization based on the Dirichlet Process (DP) that addresses this problem. Using DP as a prior in conjunction with Bayesian estimation methods allows us to increase model complexity (number of parameters) adaptively as the number of experimental observations grows. We illustrate our approach for the one-qubit case and show how a probability density function for an unknown quantum process can be estimated.
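The Dirichlet process prior at the heart of this approach is commonly simulated via its stick-breaking representation; a minimal truncated sketch (function name invented, unrelated to any quantum-specific code):

```python
import random

def stick_breaking_weights(alpha, n_atoms, seed=0):
    # Truncated stick-breaking construction of Dirichlet process weights:
    # v_k ~ Beta(1, alpha); w_k = v_k * prod_{j<k} (1 - v_j).
    # Smaller alpha concentrates mass on fewer atoms (simpler models);
    # larger alpha spreads it, allowing model complexity to grow with data.
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights
```

The truncated weights sum to just under 1 (the remainder is the unbroken stick), and the effective number of non-negligible atoms adapts to the data, which is exactly the "complexity grows with observations" behavior the abstract describes.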
Bayesian nonparametric meta-analysis using Polya tree mixture models.
Branscum, Adam J; Hanson, Timothy E
2008-09-01
Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically dispersed locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which, together with results from simulated data, highlights the performance of PTMs in the presence of nonnormality of effect measures in the source population.
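As a point of reference for the parametric baseline that the PTM relaxes, a normal random-effects meta-analysis can be sketched with the DerSimonian-Laird method of moments. This is a generic illustration, not the paper's method, and the example effect sizes are hypothetical:

```python
import numpy as np

def dersimonian_laird(y, v):
    """Method-of-moments fit of the normal random-effects model:
    y_i ~ N(theta_i, v_i), theta_i ~ N(mu, tau2)."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    mu_fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - mu_fixed) ** 2)           # Cochran's Q statistic
    k = len(y)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    mu = np.sum(w_star * y) / np.sum(w_star)      # pooled effect estimate
    return mu, tau2

# Example: three hypothetical study log odds ratios with known variances
mu_hat, tau2_hat = dersimonian_laird([0.20, 0.50, 0.35], [0.04, 0.06, 0.05])
```

When the study estimates are perfectly homogeneous, Q falls below k - 1 and the between-study variance estimate is truncated at zero.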
Non-parametric and least squares Langley plot methods
P. W. Kiedron
2015-04-01
Full Text Available Langley plots are used to calibrate sun radiometers, primarily for measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs) incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer-Lambert-Beer law V = V0·e^(-τ·m), where a plot of ln(V) (voltage) vs. m (air mass) yields a straight line with intercept ln(V0). This ln(V0) can subsequently be used to solve for τ for any measurement of V and calculation of m. This calibration works well at some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examines and compares a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations, with no significant differences among them, when the time series of ln(V0) values are smoothed and interpolated with median and mean moving-window filters.
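A minimal sketch of the least-squares variant: regressing ln(V) on air mass m recovers the intercept ln(V0) and slope -τ. The instrument constant, optical depth, and noise level below are assumed values for illustration, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration morning: assumed instrument constant V0 and optical depth tau
V0_true, tau_true = 2.5, 0.12
m = np.linspace(1.5, 6.0, 40)                     # air-mass values as the sun rises
V = V0_true * np.exp(-tau_true * m) * rng.lognormal(0.0, 0.005, m.size)

# Langley plot: ln(V) vs. m is a straight line with slope -tau and intercept ln(V0)
slope, intercept = np.polyfit(m, np.log(V), 1)
V0_est, tau_est = float(np.exp(intercept)), float(-slope)
```

Once ln(V0) is fixed by the calibration, τ follows from any single measurement as τ = (ln(V0) - ln(V)) / m.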
Empirical Likelihood Method for Quantiles with Response Data Missing at Random
Xia-yan LI; Jun-qing YUAN
2012-01-01
Empirical likelihood is a nonparametric method for constructing confidence intervals and tests, notable in that the shape of the confidence region is determined by the sample data. This paper presents a new version of the empirical likelihood method for quantiles under kernel regression imputation to accommodate missing response data. It eliminates the need to solve nonlinear equations, and it is easy to apply. We also consider exponential empirical likelihood as an alternative method. Numerical results are presented to compare our method with others.
Torczynski, John R.
2001-02-27
A module bay requires less cleanroom airflow. A shaped gas inlet passage can allow cleanroom air into the module bay with flow velocity preferentially directed toward contaminant rich portions of a processing module in the module bay. Preferential gas flow direction can more efficiently purge contaminants from appropriate portions of the module bay, allowing a reduced cleanroom air flow rate for contaminant removal. A shelf extending from an air inlet slit in one wall of a module bay can direct air flowing therethrough toward contaminant-rich portions of the module bay, such as a junction between a lid and base of a processing module.
33 CFR 100.124 - Maggie Fischer Memorial Great South Bay Cross Bay Swim, Great South Bay, New York.
2010-07-01
... South Bay Cross Bay Swim, Great South Bay, New York. 100.124 Section 100.124 Navigation and Navigable... NAVIGABLE WATERS § 100.124 Maggie Fischer Memorial Great South Bay Cross Bay Swim, Great South Bay, New York... swimmer or safety craft on the swim event race course bounded by the following points: Starting Point...
Generalized empirical likelihood methods for analyzing longitudinal data
Wang, S.
2010-02-16
Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and compare them with the block empirical likelihood method by You et al. (2006) and the normal approximation with a correctly estimated variance-covariance. The results suggest that the proposed methods are generally more efficient than existing methods which ignore the correlation structure, and better in coverage compared to the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is also presented.
LING Neng-xiang; DU Xue-qiao
2005-01-01
In this paper, we study the strong consistency of partitioning estimation of a regression function under samples that are φ-mixing sequences with identical distribution. Key words: nonparametric regression function; partitioning estimation; strong convergence; φ-mixing sequences.
Kernel bandwidth estimation for non-parametric density estimation: a comparative study
Van der Walt, CM
2013-12-01
Full Text Available We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...
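Two of the conventional bandwidth estimators the study refers to, the rules of Scott and Silverman, are available in scipy's Gaussian KDE; this short illustration uses simulated univariate data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 1.0, 500)

# Scott's and Silverman's rules give data-driven bandwidth factors;
# both scale as n**(-1/5) for univariate data
kde_scott = gaussian_kde(sample, bw_method="scott")
kde_silverman = gaussian_kde(sample, bw_method="silverman")

# Evaluate the density estimate on a grid
grid = np.linspace(-4.0, 4.0, 9)
density = kde_scott(grid)
```

The choice of rule matters most for multimodal or heavy-tailed data, where these normal-reference rules tend to oversmooth; that sensitivity is the kind of behaviour such comparative studies probe.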
Bayesian nonparametric estimation and consistency of mixed multinomial logit choice models
De Blasi, Pierpaolo; Lau, John W; 10.3150/09-BEJ233
2011-01-01
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide an important theoretical support for the use of the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different te...
Nonparametric Monitoring for Geotechnical Structures Subject to Long-Term Environmental Change
Hae-Bum Yun
2011-01-01
Full Text Available A nonparametric, data-driven methodology of monitoring for geotechnical structures subject to long-term environmental change is discussed. Avoiding physical assumptions or excessive simplification of the monitored structures, the nonparametric monitoring methodology presented in this paper provides reliable performance-related information particularly when the collection of sensor data is limited. For the validation of the nonparametric methodology, a field case study was performed using a full-scale retaining wall, which had been monitored for three years using three tilt gauges. Using the very limited sensor data, it is demonstrated that important performance-related information, such as drainage performance and sensor damage, could be disentangled from significant daily, seasonal and multiyear environmental variations. Extensive literature review on recent developments of parametric and nonparametric data processing techniques for geotechnical applications is also presented.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
Fan, Jianqing; Song, Rui
2011-01-01
A variable screening procedure via correlation learning was proposed by Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening family. Several closely related variable screening procedures are proposed. It is shown that, under nonparametric additive models and some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data a...
Nonparametric TOA estimators for low-resolution IR-UWB digital receiver
Yanlong Zhang; Weidong Chen
2015-01-01
Nonparametric time-of-arrival (TOA) estimators for impulse radio ultra-wideband (IR-UWB) signals are proposed. Nonparametric detection is obviously useful in situations where detailed information about the statistics of the noise is unavailable or not accurate. Such TOA estimators are obtained based on conditional statistical tests with only a symmetric-distribution assumption on the noise probability density function. The nonparametric estimators are attractive choices for low-resolution IR-UWB digital receivers, which can be implemented by fast comparators or high-sampling-rate, low-resolution analog-to-digital converters (ADCs), in place of high-sampling-rate, high-resolution ADCs, which may not be available in practice. Simulation results demonstrate that nonparametric TOA estimators provide more effective and robust performance than typical energy detection (ED) based estimators.
Nonparametric statistical tests for the continuous data: the basic concept and the practical use.
Nahm, Francis Sahngun
2016-02-01
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most medical researchers are familiar with them and statistical software packages strongly support them. Parametric tests require an important assumption, the assumption of normality, which means that the distribution of sample means is normally distributed. However, parametric tests can be misleading when this assumption is not satisfied. In these circumstances, nonparametric tests are the available alternative, because they do not require the normality assumption. Nonparametric tests are statistical methods based on signs and ranks. In this article, we discuss the basic concepts and practical use of nonparametric tests as a guide to their proper use.
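The rank-based tests the article describes are available in scipy; a brief illustration with simulated data (the samples and effect sizes here are invented for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two independent samples from clearly different skewed distributions
a = rng.exponential(1.0, 60)
b = rng.exponential(2.0, 60)

# Mann-Whitney U test: rank-based two-sample test, no normality assumption
u_stat, p_mw = stats.mannwhitneyu(a, b, alternative="two-sided")

# Wilcoxon signed-rank test: paired data with a clear systematic shift
before = rng.normal(10.0, 1.0, 30)
after = before + rng.normal(1.0, 0.5, 30)
w_stat, p_w = stats.wilcoxon(before, after)
```

Both tests operate on the ranks of the observations rather than their raw values, which is what makes them valid for the skewed, non-normal data above.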
Empirical Measurement of the Software Testing and Reliability
Zou Feng-zhong; Xu Ren-zuo
2004-01-01
The meanings of the parameters of software reliability models are investigated in terms of the process of software testing and in terms of other measurements of software. Based on this investigation, empirical estimation of the parameters is addressed. On one hand, these empirical estimates are also measurements of the software, which can be used to control and to optimize the process of software development. On the other hand, by treating these empirical estimates as Bayes priors, software reliability models are extended so that engineers' experience can be integrated into, and hence improve, the models.
Examples of the Application of Nonparametric Information Geometry to Statistical Physics
Giovanni Pistone
2013-09-01
Full Text Available We review a nonparametric version of Amari's information geometry, in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.
Economic decision making and the application of nonparametric prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2008-01-01
Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright © 2008 Society of Petroleum Engineers.
A robust nonparametric method for quantifying undetected extinctions.
Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E
2016-06-01
How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
Akhtar, Naveed; Mian, Ajmal
2017-10-03
We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size, the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.
Non-parametric combination and related permutation tests for neuroimaging.
Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E
2016-04-01
In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction.
LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi
2010-01-01
Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed that both crossover and noncrossover GEI occurred, and genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated that the nonparametric statistics grouped into four distinct classes that corresponded to different agronomic and biological concepts of stability. Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only the rank-sum and TOP methods would be useful for simultaneous selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.
A novel nonparametric confidence interval for differences of proportions for correlated binary data.
Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan
2016-11-16
Various confidence interval estimators have been developed for differences in proportions resulting from correlated binary data. However, the width of the widely recommended Tango's score confidence interval tends to be wide, and the computational burden of the exact methods recommended for small-sample data is intensive. A recently proposed rank-based nonparametric method, which treats proportions as special cases of areas under receiver operating characteristic curves, provided a new way to construct the confidence interval for a proportion difference on paired data, but its complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristic curves. The new confidence interval has a simple analytic form, with the degrees of freedom newly estimated as n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than that of Tango. This new confidence interval with the new estimate of the degrees of freedom also leads to coverage probabilities that improve on those of the rank-based nonparametric confidence interval. Compared with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and it is very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance. © The Author(s) 2016.
D'Agostini, G
2005-01-01
It is curious to learn that Enrico Fermi knew how to base probabilistic inference on Bayes theorem, and that some influential notes on statistics for physicists stem from what the author calls elsewhere, but never in these notes, "the Bayes Theorem of Fermi". The fact is curious because the large majority of living physicists, educated in the second half of the last century -- a kind of middle age in statistical reasoning -- never heard of Bayes theorem during their studies, though they have been constantly using an intuitive reasoning quite Bayesian in spirit. This paper is based on recollections and notes by Jay Orear and on Gauss' "Theoria motus corporum coelestium", the Princeps mathematicorum being remembered by Orear as a source of Fermi's Bayesian reasoning.
Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea
2016-01-01
In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used
Richards Bay effluent pipeline
Lord, DA
1986-07-01
Full Text Available This report discusses how adequate provision for waste disposal is an essential part of the infrastructure needed in the development of Richards Bay as a deepwater harbour and industrial/metropolitan area. Having considered various options for waste...
Bayes Empirical Bayes Inference of Amino Acid Sites Under Positive Selection
Yang, Ziheng; Wong, Wendy Shuk Wan; Nielsen, Rasmus
2005-01-01
Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level...
The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics.
Brown, Jeremy M; Lemmon, Alan R
2007-08-01
As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors.
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
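A minimal sketch of the K-M idea for left-censored geoscience data: flipping the values about a constant converts left-censoring (nondetects) into right-censoring, so the standard product-limit estimator applies. This is a generic illustration, not the described S-language software, and the concentrations are hypothetical:

```python
import numpy as np

def km_survival(times, event):
    """Kaplan-Meier survival estimates at each distinct event time.
    times: observed values; event: 1 if observed (uncensored), 0 if right-censored."""
    order = np.argsort(times)
    t = np.asarray(times, float)[order]
    e = np.asarray(event, int)[order]
    n_at_risk = len(t)
    est, s = [], 1.0
    for ti in np.unique(t):
        at_ti = t == ti
        d = e[at_ti].sum()                 # events at time ti
        if d > 0:
            s *= 1.0 - d / n_at_risk       # product-limit update
            est.append((ti, s))
        n_at_risk -= at_ti.sum()           # events and censorings both leave the risk set
    return est

# Left-censored concentrations: nondetects are known only to be below a detection limit.
conc = np.array([0.5, 1.2, 0.8, 2.0, 3.1, 0.9, 1.5])   # hypothetical data
nondetect = np.array([1, 0, 1, 0, 0, 1, 0])             # 1 = below detection limit
M = conc.max() + 1.0                                    # flip constant
est = km_survival(M - conc, 1 - nondetect)              # detected values become the "events"
```

Summary statistics on the original scale are recovered by flipping the estimated distribution back (x = M - t); as the abstract notes, no extrapolation below the smallest detection limit is performed.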
A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator
Huang, Shengzhi; Huang, Qiang; Leng, Guoyong; Chang, Jianxia
2016-06-01
This study develops a nonparametric multivariate drought index, namely, the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing multivariate drought indices, we use a nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights for drought index development. The proposed NMSDI is applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate that: (1) generally, NMSDI captures drought onset similarly to the Standardized Precipitation Index (SPI), and drought termination and persistence similarly to the Standardized Streamflow Index (SSFI); the drought events identified by NMSDI match well with historical drought records in the WRB, and its performance is also consistent with that of an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI for drought detection; (2) an increasing risk of drought has been detected over the past decades, and will persist to a certain extent in the future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil conservation practices. This study highlights the nonparametric multivariate drought index, which can be used for drought detection and prediction efficiently and comprehensively.
Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions
Poczos, Barnabas; Schneider, Jeff
2012-01-01
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real-world images, and astronomical data sets.
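A widely used family of such divergence estimators is based on nearest-neighbour distances. The sketch below implements a 1-NN Kullback-Leibler divergence estimator for one-dimensional samples, in the spirit of the estimators the abstract discusses rather than the authors' exact algorithm:

```python
import numpy as np

def knn_kl_divergence(x, y):
    # 1-nearest-neighbour KL divergence estimator for 1-D samples
    # x ~ p and y ~ q; returns an estimate of KL(p || q).
    x = np.sort(np.asarray(x, float))
    y = np.asarray(y, float)
    n, m = len(x), len(y)
    # nearest-neighbour distance within x (x is sorted, so look at gaps)
    gaps = np.diff(x)
    rho = np.minimum(np.r_[gaps, np.inf], np.r_[np.inf, gaps])
    # nearest-neighbour distance from each x_i to the sample y
    nu = np.min(np.abs(x[:, None] - y[None, :]), axis=1)
    eps = 1e-12                      # guard against zero distances
    return np.mean(np.log((nu + eps) / (rho + eps))) + np.log(m / (n - 1))
```

Once pairwise divergences between sample sets are available, they can feed directly into embedding, clustering, or anomaly detection on the distributions themselves.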
Rights, Jason D; Sterba, Sonya K
2016-11-01
Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.
THE RESPONSE OF MONTEREY BAY TO THE 2010 CHILEAN EARTHQUAKE
Laurence C. Breaker
2011-01-01
The primary frequencies contained in the arrival sequence produced by the tsunami from the Chilean earthquake of 2010 in Monterey Bay were extracted to determine the seiche modes that were produced. Singular Spectrum Analysis (SSA) and Ensemble Empirical Mode Decomposition (EEMD) were employed to extract the primary frequencies of interest. The wave train from the Chilean tsunami lasted for at least four days due to multipath arrivals that may not have included reflections from outside the bay but most likely did include secondary undulations, and energy trapping in the form of edge waves, inside the bay. The SSA decomposition resolved oscillations with periods of 52-57, 34-35, 26-27, and 21-22 minutes, all frequencies that have been predicted and/or observed in previous studies. The EEMD decomposition detected oscillations with periods of 50-55 and 21-22 minutes. Periods in the range of 50-57 minutes varied due to measurement uncertainties but almost certainly correspond to the first longitudinal mode of oscillation for Monterey Bay; periods of 34-35 minutes correspond to the first transverse mode of oscillation, which assumes a nodal line across the entrance of the bay; a period of 26-27 minutes, although previously observed, may not represent a fundamental oscillation; and a period of 21-22 minutes has been predicted and observed previously. A period of ~37 minutes, close to the period of 34-35 minutes, was generated by the Great Alaskan Earthquake of 1964 in Monterey Bay and most likely represents the same mode of oscillation. The tsunamis associated with the Great Alaskan Earthquake and the Chilean Earthquake both entered Monterey Bay but initially arrived outside the bay from opposite directions. Unlike the Great Alaskan Earthquake, however, which excited only one resonant mode inside the bay, the Chilean Earthquake excited several modes, suggesting that the asymmetric shape of the entrance to Monterey Bay was an important factor and that the
Ramirez, José Rangel; Sørensen, John Dalsgaard
2011-01-01
This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbines. The new information, coming from external and condition monitoring, can be used for direct updating of the stochastic variables through a non-parametric Bayesian updating approach and can be integrated in the reliability analysis by a third-order polynomial chaos expansion approximation. Although classical Bayesian updating approaches are often used because of their parametric formulation, non-parametric approaches are better alternatives for multi-parametric updating with a non-conjugating formulation. The results in this paper show the influence on the time-dependent updated reliability when non-parametric and classical Bayesian approaches are used. Further, the influence on the reliability of the number of updated parameters is illustrated.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
The linear discriminant analysis (LDA) is one of the most popular methods for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and ad hoc rules. Six simulation cases demonstrate that LKNDA has both parametric and nonparametric algorithm advantages and higher classification accuracy. The quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.
Non-parametric seismic hazard analysis in the presence of incomplete data
Yazdani, Azad; Mirzaei, Sajjad; Dadkhah, Koroush
2017-01-01
The distribution of earthquake magnitudes plays a crucial role in the estimation of seismic hazard parameters. Due to the complexity of earthquake magnitude distribution, non-parametric approaches are recommended over classical parametric methods. The main deficiency of the non-parametric approach is the lack of complete magnitude data in almost all cases. This study aims to introduce an imputation procedure for completing earthquake catalog data that will allow the catalog to be used for non-parametric density estimation. Using a Monte Carlo simulation, the efficiency of the introduced approach is investigated. This study indicates that when a magnitude catalog is incomplete, the imputation procedure can provide an appropriate tool for seismic hazard assessment. As an illustration, the imputation procedure was applied to estimate the earthquake magnitude distribution in Tehran, the capital city of Iran.
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Multivariate nonparametric regression and visualization with R and applications to finance
Klemelä, Jussi
2014-01-01
A modern approach to statistical learning and its applications through visualization methods. With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
Rabia Ece OMAY
2013-06-01
In this study, the relationship between gross domestic product (GDP) per capita and sulfur dioxide (SO2) and particulate matter (PM10) per capita is modeled for Turkey. Nonparametric fixed effect panel data analysis is used for the modeling. The panel data cover 12 territories, in the first level of the Nomenclature of Territorial Units for Statistics (NUTS), for the period 1990-2001. In modeling the relationship between GDP and SO2 and PM10 for Turkey, the non-parametric models have given good results.
Zhao, Zhibiao
2011-06-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for the transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable to continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.
Parametrically guided estimation in nonparametric varying coefficient models with quasi-likelihood.
Davenport, Clemontina A; Maity, Arnab; Wu, Yichao
2015-04-01
Varying coefficient models allow us to generalize standard linear regression models to incorporate complex covariate effects by modeling the regression coefficients as functions of another covariate. For nonparametric varying coefficients, we can borrow the idea of parametrically guided estimation to improve asymptotic bias. In this paper, we develop a guided estimation procedure for the nonparametric varying coefficient models. Asymptotic properties are established for the guided estimators and a method of bandwidth selection via bias-variance tradeoff is proposed. We compare the performance of the guided estimator with that of the unguided estimator via both simulation and real data examples.
Koutsourelakis, P
2007-05-03
Clustering represents one of the most common statistical procedures and a standard tool for pattern discovery and dimension reduction. Most often the objects to be clustered are described by a set of measurements or observables, e.g. the coordinates of the vectors or the attributes of people. In a lot of cases, however, the available observations appear in the form of links or connections (e.g. communication or transaction networks). This data contains valuable information that can in general be exploited in order to discover groups and better understand the structure of the dataset. Since in most real-world datasets several of these links are missing, it is also useful to develop procedures that can predict those unobserved connections. In this report we address the problem of unsupervised group discovery in relational datasets. A fundamental issue in all clustering problems is that the actual number of clusters is unknown a priori. In most cases this is addressed by running the model several times, assuming a different number of clusters each time, and selecting the value that provides the best fit based on some criterion (i.e. the Bayes factor in the case of Bayesian techniques). It is easily understood that it would be preferable to develop techniques in which the number of clusters is essentially learned from the data along with the rest of the model parameters. For that purpose, we adopt a nonparametric Bayesian framework which provides a very flexible modeling environment in which the size of the model, i.e. the number of clusters, can adapt to the available data and readily accommodate outliers. The latter is particularly important since several groups of interest might consist of a small number of members and would most likely be smeared out by traditional modeling techniques. Finally, the proposed framework combines all the advantages of standard Bayesian techniques such as integration of prior knowledge in a principled manner, seamless accommodation of missing data
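The nonparametric Bayesian framework alluded to here is typically built on a Dirichlet process, whose prior over partitions is the Chinese restaurant process: the number of clusters is not fixed in advance but grows with the data. A minimal sketch of drawing a partition from this prior (illustrative only, not the report's full relational model):

```python
import numpy as np

def crp_partition(n, alpha, rng):
    # Draw a partition of n items from a Chinese restaurant process with
    # concentration alpha: item i joins existing cluster k with probability
    # size_k / (i + alpha), or opens a new cluster with prob. alpha / (i + alpha).
    counts, labels = [], []
    for i in range(n):
        probs = np.array(counts + [alpha], float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts
```

The expected number of clusters grows only logarithmically in n, and small clusters survive rather than being smeared out, which matches the motivation given in the abstract.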
California Department of Resources — The Bay Trail provides easily accessible recreational opportunities for outdoor enthusiasts, including hikers, joggers, bicyclists and skaters. It also offers a...
Empirical Test Case Specification
Kalyanova, Olena; Heiselberg, Per
This document includes the empirical specification for the IEA task of evaluating building energy simulation computer programs for Double Skin Facade (DSF) constructions. There are two approaches involved in this procedure: one is the comparative approach and the other is the empirical one. In the comparative approach the outcomes of different software tools are compared, while in the empirical approach the modelling results are compared with the results of experimental test cases.
Jiang, GJ; Knight, JL
1997-01-01
In this paper, we propose a nonparametric identification and estimation procedure for an Ito diffusion process based on discrete sampling observations. The nonparametric kernel estimator for the diffusion function developed in this paper deals with general Ito diffusion processes and avoids any
赵文芝; 田铮; 夏志明
2009-01-01
A wavelet method for the detection and estimation of change points in nonparametric regression models under random design is proposed. The confidence bound of our test is derived by using test statistics based on empirical wavelet coefficients, as obtained by wavelet transformation of the data, which is observed with noise. Moreover, the consistency of the test is proved and the rate of convergence is given. The method turns out to be effective after being tested on simulated examples and applied to IBM stock market data.
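The idea of locating a jump through wavelet coefficients can be illustrated in its simplest form: a moving Haar contrast (the difference of adjacent window means) is largest in magnitude near the change point. A toy sketch under these assumptions, not the authors' procedure:

```python
import numpy as np

def haar_change_point(x, w):
    # Moving Haar coefficient: contrast of the means of two adjacent
    # windows of width w; its maximizer estimates the jump location.
    x = np.asarray(x, float)
    c = np.array([x[k:k + w].mean() - x[k - w:k].mean()
                  for k in range(w, len(x) - w + 1)])
    return int(np.argmax(np.abs(c))) + w   # index of estimated change point
```

Working with window-mean contrasts (rather than raw differences) averages out the observation noise, which is why the wavelet-coefficient statistic is robust under random design.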
Nonparametric estimation of population density for line transect sampling using Fourier series
Crain, B.R.; Burnham, K.P.; Anderson, D.R.; Lake, J.L.
1979-01-01
A nonparametric, robust density estimation method is explored for the analysis of right-angle distances from a transect line to the objects sighted. The method is based on the Fourier series expansion of a probability density function over an interval. With only mild assumptions, a general population density estimator of wide applicability is obtained.
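The estimator has a simple closed form: on the truncation interval [0, w], f(x) = 1/w + sum_{k=1..m} a_k cos(k*pi*x/w) with a_k estimated by (2/(n*w)) * sum_i cos(k*pi*x_i/w), and the line-transect density estimate then uses f(0). A small sketch under these standard assumptions (parameter names are illustrative):

```python
import math

def fourier_density(distances, w, m):
    # Fourier-series estimator of the detection-distance pdf on [0, w]:
    # f(x) = 1/w + sum_k a_k * cos(k*pi*x/w), truncated at m terms.
    n = len(distances)
    a = [2.0 / (n * w) * sum(math.cos(k * math.pi * xi / w) for xi in distances)
         for k in range(1, m + 1)]
    def f(x):
        return 1.0 / w + sum(ak * math.cos(k * math.pi * x / w)
                             for k, ak in enumerate(a, start=1))
    return f
```

For perfectly uniform sighting distances the cosine coefficients vanish and the estimator collapses to the flat density 1/w, as it should.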
A non-parametric peak finder algorithm and its application in searches for new physics
Chekanov, S
2011-01-01
We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.
Testing a parametric function against a nonparametric alternative in IV and GMM settings
Gørgens, Tue; Wurtz, Allan
This paper develops a specification test for functional form for models identified by moment restrictions, including IV and GMM settings. The general framework is one where the moment restrictions are specified as functions of data, a finite-dimensional parameter vector, and a nonparametric real...
Non-parametric Bayesian graph models reveal community structure in resting state fMRI
Andersen, Kasper Winther; Madsen, Kristoffer H.; Siebner, Hartwig Roman
2014-01-01
Modeling of resting state functional magnetic resonance imaging (rs-fMRI) data using network models is of increasing interest. It is often desirable to group nodes into clusters to interpret the communication patterns between nodes. In this study we consider three different nonparametric Bayesian...
Non-parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods
Høg, Esben
In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...
Non-parametric system identification from non-linear stochastic response
Rüdinger, Finn; Krenk, Steen
2001-01-01
An estimation method is proposed for identification of non-linear stiffness and damping of single-degree-of-freedom systems under stationary white noise excitation. Non-parametric estimates of the stiffness and damping along with an estimate of the white noise intensity are obtained by suitable p...
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test
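For tests of moderate length the PE can be computed by brute force: given the person's item success probabilities (the values below are hypothetical), sum the probabilities of every response vector whose likelihood does not exceed that of the observed vector x. A sketch of the idea:

```python
from itertools import product

def prob_exceedance(x, p):
    # PE of response vector x given item success probabilities p:
    # total probability of all response vectors at most as likely as x.
    def lik(v):
        out = 1.0
        for vj, pj in zip(v, p):
            out *= pj if vj else 1 - pj
        return out
    lx = lik(x)
    return sum(lik(y) for y in product([0, 1], repeat=len(p))
               if lik(y) <= lx + 1e-12)
```

A small PE flags a response pattern as unlikely under the model; the most likely pattern has PE equal to 1 by construction. The enumeration is exponential in test length, which is why the approach targets tests of moderate length.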
Jang, Eunice Eunhee; Roussos, Louis
2007-01-01
This article reports two studies to illustrate methodologies for conducting a conditional covariance-based nonparametric dimensionality assessment using data from two forms of the Test of English as a Foreign Language (TOEFL). Study 1 illustrates how to assess overall dimensionality of the TOEFL including all three subtests. Study 2 is aimed at…
Wei, Jiawei; Carroll, Raymond J; Maity, Arnab
2011-07-01
We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.
Comparison of reliability techniques of parametric and non-parametric method
C. Kalaiselvan
2016-06-01
Reliability of a product or system is the probability that the product performs its intended function adequately for the stated period of time under stated operating conditions. It is a function of time. The most widely used nano-ceramic capacitors, C0G and X7R, are used in this reliability study to generate the time-to-failure (TTF) data. The time-to-failure data are identified by Accelerated Life Testing (ALT) and Highly Accelerated Life Testing (HALT). The test is conducted at a high stress level to generate more failures within a short interval of time. The reliability methods used to convert accelerated to actual conditions are the parametric method and the non-parametric method. In this paper, a comparative study has been done of parametric and non-parametric methods to identify the failure data. The Weibull distribution is identified for the parametric method; the Kaplan-Meier and Simple Actuarial Methods are identified for the non-parametric method. The time taken to identify the mean time to failure (MTTF) under accelerated conditions is the same for the parametric and non-parametric methods, with relative deviation.
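On the parametric side, once a Weibull shape beta and scale eta have been fitted to the TTF data, the MTTF follows in closed form as eta * Gamma(1 + 1/beta); the non-parametric Kaplan-Meier estimate is instead the area under the estimated survival curve. The closed form can be sketched as:

```python
import math

def weibull_mttf(eta, beta):
    # MTTF (mean) of a Weibull life distribution with
    # scale eta (characteristic life) and shape beta.
    return eta * math.gamma(1.0 + 1.0 / beta)
```

With beta = 1 the Weibull reduces to the exponential distribution and the MTTF equals the characteristic life eta.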
Non-parametric Tuning of PID Controllers A Modified Relay-Feedback-Test Approach
Boiko, Igor
2013-01-01
The relay feedback test (RFT) has become a popular and efficient tool used in process identification and automatic controller tuning. Non-parametric Tuning of PID Controllers couples new modifications of the classical RFT with application-specific optimal tuning rules to form a non-parametric method of test-and-tuning. Test and tuning are coordinated through a set of common parameters so that a PID controller can obtain the desired gain or phase margins in a system exactly, even with unknown process dynamics. The concept of process-specific optimal tuning rules in the nonparametric setup, with corresponding tuning rules for flow, level, pressure, and temperature control loops, is presented in the text. Common problems of tuning accuracy based on parametric and non-parametric approaches are addressed. In addition, the text treats the parametric approach to tuning based on the modified RFT approach and the exact model of oscillations in the system under test using the locus of a perturbed relay system (LPRS) meth...
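The classical RFT that the book builds on works as follows: a relay of amplitude h induces a limit cycle of amplitude a and period Tu in the loop; the describing-function approximation gives the ultimate gain Ku = 4h/(pi*a), which standard Ziegler-Nichols rules then convert into PID settings. A sketch of that baseline computation (the book's modified tests and rules differ from this):

```python
import math

def zn_pid_from_relay(h, a, Tu):
    # Classical relay-feedback identification: describing-function
    # estimate of the ultimate gain, then Ziegler-Nichols PID rules.
    Ku = 4.0 * h / (math.pi * a)
    return {"Kp": 0.6 * Ku, "Ti": Tu / 2.0, "Td": Tu / 8.0}
```

The appeal of the test is that h, a, and Tu are all directly measurable from the oscillation, so no parametric process model is needed.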
Non-Parametric Estimation of Diffusion-Paths Using Wavelet Scaling Methods
Høg, Esben
2003-01-01
In continuous time, diffusion processes have been used for modelling financial dynamics for a long time. For example the Ornstein-Uhlenbeck process (the simplest mean-reverting process) has been used to model non-speculative price processes. We discuss non-parametric estimation of these processes...
A non-parametric method for correction of global radiation observations
Bacher, Peder; Madsen, Henrik; Perers, Bengt;
2013-01-01
in the observations are corrected. These are errors such as: tilt in the leveling of the sensor, shadowing from surrounding objects, clipping and saturation in the signal processing, and errors from dirt and wear. The method is based on a statistical non-parametric clear-sky model which is applied to both...
Nonparametric estimation in an "illness-death" model when all transition times are interval censored
Frydman, Halina; Gerds, Thomas; Grøn, Randi
2013-01-01
We develop nonparametric maximum likelihood estimation for the parameters of an irreversible Markov chain on states {0,1,2} from the observations with interval censored times of 0 → 1, 0 → 2 and 1 → 2 transitions. The distinguishing aspect of the data is that, in addition to all transition times ...
A Comparison of Shewhart Control Charts based on Normality, Nonparametrics, and Extreme-Value Theory
Ion, R.A.; Does, R.J.M.M.; Klaassen, C.A.J.
2000-01-01
Several control charts for individual observations are compared. The traditional ones are the well-known Shewhart control charts with estimators for the spread based on the sample standard deviation and the average of the moving ranges. The alternatives are nonparametric control charts, based on emp
An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.
2014-01-01
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Agasisti, Tommaso
2011-01-01
The objective of this paper is an efficiency analysis concerning higher education systems in European countries. Data have been extracted from OECD data-sets (Education at a Glance, several years), using a non-parametric technique--data envelopment analysis--to calculate efficiency scores. This paper represents the first attempt to conduct such an…
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.
Fan, Jianqing; Feng, Yang; Song, Rui
2011-06-01
A variable screening procedure via correlation learning was proposed in Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under general nonparametric models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, a data-driven thresholding and an iterative nonparametric independence screening (INIS) are also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data analysis demonstrate that the proposed procedure works well with moderate sample size and large dimension and performs better than competing methods.
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models.
Fan, Jianqing; Ma, Yunbei; Dai, Wei
2014-01-01
The varying-coefficient model is an important class of nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in sparse ultra-high dimensional varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contributions of each covariate given the exposure variable. The sure independent screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance the practical utility and finite sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
Low default credit scoring using two-class non-parametric kernel density estimation
Rademeyer, E
2016-12-01
This paper investigates the performance of two-class classification on credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and non-parametric Parzen classifiers are extended, using Bayes' rule, to include either...
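The core of a two-class Parzen classifier is a Gaussian-kernel density estimate per class, combined through Bayes' rule with a class prior that carries the low default ratio. A one-dimensional sketch of that baseline (illustrative, not the authors' extension):

```python
import math

def parzen_posterior(x, class0, class1, h, prior1):
    # Gaussian-kernel (Parzen) class-conditional density estimates,
    # combined via Bayes' rule; prior1 encodes the (low) default ratio.
    def kde(t, data):
        return sum(math.exp(-0.5 * ((t - d) / h) ** 2) for d in data) / (
            len(data) * h * math.sqrt(2.0 * math.pi))
    num = prior1 * kde(x, class1)
    den = num + (1.0 - prior1) * kde(x, class0)
    return num / den if den > 0 else prior1
```

Making the prior explicit is what allows the decision threshold to be tuned when defaults are rare, instead of implicitly assuming balanced classes.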
Do Former College Athletes Earn More at Work? A Nonparametric Assessment
Henderson, Daniel J.; Olbrecht, Alexandre; Polachek, Solomon W.
2006-01-01
This paper investigates how students' collegiate athletic participation affects their subsequent labor market success. By using newly developed techniques in nonparametric regression, it shows that on average former college athletes earn a wage premium. However, the premium is not uniform, but skewed so that more than half the athletes actually…
Nonparametric Tests of Collectively Rational Consumption Behavior : An Integer Programming Procedure
Cherchye, L.J.H.; de Rock, B.; Sabbe, J.; Vermeulen, F.M.P.
2008-01-01
We present an IP-based nonparametric (revealed preference) testing procedure for rational consumption behavior in terms of general collective models, which include consumption externalities and public consumption. An empirical application to data drawn from the Russia Longitudinal Monitoring
Hanson, K.M.; Cunningham, G.S.
1996-04-01
The authors are developing a computer application, called the Bayes Inference Engine, to provide the means to make inferences about models of physical reality within a Bayesian framework. The construction of complex nonlinear models is achieved by a fully object-oriented design. The models are represented by a data-flow diagram that may be manipulated by the analyst through a graphical programming environment. Maximum a posteriori solutions are achieved using a general, gradient-based optimization algorithm. The application incorporates a new technique of estimating and visualizing the uncertainties in specific aspects of the model.
Rasch, Astrid
metropole, political legitimacy at the end of empire and settler family innocence, I argue that the writers engage with collective memories about empire in their personal recollections. Such collective narratives pattern autobiographies so that the same concerns and rhetoric recur in widely different...
Linnet, Kristian
2005-01-01
Bootstrap, HPLC, limit of blank, limit of detection, non-parametric statistics, type I and II errors...
Empirical Philosophy of Science
Mansnerus, Erika; Wagenknecht, Susann
2015-01-01
Empirical insights are proven fruitful for the advancement of Philosophy of Science, but the integration of philosophical concepts and empirical data poses considerable methodological challenges. Debates in Integrated History and Philosophy of Science suggest that the advancement of philosophical knowledge takes place through the integration of empirical or historical research into philosophical studies, as Chang, Nersessian, Thagard and Schickore argue in their work. Building upon their contributions, we develop a blueprint for an Empirical Philosophy of Science that draws upon qualitative methods from the social sciences in order to advance our philosophical understanding of science in practice. We regard the relationship between philosophical conceptualization and empirical data as an iterative dialogue between theory and data, which is guided by a particular ‘feeling with...
Bayes Multiple Decision Functions
Wu, Wensong
2011-01-01
This paper deals with the problem of simultaneously making many (M) binary decisions based on one realization of a random data matrix X. M is typically large and X will usually have M rows associated with each of the M decisions to make, but for each row the data may be low dimensional. A Bayesian decision-theoretic approach for this problem is implemented with the overall loss function being a cost-weighted linear combination of Type I and Type II loss functions. The class of loss functions considered allows for the use of the false discovery rate (FDR), false nondiscovery rate (FNR), and missed discovery rate (MDR) in assessing the decision. Through this Bayesian paradigm, the Bayes multiple decision function (BMDF) is derived and an efficient algorithm to obtain the optimal Bayes action is described. In contrast to many works in the literature where the rows of the matrix X are assumed to be stochastically independent, we allow in this paper a dependent data structure with the associations obtained through...
2010-05-28
South Bay Cross Bay Swim, Great South Bay, NY. AGENCY: Coast Guard, DHS. ACTION: Final rule. SUMMARY: ... Swim, Great South Bay, NY, in the Federal Register (74 FR 32428). We did not receive any comments or... published at 74 FR 32428 on July 8, 2009, is adopted as a final rule with the following changes: PART 100...
Bio-optical water quality dynamics observed from MERIS in Pensacola Bay, Florida
Observed bio-optical water quality data collected from 2009 to 2011 in Pensacola Bay, Florida were used to develop empirical remote sensing retrieval algorithms for chlorophyll a (Chla), colored dissolved organic matter (CDOM), and suspended particulate matter (SPM). Time-series ...
Z. Nematollahi
2016-03-01
Introduction: Because agriculture involves risk and uncertainty, risk management is crucial for agricultural decision making. The present study was therefore designed to determine the risk aversion coefficient of Esfarayen farmers. Materials and Methods: The following approaches have been utilized to assess risk attitudes: (1) direct elicitation of utility functions; (2) experimental procedures in which individuals are presented with hypothetical questionnaires regarding risky alternatives, with or without real payments; and (3) inference from observation of economic behavior. In this paper, we focused on approach (3), inference from observation of economic behavior, based on the assumption of a relationship between the actual behavior of a decision maker and the behavior predicted from empirically specified models. A new non-parametric method and the QP method were used to calculate the coefficient of risk aversion. We maximized the decision maker's expected utility with the E-V formulation (Freund, 1956). Ideally, in constructing a QP model, the variance-covariance matrix should be formed for each individual farmer. For this purpose, a sample of 100 farmers was selected using random sampling, and their data on 14 products for the years 2008-2012 were assembled. The lowlands of Esfarayen were used since, within this area, production possibilities are rather homogeneous. Results and Discussion: The results of this study showed that there was low correlation between some of the activities, which implies opportunities for income stabilization through diversification. With respect to transitory income, Ra varies from 0.000006 to 0.000361, and the absolute coefficient of risk aversion in our sample was 0.00005. The estimated Ra values vary considerably from farm to farm. The estimated Ra for the subsample of 'non-wealthy' farmers was 0.00010. The subsample with farmers in the 'wealthy' group had an
Chesapeake Bay: Introduction to an Ecosystem.
Environmental Protection Agency, Washington, DC.
The Chesapeake Bay is the largest estuary in the contiguous United States. The Bay and its tidal tributaries make up the Chesapeake Bay ecosystem. This document, which focuses on various aspects of this ecosystem, is divided into four major parts. The first part traces the geologic history of the Bay, describes the overall physical structure of…
Potential Inundation due to Rising Sea Levels in the San Francisco Bay Region
Knowles, Noah
2009-01-01
An increase in the rate of sea level rise is one of the primary impacts of projected global climate change. To assess potential inundation associated with a continued acceleration of sea level rise, the highest resolution elevation data available were assembled from various sources and mosaicked to cover the land surfaces of the San Francisco Bay region. Next, to quantify high water levels throughout the bay, a hydrodynamic model of the San Francisco Estuary was driven by a projection of hourly water levels at the Presidio. This projection was based on a combination of climate model outputs and empirical models and incorporates astronomical, storm surge, El Niño, and long-term sea level rise influences. Based on the resulting data, maps of areas vulnerable to inundation were produced, corresponding to specific amounts of sea level rise and recurrence intervals. These maps portray areas where inundation will likely be an increasing concern. In the North Bay, wetland survival and developed fill areas are at risk. In Central and South bays, a key feature is the bay-ward periphery of developed areas that would be newly vulnerable to inundation. Nearly all municipalities adjacent to South Bay face this risk to some degree. For the Bay as a whole, as early as 2050 under this scenario, the one-year peak event nearly equals the 100-year peak event in 2000. Maps of vulnerable areas are presented and some implications discussed.
Bayes multiple decision functions.
Wu, Wensong; Peña, Edsel A
2013-01-01
This paper deals with the problem of simultaneously making many (M) binary decisions based on one realization of a random data matrix X. M is typically large and X will usually have M rows associated with each of the M decisions to make, but for each row the data may be low dimensional. Such problems arise in many practical areas such as the biological and medical sciences, where the available dataset is from microarrays or other high-throughput technology and the goal is to decide which of the many genes are relevant with respect to some phenotype of interest; in the engineering and reliability sciences; in astronomy; in education; and in business. A Bayesian decision-theoretic approach to this problem is implemented with the overall loss function being a cost-weighted linear combination of Type I and Type II loss functions. The class of loss functions considered allows for use of the false discovery rate (FDR), false nondiscovery rate (FNR), and missed discovery rate (MDR) in assessing the quality of decisions. Through this Bayesian paradigm, the Bayes multiple decision function (BMDF) is derived and an efficient algorithm to obtain the optimal Bayes action is described. In contrast to many works in the literature where the rows of the matrix X are assumed to be stochastically independent, we allow a dependent data structure with the associations obtained through a class of frailty-induced Archimedean copulas. In particular, non-Gaussian dependent data structure, which is typical with failure-time data, can be entertained. The numerical implementation of the determination of the Bayes optimal action is facilitated through sequential Monte Carlo techniques. The theory developed could also be extended to the problem of multiple hypotheses testing, multiple classification and prediction, and high-dimensional variable selection. The proposed procedure is illustrated for the simple versus simple hypotheses setting and for the composite hypotheses setting.
Theological reflections on empire
Allan A. Boesak
2009-11-01
Since the meeting of the World Alliance of Reformed Churches in Accra, Ghana (2004), and the adoption of the Accra Declaration, a debate has been raging in the churches about globalisation, socio-economic justice, ecological responsibility, political and cultural domination and globalised war. Central to this debate is the concept of empire and the way the United States is increasingly becoming its embodiment. Is the United States a global empire? This article argues that the United States has indeed become the expression of a modern empire and that this reality has considerable consequences, not just for global economics and politics but for theological reflection as well.
Yates, K.K.; Cronin, T. M.; Crane, M.; Hansen, M.; Nayeghandi, A.; Swarzenski, P.; Edgar, T.; Brooks, G.R.; Suthard, B.; Hine, A.; Locker, S.; Willard, D.A.; Hastings, D.; Flower, B.; Hollander, D.; Larson, R.A.; Smith, K.
2007-01-01
Many of the nation's estuaries have been environmentally stressed since the turn of the 20th century and will continue to be impacted in the future. Tampa Bay, one of the Gulf of Mexico's largest estuaries, exemplifies the threats that our estuaries face (EPA Report 2001, Tampa Bay Estuary Program-Comprehensive Conservation and Management Plan (TBEP-CCMP)). More than 2 million people live in the Tampa Bay watershed, and the population continues to grow. Demand for freshwater resources, conversion of undeveloped areas to residential and industrial uses, increases in storm-water runoff, and increased air pollution from urban and industrial sources are some of the known human activities that impact Tampa Bay. Beginning in 2001, additional anthropogenic modifications began in Tampa Bay, including construction of an underwater gas pipeline and a desalination plant, expansion of existing ports, and increased freshwater withdrawal from three major tributaries to the bay. In January of 2001, the Tampa Bay Estuary Program (TBEP) and its partners identified a critical need for participation from the U.S. Geological Survey (USGS) in providing multidisciplinary expertise and a regional-scale, integrated science approach to address complex scientific research issues and critical scientific information gaps that must be filled for continued restoration and preservation of Tampa Bay. Tampa Bay stakeholders identified several critical science gaps for which USGS expertise was needed (Yates et al. 2001). These critical science gaps fall under four topical categories (or system components): 1) water and sediment quality, 2) hydrodynamics, 3) geology and geomorphology, and 4) ecosystem structure and function. Scientists and resource managers participating in Tampa Bay studies recognize that it is no longer sufficient to simply examine each of these estuarine system components individually. Rather, the interrelation among system components must be understood to develop conceptual and
Island Bay Wilderness study area : Island Bay National Wildlife Refuge
US Fish and Wildlife Service, Department of the Interior — This document is a brief report on a wilderness study area located in the Island Bay National Wildlife Refuge. It discusses the history of the study area, its...
Management case study: Tampa Bay, Florida
Morrison, Gerold; Greening, Holly; Yates, Kimberly K.; Wolanski, Eric; McLusky, Donald S.
2011-01-01
Tampa Bay, Florida, USA, is a shallow, subtropical estuary that experienced severe cultural eutrophication between the 1940s and 1980s, a period when the human population of its watershed quadrupled. In response, citizen action led to the formation of a public- and private-sector partnership (the Tampa Bay Estuary Program), which adopted a number of management objectives to support the restoration and protection of the bay’s living resources. These included numeric chlorophyll a and water-clarity targets, as well as long-term goals addressing the spatial extent of seagrasses and other selected habitat types, to support estuarine-dependent faunal guilds. Over the past three decades, nitrogen controls involving sources such as wastewater treatment plants, stormwater conveyance systems, fertilizer manufacturing and shipping operations, and power plants have been undertaken to meet these and other management objectives. Cumulatively, these controls have resulted in a 60% reduction in annual total nitrogen (TN) loads relative to earlier worst-case (latter 1970s) conditions. As a result, annual water-clarity and chlorophyll a targets are currently met in most years, and seagrass cover measured in 2008 was the highest recorded since 1950. Factors that have contributed to the observed improvements in Tampa Bay over the past several decades include the following: (1) Development of numeric, science-based water-quality targets to meet a long-term goal of restoring seagrass acreage to 1950s levels. Empirical and mechanistic models found that annual average chlorophyll a concentrations were a primary manageable factor affecting light attenuation. The models also quantified relationships between TN loads, chlorophyll a concentrations, light attenuation, and fluctuations in seagrass cover. The availability of long-term monitoring data, and a systematic process for using the data to evaluate the effectiveness of management actions, has allowed managers to track progress and
Gagie, Travis
2007-01-01
We trace the history of empirical entropy, touching briefly on its relation to Markov processes, normal numbers, Shannon entropy, the Chomsky hierarchy, Kolmogorov complexity, Ziv-Lempel compression, de Bruijn sequences and stochastic complexity.
Li, Ming; Gardiner, Joseph C; Breslau, Naomi; Anthony, James C; Lu, Qing
2014-07-01
Cox-regression-based methods have been commonly used for the analyses of survival outcomes, such as age-at-disease-onset. These methods generally assume the hazard functions are proportional among various risk groups. However, such an assumption may not be valid in genetic association studies, especially when complex interactions are involved. In addition, genetic association studies commonly adopt case-control designs. Direct use of Cox regression to case-control data may yield biased estimators and incorrect statistical inference. We propose a non-parametric approach, the weighted Nelson-Aalen (WNA) approach, for detecting genetic variants that are associated with age-dependent outcomes. The proposed approach can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Moreover, it does not rely on any assumptions of the disease inheritance models, and is able to capture high-order gene-gene interactions. Through simulations, we show the proposed approach outperforms Cox-regression-based methods in various scenarios. We also conduct an empirical study of progression of nicotine dependence by applying the WNA approach to three independent datasets from the Study of Addiction: Genetics and Environment. In the initial dataset, two SNPs, rs6570989 and rs2930357, located in genes GRIK2 and CSMD1, are found to be significantly associated with the progression of nicotine dependence (ND). The joint association is further replicated in two independent datasets. Further analysis suggests that these two genes may interact and be associated with the progression of ND. As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for detecting genetic interactions associated with age-at-onset outcomes.
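The paper's weighting scheme for the weighted Nelson-Aalen (WNA) approach is not reproduced here; what can be sketched is the standard Nelson-Aalen cumulative-hazard estimator it builds on, applied to an invented five-observation toy sample:

```python
import numpy as np

def nelson_aalen(times, events):
    """Standard Nelson-Aalen estimator of the cumulative hazard H(t).

    times  : observed times (event or censoring)
    events : 1 if the event occurred, 0 if censored
    Returns distinct event times and H evaluated at each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]

    n = len(times)
    event_times, hazard = [], []
    H = 0.0
    i = 0
    while i < n:
        t = times[i]
        at_risk = n - i          # subjects still at risk just before t
        d = 0                    # events occurring exactly at time t
        j = i
        while j < n and times[j] == t:
            d += events[j]
            j += 1
        if d > 0:
            H += d / at_risk     # increment: events / number at risk
            event_times.append(t)
            hazard.append(H)
        i = j
    return np.array(event_times), np.array(hazard)

# Toy sample: times with one tie and two censored observations.
t, H = nelson_aalen([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

At each distinct event time the estimator adds d/n, the number of events divided by the number still at risk; the WNA approach of the paper modifies these increments with weights.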
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Tavio Tavio
2008-01-01
Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can therefore be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.
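As a generic illustration of the technique (not the paper's 128-specimen dataset or its exact spline formulation), a smoothing spline can be fit to a noisy, saturating stress-strain-like curve without assuming a parametric shape; the curve and noise level below are invented:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical stress-strain data for a single specimen: a saturating
# curve plus measurement noise (units nominal: strain, stress in MPa).
rng = np.random.default_rng(0)
strain = np.linspace(0.0, 0.01, 40)
stress = 50.0 * (1.0 - np.exp(-400.0 * strain)) + rng.normal(0.0, 0.5, 40)

# Cubic smoothing spline: the parameter s bounds the residual sum of
# squares, trading smoothness against fit instead of presupposing a
# particular curve shape -- the paper's stated rationale.
spline = UnivariateSpline(strain, stress, k=3, s=len(strain) * 0.5**2)
fitted = spline(strain)
```

Here s is set to n times the assumed noise variance, a common rule of thumb; in practice it would be tuned, e.g. by cross-validation.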
Non-parametric Bayesian human motion recognition using a single MEMS tri-axial accelerometer.
Ahmed, M Ejaz; Song, Ju Bin
2012-09-27
In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Means (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
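The paper's collapsed Gibbs sampler for the IGMM is not reproduced here; as a rough stand-in, scikit-learn's truncated Dirichlet-process Gaussian mixture (fit by variational inference rather than Gibbs sampling) shows the key idea of inferring the number of motion clusters from data instead of fixing it in advance. The 2-D features and cluster locations are invented for illustration:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Hypothetical 2-D feature vectors (e.g. mean and variance of the
# acceleration magnitude per window) from three unknown motions.
X = np.vstack([
    rng.normal([0.0, 0.0], 0.1, size=(50, 2)),
    rng.normal([1.5, 0.5], 0.1, size=(50, 2)),
    rng.normal([0.5, 2.0], 0.1, size=(50, 2)),
])

# Truncated Dirichlet-process mixture: start with more components
# than plausibly needed and let the prior shrink the surplus away.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
n_used = len(np.unique(labels))  # effective number of motions found
```

With well-separated clusters the model assigns all points to roughly three of the ten available components, mirroring the IGMM's ability to recognize an unknown number of motions.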
Non-Parametric Bayesian Human Motion Recognition Using a Single MEMS Tri-Axial Accelerometer
M. Ejaz Ahmed
2012-09-01
In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Means (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
Applications of non-parametric statistics and analysis of variance on sample variances
Myers, R. H.
1981-01-01
Nonparametric methods that are available for NASA-type applications are discussed. An attempt is made here to survey what can be used, to offer recommendations as to when each would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are available for the Gaussian analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and the limitations of the nonparametric procedures. The appropriateness of doing analysis of variance on sample variances is also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be a reasonably sound procedure. However, the difficulties involved center around the normality problem and the basic homogeneous-variance assumption that is made in usual analysis of variance problems. These difficulties are discussed and guidelines are given for using the methods.
Testing the Non-Parametric Conditional CAPM in the Brazilian Stock Market
Daniel Reed Bergmann
2014-04-01
This paper seeks to analyze whether the variations of returns and systematic risks of Brazilian portfolios can be explained by the nonparametric conditional Capital Asset Pricing Model (CAPM) of Wang (2002). There are four informational variables available to the investors: (i) the Brazilian industrial production level; (ii) the broad money supply M4; (iii) inflation, represented by the Índice de Preços ao Consumidor Amplo (IPCA); and (iv) the real-dollar exchange rate, obtained from the PTAX dollar quotation. This study comprised the shares listed on the BOVESPA from January 2002 to December 2009. The test methodology developed by Wang (2002) and adapted to the Mexican context by Castillo-Spíndola (2006) was used. The observed results indicate that the nonparametric conditional model is relevant in explaining the portfolios' returns for two of the four tested variables, M4 and the PTAX dollar, at the 5% level of significance.
On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
Aaditya Ramdas
2017-01-01
Nonparametric two-sample or homogeneity testing is a decision-theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests, including multivariate methods involving energy statistics and kernel-based maximum mean discrepancy, and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
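A minimal sketch of a Wasserstein-based two-sample test, using a plain permutation null rather than the entropically smoothed statistics the survey studies; the sample sizes and distributions are illustrative assumptions:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_perm_test(x, y, n_perm=999, seed=0):
    """Two-sample homogeneity test: the 1-D Wasserstein distance is the
    statistic and its null distribution is approximated by randomly
    re-partitioning the pooled sample (a generic permutation recipe)."""
    rng = np.random.default_rng(seed)
    obs = wasserstein_distance(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if wasserstein_distance(pooled[:n], pooled[n:]) >= obs:
            count += 1
    # Add-one correction gives a valid p-value under the permutation null.
    return obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 100)
y = rng.normal(1.0, 1.0, 100)   # location-shifted alternative
stat, p = wasserstein_perm_test(x, y)
```

For two equal-variance normals shifted by 1, the empirical Wasserstein distance is near 1 and the permutation p-value is small, so the test rejects homogeneity.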
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
LIN; Lu
2006-01-01
This paper reports a robust kernel estimation for fixed design nonparametric regression models. A Stahel-Donoho kernel estimation is introduced, in which the weight functions depend on both the depths of the data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the incomputable depths of the errors. As a result the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has perfect asymptotic behavior, such as asymptotic normality and convergence in the mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure, so we can compare its efficiency with the original one. Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Xibin Zhang
2016-04-01
This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and to kernel density estimation of gross domestic product (GDP) growth rates among the Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
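For a single continuous regressor, the Nadaraya-Watson estimator referred to above reduces to a kernel-weighted local average. This sketch omits the paper's discrete-regressor kernels and Bayesian bandwidth sampling, and uses an invented sine test function with a hand-picked bandwidth:

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel:
    m(x) = sum_j K((x - x_j)/h) y_j / sum_j K((x - x_j)/h)."""
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)            # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

# Synthetic data: noisy sine on [0, 2*pi].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
y = np.sin(x) + rng.normal(0.0, 0.1, 200)

# Evaluate on an interior grid; h = 0.3 is hand-picked, whereas the
# paper treats bandwidths as parameters with a posterior to sample.
grid = np.linspace(0.5, 2.0 * np.pi - 0.5, 50)
fit = nadaraya_watson(x, y, grid, bandwidth=0.3)
```

The quality of the fit hinges entirely on the bandwidth, which is exactly why the paper's sampling algorithm for bandwidth estimation matters.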
Mustafa Koroglu
2016-02-01
This paper considers a functional-coefficient spatial Durbin model with nonparametric spatial weights. Applying the series approximation method, we estimate the unknown functional coefficients and spatial weighting functions via a nonparametric two-stage least squares (2SLS) estimation method. To further improve estimation accuracy, we also construct a second-step estimator of the unknown functional coefficients by a local linear regression approach. Some Monte Carlo simulation results are reported to assess the finite-sample performance of our proposed estimators. We then apply the proposed model to re-examine national economic growth by augmenting the conventional Solow economic growth convergence model with unknown spatial interactive structures of the national economy, as well as country-specific Solow parameters, where the spatial weighting functions and Solow parameters are allowed to be functions of geographical distance and the countries' openness to trade, respectively.
Saad, Walid; Poor, H Vincent; Başar, Tamer; Song, Ju Bin
2012-01-01
This paper introduces a novel approach that enables a number of cognitive radio devices that are observing the availability pattern of a number of primary users (PUs) to cooperate and use Bayesian nonparametric techniques to estimate the distributions of the PUs' activity patterns, assumed to be completely unknown. In the proposed model, each cognitive node may have its own individual view on each PU's distribution and, hence, seeks to find partners having a correlated perception. To address this problem, a coalitional game is formulated between the cognitive devices and an algorithm for cooperative coalition formation is proposed. It is shown that the proposed coalition formation algorithm allows the cognitive nodes that are experiencing a similar behavior from some PUs to self-organize into disjoint, independent coalitions. Inside each coalition, the cooperative cognitive nodes use a combination of Bayesian nonparametric models such as the Dirichlet process and statistical goodness-of-fit techniques ...
Nonparametric discriminant model
谢斌锋; 梁飞豹
2011-01-01
This paper puts forward a new class of discriminant methods, the main idea of which is to extend the nonparametric regression model to discriminant analysis, forming the corresponding nonparametric discriminant model. Compared with traditional discriminant methods on a worked example, the nonparametric discriminant method shows broader applicability and a higher correct resubstitution rate.
Non-Parametric Tests of Structure for High Angular Resolution Diffusion Imaging in Q-Space
Olhede, Sofia C
2010-01-01
High angular resolution diffusion imaging data is the observed characteristic function for the local diffusion of water molecules in tissue. This data is used to infer structural information in brain imaging. Non-parametric scalar measures are proposed to summarize such data, and to locally characterize spatial features of the diffusion probability density function (PDF), relying on the geometry of the characteristic function. Summary statistics are defined so that their distributions are, to first order, both independent of nuisance parameters and also analytically tractable. The dominant direction of the diffusion at a spatial location (voxel) is determined, and a new set of axes are introduced in Fourier space. Variation quantified in these axes determines the local spatial properties of the diffusion density. Non-parametric hypothesis tests for determining whether the diffusion is unimodal, isotropic or multi-modal are proposed. More subtle characteristics of white-matter microstructure, such as the degre...
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
A Bayesian nonparametric approach to reconstruction and prediction of random dynamical systems
Merkatas, Christos; Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2017-06-01
We propose a Bayesian nonparametric mixture model for the reconstruction and prediction of discretized stochastic dynamical systems from observed time series data, based on Markov chain Monte Carlo methods. Our results can be used by researchers in physical modeling interested in a fast and accurate estimation of low-dimensional stochastic models when the size of the observed time series is small and the noise process (perhaps) is non-Gaussian. The inference procedure is demonstrated specifically in the case of polynomial maps of an arbitrary degree and when a Geometric Stick Breaking mixture process prior over the space of densities is applied to the additive errors. Compared to Bayesian nonparametric techniques based on Dirichlet process mixtures, our method is parsimonious, flexible and general. Simulations based on synthetic time series are presented.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressways. Using historical traffic data collected from detectors on Beijing expressways, a specially designed database was developed via processes including data filtering, wavelet analysis, and clustering. A relativity-based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. A K-NN nonparametric regression model was then built to predict average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that, using the FCD, the model can predict average travel speeds with an accuracy above 90%, and hence is feasible and effective.
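The core of a K-NN nonparametric regression forecaster is: measure the weighted distance from the current speed pattern to each historical pattern, keep the K closest, and average what happened next. A minimal sketch (the toy database, pattern length, and uniform weights are assumptions; the paper's relativity-based weights would replace them):

```python
import math

def weighted_distance(a, b, w):
    # Weighted Euclidean distance between two equal-length speed patterns.
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def knn_forecast(history, current, k=3, weights=None):
    # history: list of (pattern, next_value) pairs built from past data.
    # Returns the average "next value" of the k patterns closest to `current`.
    if weights is None:
        weights = [1.0] * len(current)
    ranked = sorted(history, key=lambda h: weighted_distance(h[0], current, weights))
    return sum(v for _, v in ranked[:k]) / k

# toy database: 3-step speed patterns (km/h) and the speed observed next
db = [([60, 58, 55], 52), ([30, 32, 35], 38),
      ([61, 59, 56], 53), ([80, 79, 80], 81)]
print(knn_forecast(db, [62, 60, 57], k=2))  # -> 52.5
```

The two nearest neighbours are the two decelerating patterns, so the forecast is the mean of their successors, (52 + 53) / 2.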
Vapor Intrusion Facilities - South Bay
U.S. Environmental Protection Agency — POINT locations for the South Bay Vapor Intrusion Sites were derived from the NPL data for Region 9. One site, Philips Semiconductor, was extracted from the...
National Oceanic and Atmospheric Administration, Department of Commerce — Samples were collected from October 15, 1985 through June 12, 1987 in emergent marsh and non-vegetated habitats throughout the Lavaca Bay system to characterize...
Annual report, Bristol Bay, 1958
US Fish and Wildlife Service, Department of the Interior — Commercial fishery management activities for Bristol Bay for 1958, including lists of operators, extensive statistics, and descriptions of enforcement activities.
National Oceanic and Atmospheric Administration, Department of Commerce — Juvenile spotted seatrout and other sportfish are being monitored annually over a 6-mo period in Florida Bay to assess their abundance over time relative to...
CHWAKA BAY MANGROVE SEDIMENTS, ZANZIBAR
Mohammed - Studies on benthic denitrification in the Chwaka Bay mangrove. Extensive mangrove ... In this case, six sediment cores were taken randomly from the three study sites as above and a ..... Academic Press. Orlando. pp. 277-293.
Annual report, Bristol Bay, 1955
US Fish and Wildlife Service, Department of the Interior — Commercial fishery management activities for Bristol Bay for 1955, including lists of operators, extensive statistics, descriptions of enforcement activities, and...
Back Bay Wilderness area description
US Fish and Wildlife Service, Department of the Interior — This document is a description of the lands located within the Back Bay National Wildlife Refuge. Within these lands, it designates which area is suitable for...
Variable selection in identification of a high dimensional nonlinear non-parametric system
Er-Wei BAI; Wenxiao ZHAO; Weixing ZHENG
2015-01-01
The problem of variable selection in system identification of a high dimensional nonlinear non-parametric system is described. The inherent difficulty, the curse of dimensionality, is introduced. Then its connections to various topics and research areas are briefly discussed, including order determination, pattern recognition, data mining, machine learning, statistical regression and manifold embedding. Finally, some results of variable selection in system identification in the recent literature are presented.
Estimating Financial Risk Measures for Futures Positions:A Non-Parametric Approach
Cotter, John; Dowd, Kevin
2011-01-01
This paper presents non-parametric estimates of spectral risk measures applied to long and short positions in 5 prominent equity futures contracts. It also compares these to estimates of two popular alternative measures, the Value-at-Risk (VaR) and Expected Shortfall (ES). The spectral risk measures are conditioned on the coefficient of absolute risk aversion, and the latter two are conditioned on the confidence level. Our findings indicate that all risk measures increase dramatically and the...
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to no...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
Using a nonparametric PV model to forecast AC power output of PV plants
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
In this paper, a methodology using a nonparametric model is applied to forecast the AC power output of PV plants, using as inputs several forecasts of meteorological variables from a Numerical Weather Prediction (NWP) model and actual AC power measurements of the PV plants. The methodology was built upon the R environment and uses Quantile Regression Forests as the machine learning tool to forecast the AC power with a confidence interval. Real data from five PV plants were used to validate the methodology, an...
An exact predictive recursion for Bayesian nonparametric analysis of incomplete data
Garibaldi, Ubaldo; Viarengo, Paolo
2010-01-01
This paper presents a new derivation of nonparametric distribution estimation with right-censored data, based on an extension of predictive inferences to compound evidence. The estimate is recursive and exact, and no stochastic approximation is needed: it simply requires that the censored data be processed in decreasing order. Only in this case does the recursion provide exact posterior predictive distributions for subsequent samples under a Dirichlet process prior. The resulting estim...
t-tests, non-parametric tests, and large studies—a paradox of statistical practice?
Fagerland Morten W
2012-06-01
Background: During the last 30 years, the median sample size of research studies published in high-impact medical journals has increased manyfold, while the use of non-parametric tests has increased at the expense of t-tests. This paper explores this paradoxical practice and illustrates its consequences. Methods: A simulation study is used to compare the rejection rates of the Wilcoxon-Mann-Whitney (WMW) test and the two-sample t-test for increasing sample size. Samples are drawn from skewed distributions with equal means and medians but with a small difference in spread. A hypothetical case study is used for illustration and motivation. Results: The WMW test produces, on average, smaller p-values than the t-test. This discrepancy increases with increasing sample size, skewness, and difference in spread. For heavily skewed data, the proportion of p... Conclusions: Non-parametric tests are most useful for small studies. Using non-parametric tests in large studies may provide answers to the wrong question, thus confusing readers. For studies with a large sample size, t-tests and their corresponding confidence intervals can and should be used even for heavily skewed data.
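A toy version of this kind of simulation can be sketched in pure Python. Everything below is an illustrative assumption rather than the paper's design: lognormal samples with equal means (though, unlike the paper's distributions, not equal medians), normal-approximation p-values for both tests, and arbitrary sample sizes and replication counts.

```python
import math, random

def norm_sf(z):
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def t_test_p(x, y):
    # Two-sample Welch t-test, two-sided p-value via a normal approximation
    # (adequate for the large samples the paper is concerned with).
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    t = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return 2.0 * norm_sf(abs(t))

def wmw_p(x, y):
    # Wilcoxon-Mann-Whitney two-sided p-value, normal approximation, no tie
    # correction (ties have probability zero for continuous draws).
    nx, ny = len(x), len(y)
    ranked = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(i + 1 for i, (v, g) in enumerate(ranked) if g == 0)
    u = rank_sum_x - nx * (nx + 1) / 2.0
    mu = nx * ny / 2.0
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12.0)
    return 2.0 * norm_sf(abs((u - mu) / sigma))

# Lognormal samples with equal means (exp(m + s^2/2) = 1 for both groups)
# but different spread.
rng = random.Random(42)
def draw(n, s):
    return [math.exp(rng.gauss(-s * s / 2.0, s)) for _ in range(n)]

n, reps, alpha = 200, 200, 0.05
rej_t = rej_w = 0
for _ in range(reps):
    x, y = draw(n, 0.5), draw(n, 1.0)
    rej_t += t_test_p(x, y) < alpha
    rej_w += wmw_p(x, y) < alpha
print(rej_t / reps, rej_w / reps)  # WMW typically rejects far more often here
```

With equal means, the t-test rejects at roughly the nominal rate, while the WMW test, which responds to the whole distributional difference, rejects much more often — the paper's point that the two tests answer different questions.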
Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat
Luca Scrucca
2001-06-01
In this paper we describe the Xlisp-Stat version of the sm library, software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus and is documented in their book Applied Smoothing Techniques for Data Analysis (1997), which is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve inferential purposes, for instance when a nonparametric estimate is used to check a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to easily implement their own methods and new research ideas into the built-in prototypes.
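The kind of kernel regression smoother the sm library provides can be illustrated with a Nadaraya-Watson estimator: at each evaluation point, a Gaussian-kernel weighted average of the responses. This is a generic sketch of the technique, not the sm library's implementation, and the bandwidth and data are assumptions:

```python
import math

def nw_smooth(x, y, grid, h):
    # Nadaraya-Watson kernel regression estimate on `grid` with bandwidth h:
    # fhat(g) = sum_i K((x_i - g)/h) * y_i / sum_i K((x_i - g)/h).
    def k(u):
        return math.exp(-0.5 * u * u)
    out = []
    for g in grid:
        w = [k((xi - g) / h) for xi in x]
        s = sum(w)
        out.append(sum(wi * yi for wi, yi in zip(w, y)) / s)
    return out

# samples of the line y = 2x; the smoother reproduces the trend
xs = [i / 10.0 for i in range(11)]
ys = [2.0 * v for v in xs]
fit = nw_smooth(xs, ys, [0.5], h=0.2)
```

At the centre of a symmetric design the weighted average equals the true mean, so `fit[0]` is 1.0 up to round-off; a smaller bandwidth `h` tracks the data more closely at the cost of more variance.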
Kianisarkaleh, Azadeh; Ghassemian, Hassan
2016-09-01
Feature extraction plays a crucial role in the improvement of hyperspectral image classification. Nonparametric feature extraction methods show better performance than parametric ones when the class distributions are not normal-like. Moreover, they can extract more features than parametric methods can. In this paper, a new nonparametric linear feature extraction method is introduced for the classification of hyperspectral images. The proposed method has no free parameters and its novelty can be discussed in two parts. First, neighbor samples are specified by using the Parzen window idea to determine the local mean. Second, two new weighting functions are used: samples close to class boundaries receive more weight in forming the between-class scatter matrix, and samples close to the class mean receive more weight in forming the within-class scatter matrix. Experimental results on three real hyperspectral data sets, Indian Pines, Salinas, and Pavia University, demonstrate that the proposed method performs better than some other nonparametric and parametric feature extraction methods.
A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale.
Mircioiu, Constantin; Atkinson, Jeffrey
2017-05-10
A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. Results obtained show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give in almost all cases the same significant or non-significant results for inter-subgroup comparisons. Parametric methods were more discriminant in the cases of non-similar conclusions. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restraining the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
Non-parametric foreground subtraction for 21cm epoch of reionization experiments
Harker, Geraint; Bernardi, Gianni; Brentjens, Michiel A; De Bruyn, A G; Ciardi, Benedetta; Jelic, Vibor; Koopmans, Leon V E; Labropoulos, Panagiotis; Mellema, Garrelt; Offringa, Andre; Pandey, V N; Schaye, Joop; Thomas, Rajat M; Yatawatta, Sarod
2009-01-01
An obstacle to the detection of redshifted 21cm emission from the epoch of reionization (EoR) is the presence of foregrounds which exceed the cosmological signal in intensity by orders of magnitude. We argue that in principle it would be better to fit the foregrounds non-parametrically - allowing the data to determine their shape - rather than selecting some functional form in advance and then fitting its parameters. Non-parametric fits often suffer from other problems, however. We discuss these before suggesting a non-parametric method, Wp smoothing, which seems to avoid some of them. After outlining the principles of Wp smoothing we describe an algorithm used to implement it. We then apply Wp smoothing to a synthetic data cube for the LOFAR EoR experiment. The performance of Wp smoothing, measured by the extent to which it is able to recover the variance of the cosmological signal and to which it avoids leakage of power from the foregrounds, is compared to that of a parametric fit, and to another non-parame...
Dai, Wenlin
2017-09-01
Difference-based methods do not require estimating the mean function in nonparametric regression and are therefore popular in practice. In this paper, we propose a unified framework for variance estimation that systematically combines the linear regression method with higher-order difference estimators. The unified framework greatly enriches the existing literature on variance estimation and includes most existing estimators as special cases. More importantly, it also provides a clean way to solve the challenging difference-sequence selection problem, a long-standing controversial issue in nonparametric regression for several decades. Using both theory and simulations, we recommend using the ordinary difference sequence in the unified framework, regardless of whether the sample size is small or the signal-to-noise ratio is large. Finally, to cater for the demands of the application, we have developed a unified R package, named VarED, that integrates the existing difference-based estimators and the unified estimators in nonparametric regression and have made it freely available in the R statistical program http://cran.r-project.org/web/packages/.
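The idea behind difference-based variance estimation is that a smooth mean function nearly cancels in successive differences, leaving (approximately) pure noise whose variance can be read off directly. A generic sketch of the classical first-order (Rice) estimator, not the VarED package (the default difference sequence and normalisation are the standard textbook choices):

```python
def diff_variance(y, d=None):
    # Difference-based residual-variance estimator for nonparametric regression.
    # With the ordinary first-order sequence d = (1, -1) this is the Rice
    # estimator: sum of squared first differences over 2*(number of differences).
    # The smooth mean nearly cancels in the differences, so no curve fit is needed.
    if d is None:
        d = (1.0, -1.0)
    m = len(d)
    norm = sum(di * di for di in d)  # 2 for the first-order sequence
    n = len(y)
    terms = [sum(d[j] * y[i + j] for j in range(m)) ** 2 for i in range(n - m + 1)]
    return sum(terms) / (norm * (n - m + 1))

# alternating "noise" around zero: every first difference is +/- 1
print(diff_variance([0.0, 1.0] * 5))  # -> 0.5
```

Passing a higher-order sequence such as `d=(1.0, -2.0, 1.0)` reduces the bias contributed by curvature in the mean function, which is exactly the trade-off the difference-sequence selection problem is about.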
Joaquín Texeira Quirós
2013-09-01
Purpose: This empirical study analyzes a questionnaire answered by a sample of ISO 9000 certified companies and a control sample of companies which have not been certified, using a multivariate predictive model. With this approach, we assess which quality practices are associated with the likelihood of the firm being certified. Design/methodology/approach: We implemented nonparametric decision trees in order to see which variables most influence whether the company is certified or not, i.e., the motivations that lead companies to seek certification. Findings: The results show that only four questionnaire items are sufficient to predict whether a firm is certified or not. Companies in which the respondent manifests greater concern with respect to customer relations, motivation of employees, and strategic planning have a higher likelihood of being certified. Research implications: the reader should note that this study is based on data from a single country and, of course, these results capture many idiosyncrasies of its economic and corporate environment. It would be of interest to understand whether this type of analysis reveals some regularities across different countries. Practical implications: companies should look for a set of practices congruent with total quality management and ISO 9000 certification. Originality/value: This study contributes to the literature on the internal motivation of companies to achieve certification under the ISO 9000 standard, by performing a comparative analysis of questionnaires answered by a sample of certified companies and a control sample of companies which have not been certified. In particular, we assess how the manager's perception of the intensity with which quality practices are deployed in their firms is associated with the likelihood of the firm being certified.
Probably Almost Bayes Decisions
Anoulova, S.; Fischer, Paul; Poelt, S.
1996-01-01
In this paper, we investigate the problem of classifying objects which are given by feature vectors with Boolean entries. Our aim is to "(efficiently) learn probably almost optimal classifications" from examples. A classical approach in pattern recognition uses empirical estimations of the Bayesian discriminant functions for this purpose. We analyze this approach for different classes of distribution functions of Boolean features: kth order Bahadur-Lazarsfeld expansions and kth order Chow expansions. In both cases, we obtain upper bounds for the required sample size which are small polynomials in the relevant parameters and which match the lower bounds known for these classes. Moreover, the learning algorithms are efficient.
Life Writing After Empire
A watershed moment of the twentieth century, the end of empire saw upheavals to global power structures and national identities. However, decolonisation profoundly affected individual subjectivities too. Life Writing After Empire examines how people around the globe have made sense of the post-imperial condition through the practice of life writing in its multifarious expressions, from auto/biography through travel writing to oral history and photography. Through interdisciplinary approaches that draw on literature and history alike, the contributors explore how we might approach these genres differently in order to understand how individual life writing reflects broader societal changes. From far-flung corners of the former British Empire, people have turned to life writing to manage painful or nostalgic memories, as well as to think about the past and future of the nation anew through the personal...
Empirical philosophy of science
2015-01-01
A growing number of philosophers of science make use of qualitative empirical data, a development that may reconfigure the relations between philosophy and sociology of science and that is reminiscent of efforts to integrate history and philosophy of science. Therefore, the first part of this introduction to the volume Empirical Philosophy of Science outlines the history of relations between philosophy and sociology of science on the one hand, and philosophy and history of science on the other. The second part of this introduction offers an overview of the papers in the volume, each of which gives its own answer to questions such as: Why does the use of qualitative empirical methods benefit philosophical accounts of science? And how should these methods be used by the philosopher?
An Empirical Kaiser Criterion.
Braeken, Johan; van Assen, Marcel A L M
2016-03-31
In exploratory factor analysis (EFA), the most popular methods for dimensionality assessment, such as the scree plot, the Kaiser criterion, or (the current gold standard) parallel analysis, are based on eigenvalues of the correlation matrix. To further the understanding and development of factor retention methods, results on population and sample eigenvalue distributions are introduced based on random matrix theory and Monte Carlo simulations. These results are used to develop a new factor retention method, the Empirical Kaiser Criterion. The performance of the Empirical Kaiser Criterion and parallel analysis is examined in typical research settings, with multiple scales that are desired to be relatively short, but still reliable. Theoretical and simulation results illustrate that the new Empirical Kaiser Criterion performs as well as parallel analysis in typical research settings with uncorrelated scales, but much better when scales are both correlated and short. We conclude that the Empirical Kaiser Criterion is a powerful and promising factor retention method, because it is based on distribution theory of eigenvalues, shows good performance, is easily visualized and computed, and is useful for power analysis and sample size planning for EFA.
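The benchmark the abstract compares against, parallel analysis, retains a factor when an observed eigenvalue of the correlation matrix exceeds the corresponding quantile of eigenvalues from random data of the same size. A pure-Python sketch for the first factor only, using power iteration for the top eigenvalue (the data, seed, and replication count are assumptions; this illustrates parallel analysis, not the Empirical Kaiser Criterion itself):

```python
import math, random

def corr(a, b):
    # Pearson correlation of two equal-length columns.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = math.sqrt(sum((v - ma) ** 2 for v in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (sa * sb)

def top_eigenvalue(cols, iters=500):
    # Largest eigenvalue of the sample correlation matrix via power iteration
    # (the matrix is symmetric positive semi-definite, so this converges).
    p = len(cols)
    R = [[corr(cols[i], cols[j]) for j in range(p)] for i in range(p)]
    v, lam = [1.0] * p, 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam

def parallel_analysis_threshold(n, p, reps, rng, q=0.95):
    # q-quantile of the top eigenvalue under independent normal data --
    # the reference distribution parallel analysis compares against.
    lams = sorted(
        top_eigenvalue([[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)])
        for _ in range(reps))
    return lams[int(q * reps) - 1]

rng = random.Random(7)
n, p = 100, 4
base = [rng.gauss(0, 1) for _ in range(n)]
data = [base,                                        # two correlated columns...
        [b + 0.3 * rng.gauss(0, 1) for b in base],
        [rng.gauss(0, 1) for _ in range(n)],         # ...plus two independent ones
        [rng.gauss(0, 1) for _ in range(n)]]
observed = top_eigenvalue(data)
threshold = parallel_analysis_threshold(n, p, 50, rng)
print(observed > threshold)  # the correlated pair should push a factor above chance
```

The Empirical Kaiser Criterion replaces the Monte Carlo threshold with reference eigenvalues derived analytically from random matrix theory, which is what makes it cheap to compute and visualize.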
Grund, Cynthia M.
musical performance and reception is inspired by traditional approaches within aesthetics, but it also challenges some of the presuppositions inherent in them. As an example of such work I present a research project in empirical music aesthetics begun last year and of which I am a team member....
Bang, Peter Fibiger
2007-01-01
This articles seeks to establish a new set of organizing concepts for the analysis of the Roman imperial economy from Republic to late antiquity: tributary empire, port-folio capitalism and protection costs. Together these concepts explain better economic developments in the Roman world than the...
Empirical Discovery in Linguistics
Pericliev, V
1995-01-01
A discovery system for detecting correspondences in data is described, based on the familiar induction methods of J. S. Mill. Given a set of observations, the system induces the "causally" related facts in these observations. Its application to empirical linguistic discovery is described.
Bai, Y.
2015-01-01
This dissertation consists of three essays on empirical banking. They study how information and political activeness affect banks' lending behavior, as well as the effect of lending relationships with banks on firms' stock performance during an interbank liquidity crunch. The first essay looks at a t
Kurrild-Klitgaard, Peter
2014-01-01
applications. Special attention is given to three phenomena and their possible empirical manifestations: The instability of social choice in the form of (1) the possibility of majority cycles, (2) the non-robustness of social choices given alternative voting methods, and (3) the possibility of various forms...
Auditory Imagery: Empirical Findings
Hubbard, Timothy L.
2010-01-01
The empirical literature on auditory imagery is reviewed. Data on (a) imagery for auditory features (pitch, timbre, loudness), (b) imagery for complex nonverbal auditory stimuli (musical contour, melody, harmony, tempo, notational audiation, environmental sounds), (c) imagery for verbal stimuli (speech, text, in dreams, interior monologue), (d)…
75 FR 36292 - Safety Zone; Bay Swim III, Presque Isle Bay, Erie, PA
2010-06-25
... of Presque Isle Bay, Lake Erie, near Erie, Pennsylvania between 9 a.m. to 11 a.m. on June 26, 2010.... The safety zone will encompass specified waters of Presque Isle Bay, Erie, Pennsylvania starting at... SECURITY Coast Guard 33 CFR Part 165 RIN 1625-AA00 Safety Zone; Bay Swim III, Presque Isle Bay, Erie, PA...
77 FR 18739 - Safety Zone; Bay Swim V, Presque Isle Bay, Erie, PA
2012-03-28
... the January 17, 2008, issue of the Federal Register (73 FR 3316). Public Meeting We do not now plan to... SECURITY Coast Guard 33 CFR Part 165 RIN 1625-AA00 Safety Zone; Bay Swim V, Presque Isle Bay, Erie, PA... is intended to restrict vessels from a portion of the Presque Island Bay during the Bay Swim...
77 FR 35860 - Safety Zone; Bay Swim V, Presque Isle Bay, Erie, PA
2012-06-15
..., Erie, PA in the Federal Register (77 FR 18739). We received no letters commenting on the proposed rule... SECURITY Coast Guard 33 CFR Part 165 RIN 1625-AA00 Safety Zone; Bay Swim V, Presque Isle Bay, Erie, PA... restrict vessels from a portion of the Presque Island Bay during the Bay Swim V swimming event. The...
78 FR 34575 - Safety Zone; Bay Swim VI, Presque Isle Bay, Erie, PA
2013-06-10
... FR Federal Register NPRM Notice of Proposed Rulemaking TFR Temporary Final Rule A. Regulatory History... SECURITY Coast Guard 33 CFR Part 165 RIN 1625-AA00 Safety Zone; Bay Swim VI, Presque Isle Bay, Erie, PA... portion of Presque Isle bay during the Bay Swim VI swimming event. This temporary safety zone is...
Humboldt Bay, California Benthic Habitats 2009 Substrate
National Oceanic and Atmospheric Administration, Department of Commerce — Humboldt Bay is the largest estuary in California north of San Francisco Bay and represents a significant resource for the north coast region. Beginning in 2007 the...
Humboldt Bay, California Benthic Habitats 2009 Geoform
National Oceanic and Atmospheric Administration, Department of Commerce — Humboldt Bay is the largest estuary in California north of San Francisco Bay and represents a significant resource for the north coast region. Beginning in 2007 the...
Humboldt Bay, California Benthic Habitats 2009 Geodatabase
National Oceanic and Atmospheric Administration, Department of Commerce — Humboldt Bay is the largest estuary in California north of San Francisco Bay and represents a significant resource for the north coast region. Beginning in 2007 the...
Humboldt Bay, California Benthic Habitats 2009 Biotic
National Oceanic and Atmospheric Administration, Department of Commerce — Humboldt Bay is the largest estuary in California north of San Francisco Bay and represents a significant resource for the north coast region. Beginning in 2007 the...
SF Bay Water Quality Improvement Fund
EPA's grant program to protect and restore San Francisco Bay. The San Francisco Bay Water Quality Improvement Fund (SFBWQIF) has invested in 58 projects along with 70 partners contributing to restore wetlands and water quality and reduce polluted runoff.
Humboldt Bay Benthic Habitats 2009 Aquatic Setting
National Oceanic and Atmospheric Administration, Department of Commerce — Humboldt Bay is the largest estuary in California north of San Francisco Bay and represents a significant resource for the north coast region. Beginning in 2007 the...
An empirical study to determine the critical success factors of export industry
Masoud Babakhani
2011-01-01
Exporting goods and services plays an important role in the economies of developing countries. There are many countries in the world whose economies are based solely on exporting raw materials such as oil and gas. Many believe that countries cannot develop their economies as long as they rely on exporting a single group of raw materials. Therefore, there is a need to help other sectors of industry build good infrastructure for exporting diversified products. In this paper, we perform an empirical analysis to determine the critical success factors for exporting different goods. The results are analyzed using statistical non-parametric methods, and some useful guidelines are also suggested.
Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M
2014-01-01
This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which the EMD-LLQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LLQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.
Semi-empirical Likelihood Confidence Intervals for the Differences of Quantiles with Missing Data
Yong Song QIN; Jun Chao ZHANG
2009-01-01
Detecting population (group) differences is useful in many applications, such as medical research. In this paper, we explore the probabilistic theory for identifying the quantile differences between two populations. Suppose that there are two populations x and y, with missing data on both, where x is nonparametric and y is parametric. We are interested in constructing confidence intervals for the quantile differences of x and y. Random hot deck imputation is used to fill in missing data. Semi-empirical likelihood confidence intervals on the differences are constructed.
Bayes linear statistics, theory & methods
Goldstein, Michael
2007-01-01
Bayesian methods combine information available from data with any prior information available from expert knowledge. The Bayes linear approach follows this path, offering a quantitative structure for expressing beliefs, and systematic methods for adjusting these beliefs, given observational data. The methodology differs from the full Bayesian methodology in that it establishes simpler approaches to belief specification and analysis based around expectation judgements. Bayes Linear Statistics presents an authoritative account of this approach, explaining the foundations, theory, methodology, and practicalities of this important field. The text provides a thorough coverage of Bayes linear analysis, from the development of the basic language to the collection of algebraic results needed for efficient implementation, with detailed practical examples. The book covers:The importance of partial prior specifications for complex problems where it is difficult to supply a meaningful full prior probability specification...
33 CFR 165.1122 - San Diego Bay, Mission Bay and their Approaches-Regulated navigation area.
2010-07-01
... 33 Navigation and Navigable Waters 2 2010-07-01 2010-07-01 false San Diego Bay, Mission Bay and... Coast Guard District § 165.1122 San Diego Bay, Mission Bay and their Approaches—Regulated navigation... waters of San Diego Bay, Mission Bay, and their approaches encompassed by a line commencing at Point La...
Non-parametric change-point method for differential gene expression detection.
Yao Wang
BACKGROUND: We proposed a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS for short), which uses a single equation to detect differential gene expression (DGE) in microarray data. NPCPS is based on change point theory, which provides effective DGE detecting ability. METHODOLOGY: NPCPS uses the data distribution of the normal samples as input and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change point position generated by NPCPS enables the identification of the samples containing DGE. Monte Carlo simulation and an ROC study were applied to examine the detecting accuracy of NPCPS, and an experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. CONCLUSIONS: The simulation study indicated that NPCPS was more effective for detecting DGE in the cancer subset than five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experiment results showed both good accuracy and reliability of NPCPS. Out of the 30 top genes ranked by NPCPS, 16 genes were reported as relevant to cancer. Correlations between the detecting results of NPCPS and the compared methods were less than 0.05, while between the other methods the values ranged from 0.20 to 0.84. This indicates that NPCPS works on different features and thus identifies DGE from a distinct perspective compared with the other mean- or median-based methods.
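The change-point idea underlying this kind of method can be sketched generically: scan every split of an ordered expression profile and pick the one that maximizes the standardized gap between segment means. This is a textbook scan statistic in the spirit of change-point detection, not the paper's exact NPCPS equation:

```python
import math

def change_point(xs):
    # Return the split index k (1 <= k < n) maximizing the standardized
    # difference between the means of xs[:k] and xs[k:]. The sqrt factor
    # penalizes very unbalanced splits, as in classical CUSUM-type statistics.
    n = len(xs)
    best_k, best_stat = 1, -1.0
    for k in range(1, n):
        m1 = sum(xs[:k]) / k
        m2 = sum(xs[k:]) / (n - k)
        stat = abs(m1 - m2) * math.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k

# expression profile jumping from a baseline of 0 to 5 after sample 10
print(change_point([0.0] * 10 + [5.0] * 10))  # -> 10
```

In the DGE setting, samples falling after the estimated change point are the candidates for containing differential expression; assessing the significance of the maximal statistic is where the method-specific theory comes in.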
Applications of Parametric and Nonparametric Tests for Event Studies on ISE
Handan YOLSAL
2011-01-01
In this study, we use the event study method to investigate whether share splits on the ISE-ON Index at the Istanbul Stock Exchange had an impact on share returns between 2005 and 2011. The study is based on parametric tests, as well as on nonparametric tests developed as an alternative to them. It has been observed that, when a cross-sectional variance adjustment is applied to the data set, such null hypotheses as “there is no average abnormal return at d...
Ohkubo, Jun
2011-12-01
A scheme is developed for estimating state-dependent drift and diffusion coefficients in a stochastic differential equation from time-series data. The scheme does not require parametric forms for the drift and diffusion coefficients to be specified in advance. To perform the nonparametric estimation, a maximum likelihood method is combined with a concept based on kernel density estimation. To deal with discrete observation or sparsity of the time-series data, a local linearization method is employed, which enables fast estimation.
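A minimal sketch of nonparametric drift/diffusion estimation via kernel regression on the first two conditional increment moments (the Kramers-Moyal route) is shown below. The paper's likelihood-based scheme differs; the kernel, bandwidth, and Ornstein-Uhlenbeck test case are illustrative assumptions.

```python
import numpy as np

def km_coefficients(x, dt, grid, h):
    """Nadaraya-Watson estimates of the drift a(x) and squared diffusion
    b^2(x) of dX = a(X)dt + b(X)dW from a sampled path (Kramers-Moyal)."""
    dx = np.diff(x)
    xs = x[:-1]
    drift = np.empty(len(grid))
    diff2 = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = np.exp(-0.5 * ((xs - g) / h) ** 2)   # Gaussian kernel weights
        w /= w.sum()
        drift[i] = np.sum(w * dx) / dt           # E[dX | X = g] / dt
        diff2[i] = np.sum(w * dx ** 2) / dt      # E[dX^2 | X = g] / dt
    return drift, diff2

# Simulate an Ornstein-Uhlenbeck path dX = -theta*X dt + sigma dW ...
rng = np.random.default_rng(1)
theta, sigma, dt, n = 1.0, 0.5, 0.01, 100_000
x = np.empty(n)
x[0] = 0.0
for t in range(n - 1):
    x[t + 1] = x[t] - theta * x[t] * dt + sigma * np.sqrt(dt) * rng.normal()

# ... and recover a(x) = -x and b^2(x) = sigma^2 = 0.25 on a small grid
grid = np.array([-0.5, 0.0, 0.5])
drift, diff2 = km_coefficients(x, dt, grid, h=0.1)
```

The recovered drift is approximately linear with slope -theta and the squared diffusion is approximately flat at sigma^2, without ever positing those parametric forms.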
Blundell, Richard; Horowitz, Joel L.; Parey, Matthias
2011-01-01
This paper develops a new method for estimating a demand function and the welfare consequences of price changes. The method is applied to gasoline demand in the U.S. and is applicable to other goods. The method uses shape restrictions derived from economic theory to improve the precision of a nonparametric estimate of the demand function. Using data from the U.S. National Household Travel Survey, we show that the restrictions are consistent with the data on gasoline demand and remove the anom...
Two new non-parametric tests to the distance duality relation with galaxy clusters
Costa, S S; Holanda, R F L
2015-01-01
The cosmic distance duality relation is a milestone of cosmology involving the luminosity and angular diameter distances. Any departure of the relation points to new physics or systematic errors in the observations, therefore tests of the relation are extremely important to build a consistent cosmological framework. Here, two new tests are proposed based on galaxy clusters observations (angular diameter distance and gas mass fraction) and $H(z)$ measurements. By applying Gaussian Processes, a non-parametric method, we are able to derive constraints on departures of the relation where no evidence of deviation is found in both methods, reinforcing the cosmological and astrophysical hypotheses adopted so far.
Nonparametric variance estimation in the analysis of microarray data: a measurement error approach.
Carroll, Raymond J; Wang, Yuedong
2008-01-01
This article investigates the effects of measurement error on the estimation of nonparametric variance functions. We show that either ignoring measurement error or direct application of the simulation extrapolation, SIMEX, method leads to inconsistent estimators. Nevertheless, the direct SIMEX method can reduce bias relative to a naive estimator. We further propose a permutation SIMEX method which leads to consistent estimators in theory. The performance of both SIMEX methods depends on approximations to the exact extrapolants. Simulations show that both SIMEX methods perform better than ignoring measurement error. The methodology is illustrated using microarray data from colon cancer patients.
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.
Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben
2017-06-06
Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis makes it a useful and efficient tool for analyzing microarray time-course data and for exploring complex relationships in omics data to study their association with disease and health.
López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
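A bare-bones local linear smoother of the kind LPRM builds on might look as follows. The bandwidth, the kinked test curve (mimicking a property-vs-concentration plot with an abrupt change at the cmc), and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear regression estimate m(x0) with a Gaussian kernel of
    bandwidth h: weighted least-squares fit of a line centered at x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]   # intercept = fitted value at x0

# Noisy piecewise-linear "property vs concentration" curve with a kink at x = 1
rng = np.random.default_rng(2)
x = np.linspace(0, 2, 200)
y = np.where(x < 1, x, 1 + 0.2 * (x - 1)) + rng.normal(0, 0.02, x.size)
fit = np.array([local_linear(x, y, x0, h=0.08) for x0 in x])
```

No parametric form is imposed anywhere; the abrupt slope change survives the smoothing, which is what makes the kink (here standing in for the cmc) locatable from the fitted curve.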
Poage, J. L.
1975-01-01
A sequential nonparametric pattern classification procedure is presented. The method presented is an estimated version of the Wald sequential probability ratio test (SPRT). This method utilizes density function estimates, and the density estimate used is discussed, including a proof of convergence in probability of the estimate to the true density function. The classification procedure proposed makes use of the theory of order statistics, and estimates of the probabilities of misclassification are given. The procedure was tested on discriminating between two classes of Gaussian samples and on discriminating between two kinds of electroencephalogram (EEG) responses.
Nonparametric Bayesian Sparse Factor Models with application to Gene Expression modelling
Knowles, David
2010-01-01
A nonparametric Bayesian extension of Factor Analysis (FA) is proposed where observed data Y is modeled as a linear superposition, G, of a potentially infinite number of hidden factors, X. The Indian Buffet Process (IBP) is used as a prior on G to incorporate sparsity and to allow the number of latent features to be inferred. The model's utility for modeling gene expression data is investigated using randomly generated datasets based on a known sparse connectivity matrix for E. coli, and on three biological datasets of increasing complexity.
Non-parametric trend analysis of water quality data of rivers in Kansas
Yu, Y.-S.; Zou, S.; Whittemore, D.
1993-01-01
Surface water quality data for 15 sampling stations in the Arkansas, Verdigris, Neosho, and Walnut river basins inside the state of Kansas were analyzed to detect trends (or lack of trends) in 17 major constituents by using four different non-parametric methods. The results show that concentrations of specific conductance, total dissolved solids, calcium, total hardness, sodium, potassium, alkalinity, sulfate, chloride, total phosphorus, ammonia plus organic nitrogen, and suspended sediment generally have downward trends. Some of the downward trends are related to increases in discharge, while others could be caused by decreases in pollution sources. Homogeneity tests show that both station-wide trends and basin-wide trends are non-homogeneous.
Noise and speckle reduction in synthetic aperture radar imagery by nonparametric Wiener filtering.
Caprari, R S; Goh, A S; Moffatt, E K
2000-12-10
We present a Wiener filter that is especially suitable for speckle and noise reduction in multilook synthetic aperture radar (SAR) imagery. The proposed filter is nonparametric, not being based on parametrized analytical models of signal statistics. Instead, the Wiener-Hopf equation is expressed entirely in terms of observed signal statistics, with no reference to the possibly unobservable pure signal and noise. This Wiener filter is simple in concept and implementation, exactly minimum mean-square error, and directly applicable to signal-dependent and multiplicative noise. We demonstrate the filtering of a genuine two-look SAR image and show how a nonnegatively constrained version of the filter substantially reduces ringing.
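As a rough illustration of Wiener-type shrinkage driven by observed local statistics, here is a simplified locally adaptive scalar (Lee-type) filter. It is not the authors' exact Wiener-Hopf formulation: the box window size and the assumption of a known additive noise variance are simplifications for the sketch.

```python
import numpy as np

def local_wiener(img, noise_var, k=3):
    """Simplified locally adaptive Wiener (Lee-type) filter: shrink each
    pixel toward its local mean by the estimated local signal-to-total
    variance ratio, computed from observed image statistics in a k x k window."""
    pad = k // 2
    p = np.pad(img, pad, mode='reflect')
    h, w = img.shape
    mean = np.empty((h, w))
    meansq = np.empty((h, w))
    for i in range(h):                # brute-force box filter, for clarity
        for j in range(w):
            win = p[i:i + k, j:j + k]
            mean[i, j] = win.mean()
            meansq[i, j] = (win ** 2).mean()
    var = meansq - mean ** 2
    # Wiener gain: fraction of local variance attributable to signal
    gain = np.clip((var - noise_var) / np.maximum(var, 1e-12), 0.0, 1.0)
    return mean + gain * (img - mean)

# Flat 32x32 image corrupted by additive Gaussian noise
rng = np.random.default_rng(3)
clean = np.full((32, 32), 10.0)
noisy = clean + rng.normal(0, 1.0, clean.shape)
den = local_wiener(noisy, noise_var=1.0)
```

In flat regions the gain falls toward zero and the filter averages aggressively; near edges the local variance rises and the gain approaches one, preserving structure, which is the behavior the SAR filter exploits.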
Rotondi, R.
2009-04-01
According to the unified scaling theory the probability distribution function of the recurrence time T is a scaled version of a base function, and the average value of T can be used as a scale parameter for the distribution. The base function must belong to the scale family of distributions: tested on different catalogues and at different scale levels, the (truncated) generalized gamma distribution is the best model for Corral (2005), the Weibull distribution for German (2006). The scaling approach should overcome the difficulty of estimating distribution functions over small areas, but theoretical limitations and partial instability of the estimated distributions have been pointed out in the literature. Our aim is to analyze the recurrence time of strong earthquakes that occurred in the Italian territory. To satisfy the hypotheses of independence and identical distribution, we evaluated the times between events that occurred in each area of the Database of Individual Seismogenic Sources and then gathered them into eight tectonically coherent regions, each dominated by a well-characterized geodynamic process. To address problems such as paucity of data, presence of outliers, and uncertainty in the choice of the functional expression for the distribution of T, we followed a nonparametric approach (Rotondi (2009)) in which: (a) maximum flexibility is obtained by assuming that the probability distribution is a random function belonging to a large function space, distributed as a stochastic process; (b) the nonparametric estimation method is robust when the data contain outliers; (c) the Bayesian methodology makes it possible to exploit different information sources, so that the model fit may be good even with scarce samples. We have compared the hazard rates evaluated through the parametric and nonparametric approaches. References Corral A. (2005). Mixing of rescaled data and Bayesian inference for earthquake recurrence times, Nonlin. Proces. Geophys., 12, 89
A NONPARAMETRIC PROCEDURE OF THE SAMPLE SIZE DETERMINATION FOR SURVIVAL RATE TEST
Anonymous
2000-01-01
Objective: This paper proposes a nonparametric procedure for sample size determination in survival rate tests. Methods: Using the classical asymptotic normal procedure yields the required homogeneous effective sample size, and applying the inverse operation with the prespecified value of the survival function of the censoring times yields the required sample size. Results: The procedure matches the rate test for censored data, does not involve survival distributions, and reduces to its classical counterpart when there is no censoring. The observed power of the test coincides with the prescribed power under usual clinical conditions. Conclusion: The procedure can be used for planning survival studies of chronic diseases.
Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images
2010-04-01
[Table fragments from the report: PSNR denoising and interpolation results comparing KSVD and BPFA on standard test images (C.man, House, Peppers, Lena, Barbara, Boats, F.print, Couple, Hill), using 8×8 patches for grayscale interpolation and 7×7 patches for RGB interpolation; cited references include T. Ferguson, "A Bayesian analysis of some nonparametric problems," Annals of Statistics, 1:209-230.]
BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES
林路; 张润楚
2004-01-01
This paper introduces a bootstrap wavelet estimation method for a nonparametric regression model with weakly dependent processes, for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. Conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed design model. Consistency of the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for models with weakly dependent processes.
An adaptive nonparametric method in benchmark analysis for bioassay and environmental studies.
Bhattacharya, Rabi; Lin, Lizhen
2010-12-01
We present a novel nonparametric method for bioassay and benchmark analysis in risk assessment, which averages isotonic MLEs based on disjoint subgroups of dosages. The asymptotic theory for the methodology is derived, showing that the MISEs (mean integrated squared errors) of the estimates of both the dose-response curve F and its inverse F⁻¹ achieve the optimal rate O(N⁻⁴/⁵). Also, we compute the asymptotic distribution of the estimate ζ̂_p of the effective dosage ζ(p) = F⁻¹(p), which is shown to have an optimally small asymptotic variance.
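The isotonic MLE at the heart of this approach can be computed with the classical pool-adjacent-violators algorithm (PAVA). The sketch below shows plain PAVA on made-up dose-response rates, not the paper's subgroup-averaging refinement.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: isotonic (nondecreasing) least-squares fit,
    the MLE of a monotone dose-response curve under Gaussian error."""
    out = []  # blocks of [value, count]
    for v in np.asarray(y, dtype=float):
        out.append([v, 1.0])
        # merge backwards while the monotonicity constraint is violated
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            v2, w2 = out.pop()
            v1, w1 = out.pop()
            out.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2])
    return np.concatenate([[v] * int(w) for v, w in out])

# Hypothetical observed response rates at six increasing dosages
rates = np.array([0.05, 0.02, 0.10, 0.30, 0.25, 0.50])
fit = pava(rates)   # nondecreasing fit; violating neighbors are pooled
```

Here the pairs (0.05, 0.02) and (0.30, 0.25) violate monotonicity and get pooled to their means, giving a nondecreasing estimate of F.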
Bayes' postulate for trinomial trials
Diniz, M. A.; Polpo, A.
2012-10-01
In this paper, we discuss Bayes' postulate and its interpretation. We extend the binomial trial method proposed by de Finetti [1] to trinomial trials, for which we argue that assuming equiprobability a priori for the possible outcomes of the trinomial trials implies that the parameter vector has a Dirichlet(1,1,1) prior. Based on this result, we agree with Stigler [2] that the notion in Bayes' postulate of "absolutely knowing nothing" refers to the possible outcomes of an experiment and not to "non-information" about the parameter.
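The conjugate-update arithmetic behind this Dirichlet reasoning can be checked numerically in a few lines; the counts below are made-up illustration data.

```python
import numpy as np

# Uniform Dirichlet(1,1,1) prior over the trinomial cell probabilities;
# with counts n = (n1, n2, n3) the posterior is Dirichlet(1+n1, 1+n2, 1+n3),
# whose mean is the Laplace "add-one" estimate.
prior = np.array([1.0, 1.0, 1.0])
counts = np.array([4, 1, 0])            # hypothetical observed outcomes
posterior = prior + counts              # Dirichlet(5, 2, 1)
post_mean = posterior / posterior.sum() # (5/8, 2/8, 1/8)
```

With zero observations the posterior mean is (1/3, 1/3, 1/3), i.e. the equiprobability that the postulate assigns to the outcomes, which is the point of the extension.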
Vexler, Albert; Yu, Jihnhee
2011-07-01
In clinical trials examining the incidence of pneumonia, it is common practice to measure infection via both invasive and non-invasive procedures. In the context of a recently completed randomized trial comparing two treatments, the invasive procedure was only utilized in certain scenarios due to the added risk involved, and only when the level of the non-invasive procedure surpassed a given threshold. Hence, what was observed was bivariate data with a pattern of missingness in the invasive variable dependent upon the value of the observed non-invasive observation within a given pair. In order to compare two treatments with bivariate data exhibiting this pattern of missingness, we developed a semi-parametric methodology utilizing the density-based empirical likelihood approach to provide a non-parametric approximation to Neyman-Pearson-type test statistics. This novel empirical likelihood approach has both parametric and non-parametric components. The non-parametric component utilizes the observations for the non-missing cases, while the parametric component is utilized to tackle the case where observations are missing with respect to the invasive variable. The method is illustrated through its application to the actual data obtained in the pneumonia study and is shown to be an efficient and practical method.
Empirical codon substitution matrix
Gonnet Gaston H
2005-06-01
Background: Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA, and thus provide an alternative to parameterized models of codon evolution. Results: A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons from which the numbers of substitutions between codons were counted. From these data, both a probability matrix and a matrix of similarity scores were computed. They are 64 × 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 × 61 entries for the sense codons and 3 × 3 entries for the stop codons. Conclusion: The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data is still needed to construct matrices from different subsets of DNA, specific to kingdoms, evolutionary distance or different amounts of synonymous change. Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for usages that require DNA, such as ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.
A Critical Evaluation of the Nonparametric Approach to Estimate Terrestrial Evaporation
Yongmin Yang
2016-01-01
Evapotranspiration (ET) estimation has been one of the most challenging problems in recent decades for hydrometeorologists. In this study, a nonparametric approach to estimating terrestrial evaporation was evaluated using both model simulation and measurements from three sites. Both the model simulation and the in situ evaluation at the Tiger Bush Site revealed that this approach greatly overestimates ET under dry conditions (evaporative fraction smaller than 0.4). For the evaluation at the Tiger Bush Site, the difference between ET estimates and site observations could be as large as 130 W/m2. However, the approach provided good estimates over the two crop sites: the Nash-Sutcliffe coefficient (E) was 0.9 and 0.94 for WC06 and Yingke, respectively. A further theoretical analysis indicates that the nonparametric approach is very close to the equilibrium evaporation equation under wet conditions, which can explain its good performance at the two crop sites in this study. The evaluation indicates that this approach needs more careful appraisal and that its application under dry conditions should be avoided.
Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting
Jelena Fiosina
2017-01-01
Forecasting in big datasets is a common but complicated task, which cannot be executed using the well-known parametric linear regression. However, nonparametric and semiparametric methods, which enable forecasting by building nonlinear data models, are computationally intensive and lack sufficient scalability to cope with big datasets and extract successful results in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with that of the linear regression model, currently the only forecasting model with a parallel distributed realization within the SPARK framework for addressing big data problems. The advantages of parallelizing the algorithms are also provided. We validate our models by conducting various numerical experiments: evaluating the goodness of fit, analyzing how increasing dataset size influences time consumption, and analyzing time consumption while varying the degree of parallelism (number of workers) in the distributed realization.
Nonparametric Identification of Glucose-Insulin Process in IDDM Patient with Multi-meal Disturbance
Bhattacharjee, A.; Sutradhar, A.
2012-12-01
Modern closed-loop control of the blood glucose level in a diabetic patient necessarily uses an explicit model of the process. A fixed-parameter full-order or reduced-order model does not characterize inter-patient and intra-patient parameter variability. This paper deals with frequency-domain nonparametric identification of the nonlinear glucose-insulin process in an insulin-dependent diabetes mellitus patient that captures the process dynamics in the presence of uncertainties and parameter variations. An online frequency-domain kernel estimation method is proposed that uses the input-output data from the 19th-order first-principles model of the patient in the intravenous route. Volterra equations up to second-order kernels with an extended input vector for a Hammerstein model are solved online by an adaptive recursive least squares (ARLS) algorithm. The frequency-domain kernels are estimated using the harmonic excitation input data sequence from the virtual patient model. A short filter memory length of M = 2 was found sufficient to yield acceptable accuracy with less computation time. The nonparametric models are useful for closed-loop control, where the frequency-domain kernels can be directly used as the transfer function. The validation results show good fit in both frequency- and time-domain responses with the nominal patient as well as with parameter variations.
A web application for evaluating Phase I methods using a non-parametric optimal benchmark.
Wages, Nolan A; Varhegyi, Nikole
2017-06-01
In evaluating the performance of Phase I dose-finding designs, simulation studies are typically conducted to assess how often a method correctly selects the true maximum tolerated dose under a set of assumed dose-toxicity curves. A necessary component of the evaluation process is to have some concept for how well a design can possibly perform. The notion of an upper bound on the accuracy of maximum tolerated dose selection is often omitted from the simulation study, and the aim of this work is to provide researchers with accessible software to quickly evaluate the operating characteristics of Phase I methods using a benchmark. The non-parametric optimal benchmark is a useful theoretical tool for simulations that can serve as an upper limit for the accuracy of maximum tolerated dose identification based on a binary toxicity endpoint. It offers researchers a sense of the plausibility of a Phase I method's operating characteristics in simulation. We have developed an R shiny web application for simulating the benchmark. The web application has the ability to quickly provide simulation results for the benchmark and requires no programming knowledge. The application is free to access and use on any device with an Internet browser. The application provides the percentage of correct selection of the maximum tolerated dose and an accuracy index, operating characteristics typically used in evaluating the accuracy of dose-finding designs. We hope this software will facilitate the use of the non-parametric optimal benchmark as an evaluation tool in dose-finding simulation.
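A minimal simulation of the non-parametric optimal benchmark (complete-information latent tolerances, in the spirit of O'Quigley, Paoletti and Maccario) might look as follows. The dose-toxicity curve, target rate, sample size and function names are illustrative assumptions, not the web application's code.

```python
import numpy as np

def benchmark_sim(true_tox, target, n, nsim, seed=0):
    """Non-parametric optimal benchmark: each simulated patient has a latent
    uniform tolerance that determines a toxicity indicator at EVERY dose
    (complete information); the MTD estimate is the dose whose empirical
    toxicity rate is closest to the target. Returns selection proportions."""
    rng = np.random.default_rng(seed)
    true_tox = np.asarray(true_tox)
    sel = np.zeros(len(true_tox))
    for _ in range(nsim):
        u = rng.uniform(size=(n, 1))              # latent tolerances
        tox = (u < true_tox).astype(float)        # full toxicity profiles
        phat = tox.mean(axis=0)                   # empirical rate per dose
        sel[np.argmin(np.abs(phat - target))] += 1
    return sel / nsim

# Hypothetical five-dose scenario; third dose has the target 25% toxicity
sel = benchmark_sim([0.05, 0.12, 0.25, 0.40, 0.55], target=0.25, n=30, nsim=2000)
```

The resulting selection percentage for the true MTD is the upper bound against which an actual Phase I design's correct-selection rate can be judged.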
Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution
He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun
2016-05-01
Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved in non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The proposed optimal wavelength set is selected by the method which can make the measurement signals sensitive to wavelength and decrease the degree of the ill-condition of coefficient matrix of linear systems effectively to enhance the anti-interference ability of retrieval results. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the experimental measurement ASD over Harbin in China is recovered reasonably. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique in non-parametric estimation of ASD.
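A toy version of damped LSQR inversion for an ill-conditioned kernel system, standing in for the ASD retrieval, is sketched below; the smoothing kernel, noise level and damping value are illustrative assumptions, and SciPy's `lsqr` is used in place of the authors' implementation.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Ill-conditioned linear system K f = g, loosely analogous to retrieving a
# size distribution f from multi-wavelength extinction measurements g.
rng = np.random.default_rng(4)
n = 20
K = np.exp(-0.5 * (np.subtract.outer(np.arange(n), np.arange(n)) / 3.0) ** 2)
f_true = np.exp(-0.5 * ((np.arange(n) - 10) / 2.5) ** 2)  # smooth "ASD"
g = K @ f_true + rng.normal(0, 1e-3, n)                   # noisy measurements

# Damped LSQR: the damp term regularizes the ill-posed inversion,
# suppressing the noise-amplified small-singular-value components.
f_est = lsqr(K, g, damp=1e-2)[0]
```

The damping plays the stabilizing role that the optimal wavelength selection plays in the paper: both reduce the effective ill-conditioning so the retrieval stays robust to random noise.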
LSTA, Rawane Samb
2010-01-01
This thesis deals with nonparametric estimation of the density f of the regression error term E in the model Y = m(X) + E, assuming its independence of the covariate X. The difficulty of this study lies in the fact that the regression error E is not observed. In such a setup, it would be unwise, for estimating f, to use a conditional approach based upon the probability distribution function of Y given X. Indeed, this approach is affected by the curse of dimensionality, so that the resulting estimator of the residual term E would have a considerably slow rate of convergence if the dimension of X is very high. Two approaches are proposed in this thesis to avoid the curse of dimensionality. The first approach uses the estimated residuals, while the second integrates a nonparametric conditional density estimator of Y given X. Even if proceeding so can circumvent the curse of dimensionality, a challenging issue is to evaluate the impact of the estimated residuals on the final estimator of the density f. We will also at...
Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model
Yujuan Sun
2014-01-01
Passenger flow is increasing dramatically with the completion of subway network systems in big cities of China. As convergence nodes of subway lines, transfer stations must handle more passengers owing to the large transfer demand among different lines. Transfer facilities therefore face great pressure, such as pedestrian congestion or other abnormal situations. In order to avoid pedestrian congestion, or to warn management before it occurs, it is necessary to predict transfer passenger flow. Thus, based on nonparametric regression theory, a transfer passenger flow prediction model was proposed. In order to test and illustrate the prediction model, data on transfer passenger flow for one month in the XIDAN transfer station were used to calibrate and validate the model. Comparison with a Kalman filter model and a support vector machine regression model shows that the nonparametric regression model has the advantages of high accuracy and strong transferability, and can predict transfer passenger flow accurately for different intervals.
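A nonparametric regression forecast in this spirit can be sketched as a k-nearest-neighbour predictor on lagged flow states: the next value is the average outcome of the k most similar historical states. The toy autoregressive flow series and all names are illustrative assumptions, not the paper's calibrated model.

```python
import numpy as np

def knn_forecast(hist_states, hist_next, state, k=5):
    """Nonparametric (k-NN) regression: forecast the next flow value as the
    average outcome of the k historical states closest to the current one."""
    d = np.linalg.norm(hist_states - state, axis=1)
    idx = np.argsort(d)[:k]
    return hist_next[idx].mean()

# Toy transfer-flow series: flow(t+1) ~ 0.6*flow(t) + 0.3*flow(t-1) + noise
rng = np.random.default_rng(5)
flow = [100.0, 110.0]
for _ in range(500):
    flow.append(0.6 * flow[-1] + 0.3 * flow[-2] + rng.normal(10, 2))
flow = np.array(flow)

# State = (flow_t, flow_{t-1}); target = flow_{t+1}
states = np.column_stack([flow[1:-1], flow[:-2]])
nxt = flow[2:]
pred = knn_forecast(states[:-1], nxt[:-1], states[-1], k=5)
```

No functional form is assumed for the flow dynamics; accuracy comes entirely from the density of similar historical states, which is why such models transfer well across stations and intervals.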
Akhmadaliev, S Z; Ambrosini, G; Amorim, A; Anderson, K; Andrieux, M L; Aubert, Bernard; Augé, E; Badaud, F; Baisin, L; Barreiro, F; Battistoni, G; Bazan, A; Bazizi, K; Belymam, A; Benchekroun, D; Berglund, S R; Berset, J C; Blanchot, G; Bogush, A A; Bohm, C; Boldea, V; Bonivento, W; Bosman, M; Bouhemaid, N; Breton, D; Brette, P; Bromberg, C; Budagov, Yu A; Burdin, S V; Calôba, L P; Camarena, F; Camin, D V; Canton, B; Caprini, M; Carvalho, J; Casado, M P; Castillo, M V; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Chadelas, R; Chalifour, M; Chekhtman, A; Chevalley, J L; Chirikov-Zorin, I E; Chlachidze, G; Citterio, M; Cleland, W E; Clément, C; Cobal, M; Cogswell, F; Colas, Jacques; Collot, J; Cologna, S; Constantinescu, S; Costa, G; Costanzo, D; Crouau, M; Daudon, F; David, J; David, M; Davidek, T; Dawson, J; De, K; de La Taille, C; Del Peso, J; Del Prete, T; de Saintignon, P; Di Girolamo, B; Dinkespiler, B; Dita, S; Dodd, J; Dolejsi, J; Dolezal, Z; Downing, R; Dugne, J J; Dzahini, D; Efthymiopoulos, I; Errede, D; Errede, S; Evans, H; Eynard, G; Fassi, F; Fassnacht, P; Ferrari, A; Ferrer, A; Flaminio, Vincenzo; Fournier, D; Fumagalli, G; Gallas, E; Gaspar, M; Giakoumopoulou, V; Gianotti, F; Gildemeister, O; Giokaris, N; Glagolev, V; Glebov, V Yu; Gomes, A; González, V; González de la Hoz, S; Grabskii, V; Graugès-Pous, E; Grenier, P; Hakopian, H H; Haney, M; Hébrard, C; Henriques, A; Hervás, L; Higón, E; Holmgren, Sven Olof; Hostachy, J Y; Hoummada, A; Huston, J; Imbault, D; Ivanyushenkov, Yu M; Jézéquel, S; Johansson, E K; Jon-And, K; Jones, R; Juste, A; Kakurin, S; Karyukhin, A N; Khokhlov, Yu A; Khubua, J I; Klioukhine, V I; Kolachev, G M; Kopikov, S V; Kostrikov, M E; Kozlov, V; Krivkova, P; Kukhtin, V V; Kulagin, M; Kulchitskii, Yu A; Kuzmin, M V; Labarga, L; Laborie, G; Lacour, D; Laforge, B; Lami, S; Lapin, V; Le Dortz, O; Lefebvre, M; Le Flour, T; Leitner, R; Leltchouk, M; Li, J; Liablin, M V; Linossier, O; Lissauer, D; Lobkowicz, F; Lokajícek, M; Lomakin, Yu 
F; López-Amengual, J M; Lund-Jensen, B; Maio, A; Makowiecki, D S; Malyukov, S N; Mandelli, L; Mansoulié, B; Mapelli, Livio P; Marin, C P; Marrocchesi, P S; Marroquim, F; Martin, P; Maslennikov, A L; Massol, N; Mataix, L; Mazzanti, M; Mazzoni, E; Merritt, F S; Michel, B; Miller, R; Minashvili, I A; Miralles, L; Mnatzakanian, E A; Monnier, E; Montarou, G; Mornacchi, Giuseppe; Moynot, M; Muanza, G S; Nayman, P; Némécek, S; Nessi, Marzio; Nicoleau, S; Niculescu, M; Noppe, J M; Onofre, A; Pallin, D; Pantea, D; Paoletti, R; Park, I C; Parrour, G; Parsons, J; Pereira, A; Perini, L; Perlas, J A; Perrodo, P; Pilcher, J E; Pinhão, J; Plothow-Besch, Hartmute; Poggioli, Luc; Poirot, S; Price, L; Protopopov, Yu; Proudfoot, J; Puzo, P; Radeka, V; Rahm, David Charles; Reinmuth, G; Renzoni, G; Rescia, S; Resconi, S; Richards, R; Richer, J P; Roda, C; Rodier, S; Roldán, J; Romance, J B; Romanov, V; Romero, P; Rossel, F; Rusakovitch, N A; Sala, P; Sanchis, E; Sanders, H; Santoni, C; Santos, J; Sauvage, D; Sauvage, G; Sawyer, L; Says, L P; Schaffer, A C; Schwemling, P; Schwindling, J; Seguin-Moreau, N; Seidl, W; Seixas, J M; Selldén, B; Seman, M; Semenov, A; Serin, L; Shaldaev, E; Shochet, M J; Sidorov, V; Silva, J; Simaitis, V J; Simion, S; Sissakian, A N; Snopkov, R; Söderqvist, J; Solodkov, A A; Soloviev, A; Soloviev, I V; Sonderegger, P; Soustruznik, K; Spanó, F; Spiwoks, R; Stanek, R; Starchenko, E A; Stavina, P; Stephens, R; Suk, M; Surkov, A; Sykora, I; Takai, H; Tang, F; Tardell, S; Tartarelli, F; Tas, P; Teiger, J; Thaler, J; Thion, J; Tikhonov, Yu A; Tisserant, S; Tokar, S; Topilin, N D; Trka, Z; Turcotte, M; Valkár, S; Varanda, M J; Vartapetian, A H; Vazeille, F; Vichou, I; Vinogradov, V; Vorozhtsov, S B; Vuillemin, V; White, A; Wielers, M; Wingerter-Seez, I; Wolters, H; Yamdagni, N; Yosef, C; Zaitsev, A; Zitoun, R; Zolnierowski, Y
2002-01-01
This paper discusses hadron energy reconstruction for the ATLAS barrel prototype combined calorimeter (consisting of a lead-liquid argon electromagnetic part and an iron-scintillator hadronic part) in the framework of the nonparametrical method. The nonparametrical method utilizes only the known e/h ratios and the electron calibration constants and does not require the determination of any parameters by a minimization technique. Thus, this technique lends itself to an easy use in a first-level trigger. The reconstructed mean values of the hadron energies are within ±1% of the true values and the fractional energy resolution is [(58 ± 3)%/√E + (2.5 ± 0.3)%] ⊕ (1.7 ± 0.2)/E. The value of the e/h ratio obtained for the electromagnetic compartment of the combined calorimeter is 1.74 ± 0.04 and agrees with the prediction that e/h > 1.66 for this electromagnetic calorimeter. Results of a study of the longitudinal hadronic shower development are also presented. The data have been taken in the H8 beam...
Non-parametric transformation for data correlation and integration: From theory to practice
Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A&M Univ., College Station, TX (United States)]
1997-08-01
The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require an a priori functional relationship between response and predictor variables, which is the case with traditional multiple regression. The transformations are very general, computationally efficient and can easily handle mixed data types, for example continuous variables such as porosity and permeability, and categorical variables such as rock type and lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data; (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data; (3) it can be easily applied to cases where several types of secondary or soft data are involved; and (4) it significantly reduces variance function calculations and thus greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.
Urbi Garay
2016-03-01
We define a dynamic and self-adjusting mixture of Gaussian Graphical Models to cluster financial returns, and provide a new method for extraction of nonparametric estimates of dynamic alphas (excess return) and betas (to a choice set of explanatory factors) in a multivariate setting. This approach, as well as the outputs, has a dynamic, nonstationary and nonparametric form, which circumvents the problem of model risk and parametric assumptions that the Kalman filter and other widely used approaches rely on. The by-product of clusters, used for shrinkage and information borrowing, can be of use to determine relationships around specific events. This approach exhibits a smaller Root Mean Squared Error than traditionally used benchmarks in financial settings, which we illustrate through simulation. As an illustration, we use hedge fund index data, and find that our estimated alphas are, on average, 0.13% per month higher (1.6% per year) than alphas estimated through Ordinary Least Squares. The approach exhibits fast adaptation to abrupt changes in the parameters, as seen in our estimated alphas and betas, which exhibit high volatility, especially in periods which can be identified as times of stressful market events, a reflection of the dynamic positioning of hedge fund portfolio managers.
Identification and well-posedness in a class of nonparametric problems
Zinde-Walsh, Victoria
2010-01-01
This is a companion note to Zinde-Walsh (2010), arXiv:1009.4217v1 [math.ST], to clarify and extend results on identification in a number of problems that lead to a system of convolution equations. Examples include identification of the distribution of mismeasured variables, of a nonparametric regression function under Berkson-type measurement error, some nonparametric panel data models, etc. The reason that identification in different problems can be considered in one approach is that they lead to the same system of convolution equations; moreover, the solution can be given under more general assumptions than those usually considered, by examining these equations in spaces of generalized functions. An important issue that has not received sufficient attention is that of well-posedness. This note gives conditions under which well-posedness obtains, provides an example demonstrating that when well-posedness does not hold, functions that are far apart can give rise to observable functions that are arbitrarily close, and discusses ...
Comparison of Parametric and Nonparametric Methods for Analyzing the Bias of a Numerical Model
Isaac Mugume
2016-01-01
Numerical models are presently applied in many fields for simulation and prediction, operation, or research. The output from these models normally has both systematic and random errors. The study compared January 2015 temperature data for Uganda as simulated using the Weather Research and Forecasting model with actual observed station temperature data to analyze the bias using parametric methods (the root mean square error (RMSE), the mean absolute error (MAE), mean error (ME), skewness, and the bias easy estimate (BES)) and a nonparametric method (the sign test, STM). The RMSE normally overestimates the error compared to MAE. The RMSE and MAE are not sensitive to the direction of bias. The ME gives both direction and magnitude of bias but can be distorted by extreme values, while the BES is insensitive to extreme values. The STM is robust for giving the direction of bias; it is not sensitive to extreme values but it does not give the magnitude of bias. Graphical tools (such as time series and cumulative curves) show the performance of the model over time. It is recommended to integrate parametric and nonparametric methods along with graphical methods for a comprehensive analysis of the bias of a numerical model.
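As a concrete illustration of the measures this abstract contrasts, a minimal Python sketch of RMSE, MAE, ME and the sign test follows; the temperature values are invented for illustration and are not from the study:

```python
import math

def bias_measures(simulated, observed):
    """Compute RMSE, MAE and mean error (ME) between model output and observations."""
    errors = [s - o for s, o in zip(simulated, observed)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    me = sum(errors) / n  # signed: positive means the model runs warm
    return rmse, mae, me

def sign_test(simulated, observed):
    """Nonparametric sign test counts: positive and negative errors.

    A large imbalance indicates a systematic directional bias,
    although the counts say nothing about its magnitude.
    """
    pos = sum(1 for s, o in zip(simulated, observed) if s > o)
    neg = sum(1 for s, o in zip(simulated, observed) if s < o)
    return pos, neg

sim = [24.1, 23.8, 25.0, 26.2, 24.9]  # illustrative model temperatures
obs = [23.5, 23.9, 24.2, 25.1, 24.0]  # illustrative station observations
rmse, mae, me = bias_measures(sim, obs)
```

Note how RMSE ≥ MAE ≥ |ME| by construction, which is why the abstract warns that RMSE overestimates the typical error while ME can hide large cancelling errors.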
Kim, Junmo; Fisher, John W; Yezzi, Anthony; Cetin, Müjdat; Willsky, Alan S
2005-10-01
In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Furthermore, our method, which does not require any training, performs as well as methods based on training.
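A toy version of the objective, a histogram-based estimate of the mutual information between labels and intensities, can be sketched as follows; the curve-evolution and level-set machinery is omitted, and the bin count and data are illustrative:

```python
import math
from collections import Counter

def mutual_information(labels, intensities, bins=4):
    """Histogram-based estimate of I(L; X) in nats.

    Intensities are discretized into equal-width bins; this is only a
    plug-in estimator of the quantity the segmentation maximizes.
    """
    lo, hi = min(intensities), max(intensities)
    width = (hi - lo) / bins or 1.0          # guard against constant input
    binned = [min(int((x - lo) / width), bins - 1) for x in intensities]
    n = len(labels)
    joint = Counter(zip(labels, binned))     # joint counts over (label, bin)
    p_l = Counter(labels)                    # marginal label counts
    p_x = Counter(binned)                    # marginal intensity-bin counts
    mi = 0.0
    for (l, x), c in joint.items():
        p_lx = c / n
        # p_lx * n * n / (c_l * c_x) equals p(l,x) / (p(l) p(x))
        mi += p_lx * math.log(p_lx * n * n / (p_l[l] * p_x[x]))
    return mi
```

When labels fully determine intensity the estimate approaches the label entropy; when they are independent it is zero.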
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Robust non-parametric one-sample tests for the analysis of recurrent events.
Rebora, Paola; Galimberti, Stefania; Valsecchi, Maria Grazia
2010-12-30
One-sample non-parametric tests are proposed here for inference on recurring events. The focus is on the marginal mean function of events and the basis for inference is the standardized distance between the observed and the expected number of events under a specified reference rate. Different weights are considered in order to account for various types of alternative hypotheses on the mean function of the recurrent events process. A robust version and a stratified version of the test are also proposed. The performance of these tests was investigated through simulation studies under various underlying event generation processes, such as homogeneous and nonhomogeneous Poisson processes, autoregressive and renewal processes, with and without frailty effects. The robust versions of the test have been shown to be suitable in a wide variety of event generating processes. The motivating context is a study on gene therapy in a very rare immunodeficiency in children, where a major end-point is the recurrence of severe infections. Robust non-parametric one-sample tests for recurrent events can be useful to assess efficacy and especially safety in non-randomized studies or in epidemiological studies for comparison with a standard population.
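The standardized observed-versus-expected distance at the heart of such tests can be sketched as follows; the constant reference rate and Poisson working variance are simplifying assumptions of this sketch, not the paper's general weighted formulation:

```python
import math

def one_sample_event_test(observed_counts, followup_times, ref_rate):
    """Standardized distance between observed and expected event counts.

    Expected events per subject = ref_rate * follow-up time (constant
    reference rate). Returns a statistic that is asymptotically standard
    normal under H0, using the Poisson variance as a working model.
    """
    obs = sum(observed_counts)
    exp = sum(ref_rate * t for t in followup_times)
    return (obs - exp) / math.sqrt(exp)
```

With 10 events observed over 4 subject-years, a reference rate of 2.5 events/year gives a statistic of 0, while a reference rate of 1 event/year gives a clearly positive statistic.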
Palacios, Julia A; Minin, Vladimir N
2013-03-01
Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.
A non-parametric approach to estimate the total deviation index for non-normal data.
Perez-Jaume, Sara; Carrasco, Josep L
2015-11-10
Concordance indices are used to assess the degree of agreement between different methods that measure the same characteristic. In this context, the total deviation index (TDI) is an unscaled concordance measure that quantifies to which extent the readings from the same subject obtained by different methods may differ with a certain probability. Common approaches to estimate the TDI assume data are normally distributed and linearity between response and effects (subjects, methods and random error). Here, we introduce a new non-parametric methodology for estimation and inference of the TDI that can deal with any kind of quantitative data. The present study introduces this non-parametric approach and compares it with the already established methods in two real case examples that represent situations of non-normal data (more specifically, skewed data and count data). The performance of the already established methodologies and our approach in these contexts is assessed by means of a simulation study. Copyright © 2015 John Wiley & Sons, Ltd.
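A minimal sketch of the nonparametric idea behind the TDI is to estimate it as an empirical quantile of the absolute paired differences; the simple order-statistic quantile rule used here is illustrative, and the paper's inference procedure is not reproduced:

```python
import math

def tdi(method_a, method_b, p=0.9):
    """Nonparametric TDI point estimate: empirical p-quantile of |differences|.

    With probability roughly p, two readings of the same subject by the
    two methods differ by less than the returned value.
    """
    diffs = sorted(abs(a - b) for a, b in zip(method_a, method_b))
    k = max(0, math.ceil(p * len(diffs)) - 1)  # order statistic covering >= p
    return diffs[k]
```

Because it is an order statistic, this estimate needs no normality or linearity assumptions, which is the point of the non-parametric approach.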
A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.
Shalchyan, Vahid; Farina, Dario
2014-02-15
Neural spikes from multiple neurons recorded in a multi-unit signal are usually separated by clustering. Drifts in the position of the recording electrode relative to the neurons over time cause gradual changes in the position and shapes of the clusters, challenging the clustering task. By dividing the data into short time intervals, Bayesian tracking of the clusters based on a Gaussian cluster model has been previously proposed. However, the Gaussian cluster model is often not verified for neural spikes. We present a Bayesian clustering approach that makes no assumptions on the distribution of the clusters and uses kernel-based density estimation of the clusters in every time interval as a prior for Bayesian classification of the data in the subsequent time interval. The proposed method was tested and compared to the Gaussian model-based approach for cluster tracking by using both simulated and experimental datasets. The results showed that the proposed non-parametric kernel-based density estimation of the clusters outperformed the sequential Gaussian model fitting in both simulated and experimental data tests. Using non-parametric kernel density-based clustering that makes no assumptions on the distribution of the clusters enhances the ability to track cluster non-stationarity over time with respect to the Gaussian cluster modeling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
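The kernel-density tracking idea can be caricatured in one dimension: build a kernel density estimate per cluster from the previous interval and assign each new spike to the cluster with the highest density. Everything here (Gaussian kernel, fixed bandwidth, 1-D spike features) is a simplifying assumption of the sketch:

```python
import math

def gaussian_kde_logpdf(point, samples, bandwidth=1.0):
    """Log of a Gaussian-kernel density estimate at `point` (1-D)."""
    n = len(samples)
    s = sum(math.exp(-0.5 * ((point - x) / bandwidth) ** 2) for x in samples)
    return math.log(s / (n * bandwidth * math.sqrt(2 * math.pi)) + 1e-300)

def classify(point, clusters, bandwidth=0.5):
    """Assign `point` to the cluster whose previous-interval KDE is highest.

    `clusters` maps each label to the feature values assigned to it in the
    previous time interval, serving as the prior for the next interval.
    """
    return max(clusters,
               key=lambda c: gaussian_kde_logpdf(point, clusters[c], bandwidth))
```

Because the per-cluster densities are re-estimated each interval, slow electrode drift is absorbed without assuming any parametric cluster shape.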
Non-parametric iterative model constraint graph min-cut for automatic kidney segmentation.
Freiman, M; Kronman, A; Esses, S J; Joskowicz, L; Sosna, J
2010-01-01
We present a new non-parametric model constraint graph min-cut algorithm for automatic kidney segmentation in CT images. The segmentation is formulated as a maximum a-posteriori estimation of a model-driven Markov random field. A non-parametric hybrid shape and intensity model is treated as a latent variable in the energy functional. The latent model and labeling map that minimize the energy functional are then simultaneously computed with an expectation maximization approach. The main advantages of our method are that it does not assume a fixed parametric prior model, which is subject to inter-patient variability and registration errors, and that it combines both the model and the image information into a unified graph min-cut based segmentation framework. We evaluated our method on 20 kidneys from 10 CT datasets with and without contrast agent for which ground-truth segmentations were generated by averaging three manual segmentations. Our method yields an average volumetric overlap error of 10.95% and an average symmetric surface distance of 0.79 mm. These results indicate that our method is accurate and robust for kidney segmentation.
Montiel, Ariadna; Sendra, Irene; Escamilla-Rivera, Celia; Salzano, Vincenzo
2014-01-01
In this work we present a nonparametric approach, which works on minimal assumptions, to reconstruct the cosmic expansion of the Universe. We propose to combine a locally weighted scatterplot smoothing method and a simulation-extrapolation method. The first one (Loess) is a nonparametric approach that allows one to obtain smoothed curves with no prior knowledge of the functional relationship between variables nor of the cosmological quantities. The second one (Simex) takes into account the effect of measurement errors on a variable via a simulation process. For the reconstructions we use as raw data the Union2.1 Type Ia Supernovae compilation, as well as recent Hubble parameter measurements. This work aims to illustrate the approach, which turns out to be a self-sufficient technique in the sense that we do not have to choose anything by hand. We examine the details of the method, among them the amount of observational data needed to perform the locally weighted fit which will define the robustness of our reconstructio...
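The Loess half of the combination can be sketched as a tricube-weighted local linear fit; the span, neighbor selection and data below are illustrative simplifications, and the Simex measurement-error step is omitted:

```python
import math

def loess_point(x0, xs, ys, span=0.5):
    """Locally weighted linear fit at x0 with tricube weights (a minimal Loess).

    Uses the nearest `span` fraction of the data; no parametric form is
    assumed for the underlying curve.
    """
    k = max(2, int(span * len(xs)))
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    d_max = max(abs(xs[i] - x0) for i in nearest) or 1.0
    # tricube weights: 1 at x0, falling to 0 at the edge of the window
    w = {i: (1 - (abs(xs[i] - x0) / d_max) ** 3) ** 3 for i in nearest}
    sw = sum(w.values())
    xb = sum(w[i] * xs[i] for i in nearest) / sw
    yb = sum(w[i] * ys[i] for i in nearest) / sw
    sxx = sum(w[i] * (xs[i] - xb) ** 2 for i in nearest)
    sxy = sum(w[i] * (xs[i] - xb) * (ys[i] - yb) for i in nearest)
    slope = sxy / sxx if sxx else 0.0
    return yb + slope * (x0 - xb)
```

On exactly linear data the local fit reproduces the line, which is a useful sanity check for any implementation.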
MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO
Jardel, John R.; Gebhardt, Karl [Department of Astronomy, The University of Texas, 2515 Speedway, Stop C1400, Austin, TX 78712-1205 (United States); Fabricius, Maximilian H.; Williams, Michael J. [Max-Planck Institut fuer extraterrestrische Physik, Giessenbachstrasse, D-85741 Garching bei Muenchen (Germany); Drory, Niv, E-mail: jardel@astro.as.utexas.edu [Instituto de Astronomia, Universidad Nacional Autonoma de Mexico, Avenida Universidad 3000, Ciudad Universitaria, C.P. 04510 Mexico D.F. (Mexico)
2013-02-15
We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Our non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 ≤ r ≤ 700 pc. The profile for r ≥ 20 pc is well fit by a power law with slope α = −1.0 ± 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.
Evolution of the CMB Power Spectrum Across WMAP Data Releases: A Nonparametric Analysis
Aghamousa, Amir; Souradeep, Tarun
2011-01-01
We present a comparative analysis of the WMAP 1-, 3-, 5-, and 7-year data releases for the CMB angular power spectrum, with respect to the following three key questions: (a) How well is the angular power spectrum determined by the data alone? (b) How well is the Lambda-CDM model supported by a model-independent, data-driven analysis? (c) What are the realistic uncertainties on peak/dip locations and heights? Our analysis is based on a nonparametric function estimation methodology [1,2]. Our results show that the height of the power spectrum is well determined by data alone for multipole index l approximately less than 600 (1-year), 800 (3-year), and 900 (5- and 7-year data realizations). We also show that parametric fits based on the Lambda-CDM model are remarkably close to our nonparametric fit in l-regions where the data are sufficiently precise. A contrasting example is provided by an H-Lambda-CDM model: As the data become precise with successive data realizations, the H-Lambda-CDM angular power spectrum g...
A Bayesian non-parametric Potts model with application to pre-surgical FMRI data.
Johnson, Timothy D; Liu, Zhuqing; Bartsch, Andreas J; Nichols, Thomas E
2013-08-01
The Potts model has enjoyed much success as a prior model for image segmentation. Given the individual classes in the model, the data are typically modeled as Gaussian random variates or as random variates from some other parametric distribution. In this article, we present a non-parametric Potts model and apply it to a functional magnetic resonance imaging study for the pre-surgical assessment of peritumoral brain activation. In our model, we assume that the Z-score image from a patient can be segmented into activated, deactivated, and null classes, or states. Conditional on the class, or state, the Z-scores are assumed to come from some generic distribution which we model non-parametrically using a mixture of Dirichlet process priors within the Bayesian framework. The posterior distribution of the model parameters is estimated with a Markov chain Monte Carlo algorithm, and Bayesian decision theory is used to make the final classifications. Our Potts prior model includes two parameters, the standard spatial regularization parameter and a parameter that can be interpreted as the a priori probability that each voxel belongs to the null, or background, state, conditional on the lack of spatial regularization. We assume that both of these parameters are unknown, and jointly estimate them along with other model parameters. We show through simulation studies that our model performs on par, in terms of posterior expected loss, with parametric Potts models when the parametric model is correctly specified and outperforms parametric models when the parametric model is misspecified.
Macmillan, N A; Creelman, C D
1996-06-01
Can accuracy and response bias in two-stimulus, two-response recognition or detection experiments be measured nonparametrically? Pollack and Norman (1964) answered this question affirmatively for sensitivity, Hodos (1970) for bias: both proposed measures based on triangular areas in receiver-operating characteristic space. Their papers, and especially a paper by Grier (1971) that provided computing formulas for the measures, continue to be heavily cited in a wide range of content areas. In our sample of articles, most authors described triangle-based measures as making fewer assumptions than measures associated with detection theory. However, we show that statistics based on products or ratios of right-triangle areas, including a recently proposed bias index and a not-yet-proposed but apparently plausible sensitivity index, are consistent with a decision process based on logistic distributions. Even the Pollack and Norman measure, which is based on non-right triangles, is approximately logistic for low values of sensitivity. Simple geometric models for sensitivity and bias are not nonparametric, even if their implications are not acknowledged in the defining publications.
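The triangle-based indices at issue are easy to compute; below are the standard formulas popularized by Grier (1971) for A′ (sensitivity) and B″ (bias), the very statistics the article argues are implicitly tied to logistic distributions rather than being assumption-free:

```python
def a_prime(h, f):
    """Grier's A' sensitivity index (valid for hit rate h >= false-alarm rate f).

    Chance performance (h == f) gives 0.5; perfect sensitivity approaches 1.
    """
    return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))

def b_double_prime(h, f):
    """Grier's B'' bias index: 0 = unbiased, positive = conservative."""
    num = h * (1 - h) - f * (1 - f)
    den = h * (1 - h) + f * (1 - f)
    return num / den
```

For example, hit rate 0.75 with false-alarm rate 0.25 gives A′ = 5/6 and B″ = 0.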
Yong Liu,; Wei Zou
2011-01-01
Extending the income dynamics approach in Quah (2003), the present paper studies the enlarging income inequality in China over the past three decades from the viewpoint of rural-urban migration and economic transition. We establish non-parametric estimations of rural and urban income distribution functions in China, and aggregate a population-weighted, nationwide income distribution function taking into account rural-urban differences in technological progress and price indexes. We calculate 12 inequality indexes through non-parametric estimation to overcome the biases in existing parametric estimation and, therefore, provide more accurate measurement of income inequality. Policy implications are drawn based on our research.
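As one example of such an inequality index, the Gini coefficient can be computed from a sample as follows; this is illustrative only, since the paper derives its 12 indexes from nonparametrically estimated distribution functions rather than raw samples:

```python
def gini(incomes):
    """Gini coefficient of a sample via the sorted-rank formula.

    Equivalent to half the mean absolute difference divided by the mean;
    0 = perfect equality, values approach 1 as inequality grows.
    """
    xs = sorted(incomes)
    n = len(xs)
    # sum of (2i - n - 1) * x_(i) over 1-based ranks i
    cum = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs))
    return cum / (n * sum(xs))
```

A degenerate sample where everyone earns the same gives 0, while concentrating all income in one person of n gives (n − 1)/n.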
Constructive Verification, Empirical Induction, and Falibilist Deduction: A Threefold Contrast
Julio Michael Stern
2011-10-01
This article explores some open questions related to the problem of verification of theories in the context of the empirical sciences by contrasting three epistemological frameworks. Each of these epistemological frameworks is based on a corresponding central metaphor, namely: (a) Neo-empiricism and the gambling metaphor; (b) Popperian falsificationism and the scientific tribunal metaphor; (c) Cognitive constructivism and the object as eigen-solution metaphor. Each of these epistemological frameworks has also historically co-evolved with a certain statistical theory and method for testing scientific hypotheses, respectively: (a) Decision-theoretic Bayesian statistics and Bayes factors; (b) Frequentist statistics and p-values; (c) Constructive Bayesian statistics and e-values. This article examines with special care the Zero Probability Paradox (ZPP), related to the verification of sharp or precise hypotheses. Finally, this article makes some remarks on Lakatos' view of mathematics as a quasi-empirical science.
Jin, Qian; He, Li-Jun; Zhang, Ai-Bing
2012-01-01
In the recent worldwide campaign for the global biodiversity inventory via DNA barcoding, a simple and easily used measure of confidence for assigning sequences to species in DNA barcoding has not been established so far, although the likelihood ratio test and the Bayesian approach have been proposed to address this issue from a statistical point of view. The TDR (Two Dimensional non-parametric Resampling) measure newly proposed in this study offers users a simple and easy approach to evaluate the confidence of species membership in DNA barcoding projects. We assessed the validity and robustness of the TDR approach using datasets simulated under coalescent models, and an empirical dataset, and found that the TDR measure is very robust in assessing species membership in DNA barcoding. In contrast to the likelihood ratio test and the Bayesian approach, the TDR method stands out due to its simplicity in both concepts and calculations, with little in the way of restrictive population genetic assumptions. To implement this approach we have developed a computer program package (TDR1.0beta) freely available from ftp://202.204.209.200/education/video/TDR1.0beta.rar.
Jongjoo, Kim; Davis, Scott K; Taylor, Jeremy F
2002-06-01
Empirical confidence intervals (CIs) for the estimated quantitative trait locus (QTL) location from selective and non-selective non-parametric bootstrap resampling methods were compared for a genome scan involving an Angus x Brahman reciprocal fullsib backcross population. Genetic maps, based on 357 microsatellite markers, were constructed for 29 chromosomes using CRI-MAP V2.4. Twelve growth, carcass composition and beef quality traits (n = 527-602) were analysed to detect QTLs utilizing (composite) interval mapping approaches. CIs were investigated for 28 likelihood ratio test statistic (LRT) profiles for the one-QTL-per-chromosome model. The CIs from the non-selective bootstrap method were largest (87.7 cM on average, or 79.2% coverage of test chromosomes). The Selective II procedure produced the smallest CI size (42.3 cM on average). However, CI sizes from the Selective II procedure were more variable than those produced by the two-LOD drop method. CI ranges from the Selective II procedure were also asymmetrical (relative to the most likely QTL position) due to the bias caused by the tendency for the estimated QTL position to be at a marker position in the bootstrap samples and due to monotonicity and asymmetry of the LRT curve in the original sample.
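The generic non-parametric bootstrap behind such CIs can be sketched with the percentile method; the data and the mean estimator below are placeholders for the QTL-location estimator used in the paper, and the selective variants are not reproduced:

```python
import random

def bootstrap_ci(data, estimator, level=0.90, n_boot=2000, seed=7):
    """Percentile-method nonparametric bootstrap CI for any estimator.

    Resamples the data with replacement, recomputes the estimator each
    time, and reads the CI off the empirical quantiles of the replicates.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    stats = sorted(
        estimator([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = stats[int(((1 - level) / 2) * n_boot)]
    hi = stats[int((1 - (1 - level) / 2) * n_boot) - 1]
    return lo, hi

data = [52.0, 48.5, 50.2, 49.9, 51.3, 47.8, 50.6, 49.1, 50.0, 51.7]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_ci(data, mean)
```

When the statistic is a map position rather than a mean, the replicates pile up at marker positions, which produces the asymmetry and bias the abstract describes.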