Bayesian Nonparametric Clustering for Positive Definite Matrices.
Cherian, Anoop; Morellas, Vassilios; Papanikolopoulos, Nikolaos
2016-05-01
Symmetric Positive Definite (SPD) matrices emerge as data descriptors in several applications of computer vision such as object tracking, texture recognition, and diffusion tensor imaging. Clustering these data matrices forms an integral part of these applications, for which soft-clustering algorithms (K-Means, expectation maximization, etc.) are generally used. As is well-known, these algorithms need the number of clusters to be specified, which is difficult when the dataset scales. To address this issue, we resort to the classical nonparametric Bayesian framework by modeling the data as a mixture model using the Dirichlet process (DP) prior. Since these matrices do not conform to the Euclidean geometry, rather belongs to a curved Riemannian manifold,existing DP models cannot be directly applied. Thus, in this paper, we propose a novel DP mixture model framework for SPD matrices. Using the log-determinant divergence as the underlying dissimilarity measure to compare these matrices, and further using the connection between this measure and the Wishart distribution, we derive a novel DPM model based on the Wishart-Inverse-Wishart conjugate pair. We apply this model to several applications in computer vision. Our experiments demonstrate that our model is scalable to the dataset size and at the same time achieves superior accuracy compared to several state-of-the-art parametric and nonparametric clustering algorithms.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
Categorical and nonparametric data analysis choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
Non-parametric Reconstruction of Cluster Mass Distribution from Strong Lensing Modelling Abell 370
Abdel-Salam, H M; Williams, L L R
1997-01-01
We describe a new non-parametric technique for reconstructing the mass distribution in galaxy clusters with strong lensing, i.e., from multiple images of background galaxies. The observed positions and redshifts of the images are considered as rigid constraints and through the lens (ray-trace) equation they provide us with linear constraint equations. These constraints confine the mass distribution to some allowed region, which is then found by linear programming. Within this allowed region we study in detail the mass distribution with minimum mass-to-light variation; also some others, such as the smoothest mass distribution. The method is applied to the extensively studied cluster Abell 370, which hosts a giant luminous arc and several other multiply imaged background galaxies. Our mass maps are constrained by the observed positions and redshifts (spectroscopic or model-inferred by previous authors) of the giant arc and multiple image systems. The reconstructed maps obtained for A370 reveal a detailed mass d...
Revealing components of the galaxy population through nonparametric techniques
Bamford, Steven P; Nichol, Robert C; Miller, Christopher J; Wasserman, Larry; Genovese, Christopher R; Freeman, Peter E
2008-01-01
The distributions of galaxy properties vary with environment, and are often multimodal, suggesting that the galaxy population may be a combination of multiple components. The behaviour of these components versus environment holds details about the processes of galaxy development. To release this information we apply a novel, nonparametric statistical technique, identifying four components present in the distribution of galaxy H$\\alpha$ emission-line equivalent-widths. We interpret these components as passive, star-forming, and two varieties of active galactic nuclei. Independent of this interpretation, the properties of each component are remarkably constant as a function of environment. Only their relative proportions display substantial variation. The galaxy population thus appears to comprise distinct components which are individually independent of environment, with galaxies rapidly transitioning between components as they move into denser environments.
A non-parametric Bayesian approach for clustering and tracking non-stationarities of neural spikes.
Shalchyan, Vahid; Farina, Dario
2014-02-15
Neural spikes from multiple neurons recorded in a multi-unit signal are usually separated by clustering. Drifts in the position of the recording electrode relative to the neurons over time cause gradual changes in the position and shapes of the clusters, challenging the clustering task. By dividing the data into short time intervals, Bayesian tracking of the clusters based on Gaussian cluster model has been previously proposed. However, the Gaussian cluster model is often not verified for neural spikes. We present a Bayesian clustering approach that makes no assumptions on the distribution of the clusters and use kernel-based density estimation of the clusters in every time interval as a prior for Bayesian classification of the data in the subsequent time interval. The proposed method was tested and compared to Gaussian model-based approach for cluster tracking by using both simulated and experimental datasets. The results showed that the proposed non-parametric kernel-based density estimation of the clusters outperformed the sequential Gaussian model fitting in both simulated and experimental data tests. Using non-parametric kernel density-based clustering that makes no assumptions on the distribution of the clusters enhances the ability of tracking cluster non-stationarity over time with respect to the Gaussian cluster modeling approach. Copyright © 2013 Elsevier B.V. All rights reserved.
Clustering Techniques in Bioinformatics
Directory of Open Access Journals (Sweden)
Muhammad Ali Masood
2015-01-01
Full Text Available Dealing with data means to group information into a set of categories either in order to learn new artifacts or understand new domains. For this purpose researchers have always looked for the hidden patterns in data that can be defined and compared with other known notions based on the similarity or dissimilarity of their attributes according to well-defined rules. Data mining, having the tools of data classification and data clustering, is one of the most powerful techniques to deal with data in such a manner that it can help researchers identify the required information. As a step forward to address this challenge, experts have utilized clustering techniques as a mean of exploring hidden structure and patterns in underlying data. Improved stability, robustness and accuracy of unsupervised data classification in many fields including pattern recognition, machine learning, information retrieval, image analysis and bioinformatics, clustering has proven itself as a reliable tool. To identify the clusters in datasets algorithm are utilized to partition data set into several groups based on the similarity within a group. There is no specific clustering algorithm, but various algorithms are utilized based on domain of data that constitutes a cluster and the level of efficiency required. Clustering techniques are categorized based upon different approaches. This paper is a survey of few clustering techniques out of many in data mining. For the purpose five of the most common clustering techniques out of many have been discussed. The clustering techniques which have been surveyed are: K-medoids, K-means, Fuzzy C-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN and Self-Organizing Map (SOM clustering.
Non-parametric co-clustering of large scale sparse bipartite networks on the GPU
DEFF Research Database (Denmark)
Hansen, Toke Jansen; Mørup, Morten; Hansen, Lars Kai
2011-01-01
Co-clustering is a problem of both theoretical and practical importance, e.g., market basket analysis and collaborative filtering, and in web scale text processing. We state the co-clustering problem in terms of non-parametric generative models which can address the issue of estimating the number...... of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering we exploit that parameter inference for co-clustering is well suited for parallel computing. We develop a generic GPU framework for efficient inference on large scale......-life large scale collaborative filtering data and web scale text corpora, demonstrating that latent mesoscale structures extracted by the co-clustering problem as formulated by the Infinite Relational Model (IRM) are consistent across consecutive runs with different initializations and also relevant...
Xu, Zhiqiang
2017-02-16
Attributed graph clustering, also known as community detection on attributed graphs, attracts much interests recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed, that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.
Clustering Techniques in Bioinformatics
National Research Council Canada - National Science Library
Muhammad Ali Masood; M. N. A. Khan
2015-01-01
... according to well-defined rules. Data mining, having the tools of data classification and data clustering, is one of the most powerful techniques to deal with data in such a manner that it can help researchers identify the required information...
An Automatic Clustering Technique for Optimal Clusters
Pavan, K Karteeka; Rao, A V Dattatreya; 10.5121/ijcsea.2011.1412
2011-01-01
This paper proposes a simple, automatic and efficient clustering algorithm, namely, Automatic Merging for Optimal Clusters (AMOC) which aims to generate nearly optimal clusters for the given datasets automatically. The AMOC is an extension to standard k-means with a two phase iterative procedure combining certain validation techniques in order to find optimal clusters with automation of merging of clusters. Experiments on both synthetic and real data have proved that the proposed algorithm finds nearly optimal clustering structures in terms of number of clusters, compactness and separation.
Two new non-parametric tests to the distance duality relation with galaxy clusters
Costa, S S; Holanda, R F L
2015-01-01
The cosmic distance duality relation is a milestone of cosmology involving the luminosity and angular diameter distances. Any departure of the relation points to new physics or systematic errors in the observations, therefore tests of the relation are extremely important to build a consistent cosmological framework. Here, two new tests are proposed based on galaxy clusters observations (angular diameter distance and gas mass fraction) and $H(z)$ measurements. By applying Gaussian Processes, a non-parametric method, we are able to derive constraints on departures of the relation where no evidence of deviation is found in both methods, reinforcing the cosmological and astrophysical hypotheses adopted so far.
Measuring the Influence of Networks on Transaction Costs Using a Nonparametric Regression Technique
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H.C.A.
. We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Measuring the influence of networks on transaction costs using a non-parametric regression technique
DEFF Research Database (Denmark)
Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.
. We empirically analyse the effect of networks on productivity using a cross-validated local linear non-parametric regression technique and a data set of 384 farms in Poland. Our empirical study generally supports our hypothesis that networks affect productivity. Large and dense trading networks...
Non-parametric method for measuring gas inhomogeneities from X-ray observations of galaxy clusters
Morandi, Andrea; Cui, Wei
2013-01-01
We present a non-parametric method to measure inhomogeneities in the intracluster medium (ICM) from X-ray observations of galaxy clusters. Analyzing mock Chandra X-ray observations of simulated clusters, we show that our new method enables the accurate recovery of the 3D gas density and gas clumping factor profiles out to large radii of galaxy clusters. We then apply this method to Chandra X-ray observations of Abell 1835 and present the first determination of the gas clumping factor from the X-ray cluster data. We find that the gas clumping factor in Abell 1835 increases with radius and reaches ~2-3 at r=R_{200}. This is in good agreement with the predictions of hydrodynamical simulations, but it is significantly below the values inferred from recent Suzaku observations. We further show that the radially increasing gas clumping factor causes flattening of the derived entropy profile of the ICM and affects physical interpretation of the cluster gas structure, especially at the large cluster-centric radii. Our...
Directory of Open Access Journals (Sweden)
Urbi Garay
2016-03-01
Full Text Available We define a dynamic and self-adjusting mixture of Gaussian Graphical Models to cluster financial returns, and provide a new method for extraction of nonparametric estimates of dynamic alphas (excess return and betas (to a choice set of explanatory factors in a multivariate setting. This approach, as well as the outputs, has a dynamic, nonstationary and nonparametric form, which circumvents the problem of model risk and parametric assumptions that the Kalman filter and other widely used approaches rely on. The by-product of clusters, used for shrinkage and information borrowing, can be of use to determine relationships around specific events. This approach exhibits a smaller Root Mean Squared Error than traditionally used benchmarks in financial settings, which we illustrate through simulation. As an illustration, we use hedge fund index data, and find that our estimated alphas are, on average, 0.13% per month higher (1.6% per year than alphas estimated through Ordinary Least Squares. The approach exhibits fast adaptation to abrupt changes in the parameters, as seen in our estimated alphas and betas, which exhibit high volatility, especially in periods which can be identified as times of stressful market events, a reflection of the dynamic positioning of hedge fund portfolio managers.
Nonparametric Problem-Space Clustering: Learning Efficient Codes for Cognitive Control Tasks
Directory of Open Access Journals (Sweden)
Domenico Maisto
2016-02-01
Full Text Available We present an information-theoretic method permitting one to find structure in a problem space (here, in a spatial navigation domain and cluster it in ways that are convenient to solve different classes of control problems, which include planning a path to a goal from a known or an unknown location, achieving multiple goals and exploring a novel environment. Our generative nonparametric approach, called the generative embedded Chinese restaurant process (geCRP, extends the family of Chinese restaurant process (CRP models by introducing a parameterizable notion of distance (or kernel between the states to be clustered together. By using different kernels, such as the the conditional probability or joint probability of two states, the same geCRP method clusters the environment in ways that are more sensitive to different control-related information, such as goal, sub-goal and path information. We perform a series of simulations in three scenarios—an open space, a grid world with four rooms and a maze having the same structure as the Hanoi Tower—in order to illustrate the characteristics of the different clusters (obtained using different kernels and their relative benefits for solving planning and control problems.
Nonparametric Bayesian Clustering of Structural Whole Brain Connectivity in Full Image Resolution
DEFF Research Database (Denmark)
Ambrosen, Karen Marie Sandø; Albers, Kristoffer Jon; Dyrby, Tim B.
2014-01-01
Diffusion magnetic resonance imaging enables measuring the structural connectivity of the human brain at a high spatial resolution. Local noisy connectivity estimates can be derived using tractography approaches and statistical models are necessary to quantify the brain’s salient structural...... organization. However, statistically modeling these massive structural connectivity datasets is a computational challenging task. We develop a high-performance inference procedure for the infinite relational model (a prominent non-parametric Bayesian model for clustering networks into structurally similar...... groups) that defines structural units at the resolution of statistical support. We apply the model to a network of structural brain connectivity in full image resolution with more than one hundred thousand regions (voxels in the gray-white matter boundary) and around one hundred million connections...
Institute of Scientific and Technical Information of China (English)
Zara Ghodsi; Emmanuel Sirimal Silva; Hossein Hassani
2015-01-01
The maternal segmentation coordinate gene bicoid plays a significant role during Drosophila embryogenesis. The gradient of Bicoid, the protein encoded by this gene, determines most aspects of head and thorax development. This paper seeks to explore the applicability of a variety of signal processing techniques at extracting bicoid expression signal, and whether these methods can outperform the current model. We evaluate the use of six different powerful and widely-used models representing both parametric and nonparametric signal processing techniques to determine the most efficient method for signal extraction in bicoid. The results are evaluated using both real and simulated data. Our findings show that the Singular Spectrum Analysis technique proposed in this paper outperforms the synthesis diffusion degradation model for filtering the noisy protein profile of bicoid whilst the exponential smoothing technique was found to be the next best alternative followed by the autoregressive integrated moving average.
Ghodsi, Zara; Silva, Emmanuel Sirimal; Hassani, Hossein
2015-06-01
The maternal segmentation coordinate gene bicoid plays a significant role during Drosophila embryogenesis. The gradient of Bicoid, the protein encoded by this gene, determines most aspects of head and thorax development. This paper seeks to explore the applicability of a variety of signal processing techniques at extracting bicoid expression signal, and whether these methods can outperform the current model. We evaluate the use of six different powerful and widely-used models representing both parametric and nonparametric signal processing techniques to determine the most efficient method for signal extraction in bicoid. The results are evaluated using both real and simulated data. Our findings show that the Singular Spectrum Analysis technique proposed in this paper outperforms the synthesis diffusion degradation model for filtering the noisy protein profile of bicoid whilst the exponential smoothing technique was found to be the next best alternative followed by the autoregressive integrated moving average.
Morton, Kenneth D., Jr.; Torrione, Peter A.; Collins, Leslie
2010-04-01
Time domain ground penetrating radar (GPR) has been shown to be a powerful sensing phenomenology for detecting buried objects such as landmines. Landmine detection with GPR data typically utilizes a feature-based pattern classification algorithm to discriminate buried landmines from other sub-surface objects. In high-fidelity GPR, the time-frequency characteristics of a landmine response should be indicative of the physical construction and material composition of the landmine and could therefore be useful for discrimination from other non-threatening sub-surface objects. In this research we propose modeling landmine time-domain responses with a nonparametric Bayesian time-series model and we perform clustering of these time-series models with a hierarchical nonparametric Bayesian model. Each time-series is modeled as a hidden Markov model (HMM) with autoregressive (AR) state densities. The proposed nonparametric Bayesian prior allows for automated learning of the number of states in the HMM as well as the AR order within each state density. This creates a flexible time-series model with complexity determined by the data. Furthermore, a hierarchical non-parametric Bayesian prior is used to group landmine responses with similar HMM model parameters, thus learning the number of distinct landmine response models within a data set. Model inference is accomplished using a fast variational mean field approximation that can be implemented for on-line learning.
Fujita, André; Takahashi, Daniel Y; Patriota, Alexandre G; Sato, João R
2014-12-10
Statistical inference of functional magnetic resonance imaging (fMRI) data is an important tool in neuroscience investigation. One major hypothesis in neuroscience is that the presence or not of a psychiatric disorder can be explained by the differences in how neurons cluster in the brain. Therefore, it is of interest to verify whether the properties of the clusters change between groups of patients and controls. The usual method to show group differences in brain imaging is to carry out a voxel-wise univariate analysis for a difference between the mean group responses using an appropriate test and to assemble the resulting 'significantly different voxels' into clusters, testing again at cluster level. In this approach, of course, the primary voxel-level test is blind to any cluster structure. Direct assessments of differences between groups at the cluster level seem to be missing in brain imaging. For this reason, we introduce a novel non-parametric statistical test called analysis of cluster structure variability (ANOCVA), which statistically tests whether two or more populations are equally clustered. The proposed method allows us to compare the clustering structure of multiple groups simultaneously and also to identify features that contribute to the differential clustering. We illustrate the performance of ANOCVA through simulations and an application to an fMRI dataset composed of children with attention deficit hyperactivity disorder (ADHD) and controls. Results show that there are several differences in the clustering structure of the brain between them. Furthermore, we identify some brain regions previously not described to be involved in the ADHD pathophysiology, generating new hypotheses to be tested. The proposed method is general enough to be applied to other types of datasets, not limited to fMRI, where comparison of clustering structures is of interest. Copyright © 2014 John Wiley & Sons, Ltd.
Erkal, Denis; Belokurov, Vasily
2016-01-01
Only in the Milky Way is it possible to conduct an experiment which uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, dark perturbers induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique which allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, $\\sim2^{\\circ}$ and $\\sim9^{\\circ}$ wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced inter-tail asymmetry which cannot be made consist...
Decision making in coal mine planning using a non-parametric technique of indicator kriging
Energy Technology Data Exchange (ETDEWEB)
Mamurekli, D. [Hacettepe University, Ankara (Turkey). Mining Engineering Dept.
1997-03-01
In countries where low calorific value coal reserves are abundant and oil reserves are short or none, the requirement of energy production is mainly supported by coal-fired power stations. Consequently, planning to mine the low calorific value coal deposits gains much importance considering the technical and environmental restrictions. Such a mine in Kangal Town of Sivas City is the one that delivers run of mine coal directly to the power station built in the region. In case the calorific value and the ash content of the extracted coal are lower and higher than the required limits, 1300 kcal/kg and 21%, respectively, the power station may apply penalties to the coal producing company. Since the delivery is continuous and made by relying on in situ determination of pre-estimated values these assessments without defining any confidence levels are inevitably subject to inaccuracy. Thus, the company should be aware of uncertainties in making decisions and avoid conceivable risks. In this study, valuable information is provided in the form of conditional distribution to be used during planning process. It maps the indicator variogram corresponding to calorific value of 1300 kcal/kg and the ash content of 21% estimating the conditional probabilities that the true ash contents are less and calorific values are higher than the critical limits by the application of non-parametric technique, indicator kriging. In addition, it outlines the areas that are most uncertain for decision making. 4 refs., 8 figs., 3 tabs.
Erkal, Denis; Koposov, Sergey E.; Belokurov, Vasily
2017-09-01
Only in the Milky Way is it possible to conduct an experiment that uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, perturbations from dark subhaloes, as well as from GMCs and the Milky Way bar, can induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique that allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, ∼2° and ∼9° wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced intertail asymmetry which cannot be made consistent with an unperturbed stream according to a suite of simulations we have produced. We conjecture that the sharp peaks around Pal 5 are epicyclic overdensities, while the two dips are consistent with impacts by subhaloes. Assuming an age of 3.4 Gyr for Pal 5, these two gaps would correspond to the characteristic size of gaps created by subhaloes in the mass range of 106-107 M⊙ and 107-108 M⊙, respectively. In addition to dark substructure, we find that the bar of the Milky Way can plausibly produce the asymmetric density seen in Pal 5 and that GMCs could cause the smaller gap.
clues: An R Package for Nonparametric Clustering Based on Local Shrinking
Directory of Open Access Journals (Sweden)
Fang Chang
2010-02-01
Full Text Available Determining the optimal number of clusters appears to be a persistent and controversial issue in cluster analysis. Most existing R packages targeting clustering require the user to specify the number of clusters in advance. However, if this subjectively chosen number is far from optimal, clustering may produce seriously misleading results. In order to address this vexing problem, we develop the R package clues to automate and evaluate the selection of an optimal number of clusters, which is widely applicable in the field of clustering analysis. Package clues uses two main procedures, shrinking and partitioning, to estimate an optimal number of clusters by maximizing an index function, either the CH index or the Silhouette index, rather than relying on guessing a pre-specified number. Five agreement indices (Rand index, Hubert and Arabie’s adjusted Rand index, Morey and Agresti’s adjusted Rand index, Fowlkes and Mallows index and Jaccard index, which measure the degree of agreement between any two partitions, are also provided in clues. In addition to numerical evidence, clues also supplies a deeper insight into the partitioning process with trajectory plots.
Comparison of reliability techniques of parametric and non-parametric method
Directory of Open Access Journals (Sweden)
C. Kalaiselvan
2016-06-01
Full Text Available Reliability of a product or system is the probability that the product performs adequately its intended function for the stated period of time under stated operating conditions. It is function of time. The most widely used nano ceramic capacitor C0G and X7R is used in this reliability study to generate the Time-to failure (TTF data. The time to failure data are identified by Accelerated Life Test (ALT and Highly Accelerated Life Testing (HALT. The test is conducted at high stress level to generate more failure rate within the short interval of time. The reliability method used to convert accelerated to actual condition is Parametric method and Non-Parametric method. In this paper, comparative study has been done for Parametric and Non-Parametric methods to identify the failure data. The Weibull distribution is identified for parametric method; Kaplan–Meier and Simple Actuarial Method are identified for non-parametric method. The time taken to identify the mean time to failure (MTTF in accelerating condition is the same for parametric and non-parametric method with relative deviation.
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access to no...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
Goswami, Saptarsi
2012-01-01
Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate, business, information retrieval, biology, psychology, to name a few. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of value of the attributes, density, distance etc. However these definitions fail to attach a clear meaning/semantics to the generated clusters. We argue that clusters having understandable and distinct semantics defined in terms of quartiles/halves are more appealing to business analysts than the clusters defined by data boundaries or prototypes. On the samepremise, we propose our new algorithm named as quartile clustering technique. Through a series of experiments we establish efficacy of this algorithm. We demonstrate that the quartile clusteri...
Bag, Soumen; Sen, Prithwiraj; Sanyal, Gautam
2011-01-01
Facial expressions convey non-verbal cues, which play an important role in interpersonal relations. Automatic recognition of human face based on facial expression can be an important component of natural human-machine interface. It may also be used in behavioural science. Although human can recognize the face practically without any effort, but reliable face recognition by machine is a challenge. This paper presents a new approach for recognizing the face of a person considering the expressions of the same human face at different instances of time. This methodology is developed combining Eigenface method for feature extraction and modified k-Means clustering for identification of the human face. This method endowed the face recognition without using the conventional distance measure classifiers. Simulation results show that proposed face recognition using perception of k-Means clustering is useful for face images with different facial expressions.
Percolation technique for galaxy clustering
Klypin, Anatoly; Shandarin, Sergei F.
1993-01-01
We study percolation in mass and galaxy distributions obtained in 3D simulations of the CDM, C + HDM, and the power law (n = -1) models in the Omega = 1 universe. Percolation statistics is used here as a quantitative measure of the degree to which a mass or galaxy distribution is of a filamentary or cellular type. The very fast code used calculates the statistics of clusters along with the direct detection of percolation. We found that the two parameters mu(infinity), characterizing the size of the largest cluster, and mu-squared, characterizing the weighted mean size of all clusters excluding the largest one, are extremely useful for evaluating the percolation threshold. An advantage of using these parameters is their low sensitivity to boundary effects. We show that both the CDM and the C + HDM models are extremely filamentary both in mass and galaxy distribution. The percolation thresholds for the mass distributions are determined.
DEFF Research Database (Denmark)
Henningsen, Geraldine; Henningsen, Arne; Henning, Christian H. C. A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs (TAC). One of the major factors in TAC theory is information. Information networks can catalyse the interpersonal information exchange and hence, increase the access...... to nonpublic information. Our analysis shows that information networks have an impact on the level of TAC. Many resources that are sacrificed for TAC are inputs that also enter the technical production process. As most production data do not separate between these two usages of inputs, high transaction costs...... are unveiled by reduced productivity. A cross-validated local linear non-parametric regression shows that good information networks increase the productivity of farms. A bootstrapping procedure confirms that this result is statistically significant....
The application of non-parametric statistical techniques to an ALARA programme.
Moon, J H; Cho, Y H; Kang, C S
2001-01-01
For the cost-effective reduction of occupational radiation dose (ORD) at nuclear power plants, it is necessary to identify what are the processes of repetitive high ORD during maintenance and repair operations. To identify the processes, the point values such as mean and median are generally used, but they sometimes lead to misjudgment since they cannot show other important characteristics such as dose distributions and frequencies of radiation jobs. As an alternative, the non-parametric analysis method is proposed, which effectively identifies the processes of repetitive high ORD. As a case study, the method is applied to ORD data of maintenance and repair processes at Kori Units 3 and 4 that are pressurised water reactors with 950 MWe capacity and have been operating since 1986 and 1987 respectively, in Korea and the method is demonstrated to be an efficient way of analysing the data.
Customer Data Clustering using Data Mining Technique
Rajagopal, Dr Sankar
2011-01-01
Classification and patterns extraction from customer data is very important for business support and decision making. Timely identification of newly emerging trends is very important in business process. Large companies are having huge volume of data but starving for knowledge. To overcome the organization current issue, the new breed of technique is required that has intelligence and capability to solve the knowledge scarcity and the technique is called Data mining. The objectives of this paper are to identify the high-profit, high-value and low-risk customers by one of the data mining technique - customer clustering. In the first phase, cleansing the data and developed the patterns via demographic clustering algorithm using IBM I-Miner. In the second phase, profiling the data, develop the clusters and identify the high-value low-risk customers. This cluster typically represents the 10-20 percent of customers which yields 80% of the revenue.
Graph visualization techniques for web clustering engines.
Di Giacomo, Emilio; Didimo, Walter; Grilli, Luca; Liotta, Giuseppe
2007-01-01
One of the most challenging issues in mining information from the World Wide Web is the design of systems that present the data to the end user by clustering them into meaningful semantic categories. We show that the analysis of the results of a clustering engine can significantly take advantage of enhanced graph drawing and visualization techniques. We propose a graph-based user interface for Web clustering engines that makes it possible for the user to explore and visualize the different semantic categories and their relationships at the desired level of detail.
Coscheduling Technique for Symmetric Multiprocessor Clusters
Energy Technology Data Exchange (ETDEWEB)
Yoo, A B; Jette, M A
2000-09-18
Coscheduling is essential for obtaining good performance in a time-shared symmetric multiprocessor (SMP) cluster environment. However, the most common technique, gang scheduling, has limitations such as poor scalability and vulnerability to faults mainly due to explicit synchronization between its components. A decentralized approach called dynamic coscheduling (DCS) has been shown to be effective for network of workstations (NOW), but this technique is not suitable for the workloads on a very large SMP-cluster with thousands of processors. Furthermore, its implementation can be prohibitively expensive for such a large-scale machine. IN this paper, they propose a novel coscheduling technique based on the DCS approach which can achieve coscheduling on very large SMP-clusters in a scalable, efficient, and cost-effective way. In the proposed technique, each local scheduler achieves coscheduling based upon message traffic between the components of parallel jobs. Message trapping is carried out at the user-level, eliminating the need for unsupported hardware or device-level programming. A sending process attaches its status to outgoing messages so local schedulers on remote nodes can make more intelligent scheduling decisions. Once scheduled, processes are guaranteed some minimum period of time to execute. This provides an opportunity to synchronize the parallel job's components across all nodes and achieve good program performance. The results from a performance study reveal that the proposed technique is a promising approach that can reduce response time significantly over uncoordinated time-sharing and batch scheduling.
Directory of Open Access Journals (Sweden)
José L. Valencia
2015-11-01
Full Text Available Rainfall, one of the most important climate variables, is commonly studied due to its great heterogeneity, which occasionally causes negative economic, social, and environmental consequences. Modeling the spatial distributions of rainfall patterns over watersheds has become a major challenge for water resources management. Multifractal analysis can be used to reproduce the scale invariance and intermittency of rainfall processes. To identify which factors are the most influential on the variability of multifractal parameters and, consequently, on the spatial distribution of rainfall patterns for different time scales in this study, universal multifractal (UM analysis—C1, α, and γs UM parameters—was combined with non-parametric statistical techniques that allow spatial-temporal comparisons of distributions by gradients. The proposed combined approach was applied to a daily rainfall dataset of 132 time-series from 1931 to 2009, homogeneously spatially-distributed across a 25 km × 25 km grid covering the Ebro River Basin. A homogeneous increase in C1 over the watershed and a decrease in α mainly in the western regions, were detected, suggesting an increase in the frequency of dry periods at different scales and an increase in the occurrence of rainfall process variability over the last decades.
Energy Technology Data Exchange (ETDEWEB)
Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)
1990-11-01
A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.
Clustering microcalcifications techniques in digital mammograms
Díaz, Claudia. C.; Bosco, Paolo; Cerello, Piergiorgio
2008-11-01
Breast cancer has become a serious public health problem around the world. However, this pathology can be treated if it is detected in early stages. This task is achieved by a radiologist, who should read a large amount of mammograms per day, either for a screening or diagnostic purpose in mammography. However human factors could affect the diagnosis. Computer Aided Detection is an automatic system, which can help to specialists in the detection of possible signs of malignancy in mammograms. Microcalcifications play an important role in early detection, so we focused on their study. The two mammographic features that indicate the microcalcifications could be probably malignant are small size and clustered distribution. We worked with density techniques for automatic clustering, and we applied them on a mammography CAD prototype developed at INFN-Turin, Italy. An improvement of performance is achieved analyzing images from a Perugia-Assisi Hospital, in Italy.
Localization technique in VANets using Clustering (LVC
Directory of Open Access Journals (Sweden)
Nasreddine Lagraa
2010-07-01
Full Text Available Relative location information is an important aspect in vehicular Ad hoc networks .It helps to build vehicle topology maps, also provides location information of nearby vehicles. Due to the characteristics of VANet, the existing relative positioning techniques developed initially for Ad hoc or sensors networks are not directly applicable to vehicular networks. In this paper, we propose a protocol of localization in VANet when no GPS information is available, based on clustering and has the advantage to use a single coordinates system. We study its impact on the performances of the network, by using the network simulator NS-2.
Technique for fast and efficient hierarchical clustering
Stork, Christopher
2013-10-08
A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.
RATC: A Robust Automated Tag Clustering Technique
Boratto, Ludovico; Carta, Salvatore; Vargiu, Eloisa
Nowadays, the most dominant and noteworthy web information sources are developed according to the collaborative-web paradigm, also known as Web 2.0. In particular, it represents a novel paradigm in the way users interact with the web. Users (also called prosumers) are no longer passive consumers of published content, but become involved, implicitly and explicitly, as they cooperate by providing their own resources in an “architecture of participation”. In this scenario, collaborative tagging, i.e., the process of classifying shared resources by using keywords, becomes more and more popular. The main problem in such task is related to well-known linguistic phenomena, such as polysemy and synonymy, making effective content retrieval harder. In this paper, an approach that monitors users activity in a tagging system and dynamically quantifies associations among tags is presented. The associations are then used to create tags clusters. Experiments are performed comparing the proposed approach with a state-of-the-art tag clustering technique. Results -given in terms of classical precision and recall- show that the approach is quite effective in the presence of strongly related tags in a cluster.
König, Seth D; Buffalo, Elizabeth A
2014-04-30
Eye tracking is an important component of many human and non-human primate behavioral experiments. As behavioral paradigms have become more complex, including unconstrained viewing of natural images, eye movements measured in these paradigms have become more variable and complex as well. Accordingly, the common practice of using acceleration, dispersion, or velocity thresholds to segment viewing behavior into periods of fixations and saccades may be insufficient. Here we propose a novel algorithm, called Cluster Fix, which uses k-means cluster analysis to take advantage of the qualitative differences between fixations and saccades. The algorithm finds natural divisions in 4 state space parameters-distance, velocity, acceleration, and angular velocity-to separate scan paths into periods of fixations and saccades. The number and size of clusters adjusts to the variability of individual scan paths. Cluster Fix can detect small saccades that were often indistinguishable from noisy fixations. Local analysis of fixations helped determine the transition times between fixations and saccades. Because Cluster Fix detects natural divisions in the data, predefined thresholds are not needed. A major advantage of Cluster Fix is the ability to precisely identify the beginning and end of saccades, which is essential for studying neural activity that is modulated by or time-locked to saccades. Our data suggest that Cluster Fix is more sensitive than threshold-based algorithms but comes at the cost of an increase in computational time. Copyright © 2014 Elsevier B.V. All rights reserved.
Ruppin, F.; Adam, R.; Comis, B.; Ade, P.; André, P.; Arnaud, M.; Beelen, A.; Benoît, A.; Bideaud, A.; Billot, N.; Bourrion, O.; Calvo, M.; Catalano, A.; Coiffard, G.; D'Addabbo, A.; De Petris, M.; Désert, F.-X.; Doyle, S.; Goupy, J.; Kramer, C.; Leclercq, S.; Macías-Pérez, J. F.; Mauskopf, P.; Mayet, F.; Monfardini, A.; Pajot, F.; Pascale, E.; Perotto, L.; Pisano, G.; Pointecouteau, E.; Ponthieu, N.; Pratt, G. W.; Revéret, V.; Ritacco, A.; Rodriguez, L.; Romero, C.; Schuster, K.; Sievers, A.; Triqueneaux, S.; Tucker, C.; Zylka, R.
2017-01-01
The determination of the thermodynamic properties of clusters of galaxies at intermediate and high redshift can bring new insights into the formation of large-scale structures. It is essential for a robust calibration of the mass-observable scaling relations and their scatter, which are key ingredients for precise cosmology using cluster statistics. Here we illustrate an application of high resolution (R 0.02 R500) to its outskirts (R 3 R500) non-parametrically for the first time at intermediate redshift. The constraints on the resulting pressure profile allow us to reduce the relative uncertainty on the integrated Compton parameter by a factor of two compared to the Planck value. Combining the tSZ data and the deprojected electronic density profile from XMM-Newton allows us to undertake a hydrostatic mass analysis, for which we study the impact of a spherical model assumption on the total mass estimate. We also investigate the radial temperature and entropy distributions. These data indicate that PSZ1 G045.85+57.71 is a massive (M500 5.5 × 1014M⊙) cool-core cluster. This work is part of a pilot study aiming at optimizing the treatment of the NIKA2 tSZ large program dedicated to the follow-up of SZ-discovered clusters at intermediate and high redshifts. This study illustrates the potential of NIKA2 to put constraints on thethermodynamic properties and tSZ-scaling relations of these clusters, and demonstrates the excellent synergy between tSZ and X-ray observations of similar angular resolution.
A Novel Trajectory Clustering technique for selecting cluster heads in Wireless Sensor Networks
Munaga, Hazarath; Venkateswarlu, N B
2011-01-01
Wireless sensor networks (WSNs) suffers from the hot spot problem where the sensor nodes closest to the base station are need to relay more packet than the nodes farther away from the base station. Thus, lifetime of sensory network depends on these closest nodes. Clustering methods are used to extend the lifetime of a wireless sensor network. However, current clustering algorithms usually utilize two techniques; selecting cluster heads with more residual energy, and rotating cluster heads periodically to distribute the energy consumption among nodes in each cluster and lengthen the network lifetime. Most of the algorithms use random selection for selecting the cluster heads. Here, we propose a novel trajectory clustering technique for selecting the cluster heads in WSNs. Our algorithm selects the cluster heads based on traffic and rotates periodically. It provides the first trajectory based clustering technique for selecting the cluster heads and to extenuate the hot spot problem by prolonging the network lif...
Multilevel Techniques for the Clustering Problem
Directory of Open Access Journals (Sweden)
Noureddine Bouhmala
2014-02-01
Full Text Available Data Mining is concerned with the discovery of int eresting patterns and knowledge in data repositories. Cluster Analysis which belongs to the core methods of data mining is the process of discovering homogeneous groups called clusters. Given a data-set and some measure of similarity between data objects, the goal in most c lustering algorithms is maximizing both the homogeneity within each cluster and the heterogene ity between different clusters. In this work, two multilevel algorithms for the clustering problem are introduced. The multilevel paradigm suggests looking at the clustering proble m as a hierarchical optimization process going through different levels evolving from a coar se grain to fine grain strategy. The clustering problem is solved by first reducing the problem level by level to a coarser problem where an initial clustering is computed. The clustering of the coarser problem is mapped back level-by- level to obtain a better clustering of the original problem by refining the intermediate different clustering obtained at various levels. A benchmark using a number of data sets collected from a variety of domains is used to compare the effective ness of the hierarchical approach against its single-level counterpart.
Finding Within Cluster Dense Regions Using Distance Based Technique
Directory of Open Access Journals (Sweden)
Wesam Ashour
2012-03-01
Full Text Available One of the main categories in Data Clustering is density based clustering. Density based clustering techniques like DBSCAN are attractive because they can find arbitrary shaped clusters along with noisy outlier. The main weakness of the traditional density based algorithms like DBSCAN is clustering the different density level data sets. DBSCAN calculations done according to given parameters applied to all points in a data set, while densities of the data set clusters may be totally different. The proposed algorithm overcomes this weakness of the traditional density based algorithms. The algorithm starts with partitioning the data within a cluster to units based on a user parameter and compute the density for each unit separately. Consequently, the algorithm compares the results and merges neighboring units with closer approximate density values to become a new cluster. The experimental results of the simulation show that the proposed algorithm gives good results in finding clusters for different density cluster data set.
Knowledge Engineering Technique for Cluster Development
Sureephong, Pradorn; Ouzrout, Yacine; Neubert, Gilles; Bouras, Abdelaziz
2007-01-01
After the concept of industry cluster was tangibly applied in many countries, SMEs trended to link to each other to maintain their competitiveness in the market. The major key success factors of the cluster are knowledge sharing and collaboration between partners. This knowledge is collected in form of tacit and explicit knowledge from experts and institutions within the cluster. The objective of this study is about enhancing the industry cluster with knowledge management by using knowledge engineering which is one of the most important method for managing knowledge. This work analyzed three well known knowledge engineering methods, i.e. MOKA, SPEDE and CommonKADS, and compares the capability to be implemented in the cluster context. Then, we selected one method and proposed the adapted methodology. At the end of this paper, we validated and demonstrated the proposed methodology with some primary result by using case study of handicraft cluster in Thailand.
Nonparametric estimation of ultrasound pulses
DEFF Research Database (Denmark)
Jensen, Jørgen Arendt; Leeman, Sidney
1994-01-01
An algorithm for nonparametric estimation of 1D ultrasound pulses in echo sequences from human tissues is derived. The technique is a variation of the homomorphic filtering technique using the real cepstrum, and the underlying basis of the method is explained. The algorithm exploits a priori...
A contingency table approach to nonparametric testing
Rayner, JCW
2000-01-01
Most texts on nonparametric techniques concentrate on location and linear-linear (correlation) tests, with less emphasis on dispersion effects and linear-quadratic tests. Tests for higher moment effects are virtually ignored. Using a fresh approach, A Contingency Table Approach to Nonparametric Testing unifies and extends the popular, standard tests by linking them to tests based on models for data that can be presented in contingency tables.This approach unifies popular nonparametric statistical inference and makes the traditional, most commonly performed nonparametric analyses much more comp
Semi- and Nonparametric ARCH Processes
Directory of Open Access Journals (Sweden)
Oliver B. Linton
2011-01-01
Full Text Available ARCH/GARCH modelling has been successfully applied in empirical finance for many years. This paper surveys the semiparametric and nonparametric methods in univariate and multivariate ARCH/GARCH models. First, we introduce some specific semiparametric models and investigate the semiparametric and nonparametrics estimation techniques applied to: the error density, the functional form of the volatility function, the relationship between mean and variance, long memory processes, locally stationary processes, continuous time processes and multivariate models. The second part of the paper is about the general properties of such processes, including stationary conditions, ergodic conditions and mixing conditions. The last part is on the estimation methods in ARCH/GARCH processes.
Hierarchical clustering techniques for image database organization and summarization
Vellaikal, Asha; Kuo, C.-C. Jay
1998-10-01
This paper investigates clustering techniques as a method of organizing image databases to support popular visual management functions such as searching, browsing and navigation. Different types of hierarchical agglomerative clustering techniques are studied as a method of organizing features space as well as summarizing image groups by the selection of a few appropriate representatives. Retrieval performance using both single and multiple level hierarchies are experimented with and the algorithms show an interesting relationship between the top k correct retrievals and the number of comparisons required. Some arguments are given to support the use of such cluster-based techniques for managing distributed image databases.
DETERMINE OPTIMUM NUMBER OF COMPACT OVERLAPPED CLUSTERS USING FRLVQ TECHNIQUE
Institute of Scientific and Technical Information of China (English)
Xu Wenhuan; Huang Qiang; Ji Zhen; Zhang Jihong
2005-01-01
A method, named XHJ-method, is proposed in this letter to determine the number of clusters of a data set, which incorporates with the Fuzzy Reinforced Learning Vector Quantization (FRLVQ) technique. The simulation results show that this new method works well for the traditional iris data and an artificial data set, which contains un-equally sized and spaced clusters.
An Empirical Analysis of Rough Set Categorical Clustering Techniques
2017-01-01
Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, many attentions have been put on categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering the rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) has outperformed their predecessor approaches like Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness(MMR). This paper presents the limitations and issues of MDA and MSA techniques on special type of data sets where both techniques fails to select or faces difficulty in selecting their best clustering attribute. Therefore, this analysis motivates the need to come up with better and more generalize rough set theory approach that can cope the issues with MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernible relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of indiscernibility relation combined with a number of clusters. To show the significance of proposed approach, the effect of number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and UCI repository are used for experiments. The results produced in tabular and graphical forms shows that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. PMID:28068344
An Empirical Analysis of Rough Set Categorical Clustering Techniques.
Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat
2017-01-01
Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, many attentions have been put on categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering the rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) has outperformed their predecessor approaches like Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness(MMR). This paper presents the limitations and issues of MDA and MSA techniques on special type of data sets where both techniques fails to select or faces difficulty in selecting their best clustering attribute. Therefore, this analysis motivates the need to come up with better and more generalize rough set theory approach that can cope the issues with MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernible relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of indiscernibility relation combined with a number of clusters. To show the significance of proposed approach, the effect of number of clusters on rough accuracy, purity and entropy are described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and UCI repository are used for experiments. The results produced in tabular and graphical forms shows that the proposed MIA technique provides better performance in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2010-01-01
Overall, this remains a very fine book suitable for a graduate-level course in nonparametric statistics. I recommend it for all people interested in learning the basic ideas of nonparametric statistical inference.-Eugenia Stoimenova, Journal of Applied Statistics, June 2012… one of the best books available for a graduate (or advanced undergraduate) text for a theory course on nonparametric statistics. … a very well-written and organized book on nonparametric statistics, especially useful and recommended for teachers and graduate students.-Biometrics, 67, September 2011This excellently presente
Astronomical Methods for Nonparametric Regression
Steinhardt, Charles L.; Jermyn, Adam
2017-01-01
I will discuss commonly used techniques for nonparametric regression in astronomy. We find that several of them, particularly running averages and running medians, are generically biased, asymmetric between dependent and independent variables, and perform poorly in recovering the underlying function, even when errors are present only in one variable. We then examine less-commonly used techniques such as Multivariate Adaptive Regressive Splines and Boosted Trees and find them superior in bias, asymmetry, and variance both theoretically and in practice under a wide range of numerical benchmarks. In this context the chief advantage of the common techniques is runtime, which even for large datasets is now measured in microseconds compared with milliseconds for the more statistically robust techniques. This points to a tradeoff between bias, variance, and computational resources which in recent years has shifted heavily in favor of the more advanced methods, primarily driven by Moore's Law. Along these lines, we also propose a new algorithm which has better overall statistical properties than all techniques examined thus far, at the cost of significantly worse runtime, in addition to providing guidance on choosing the nonparametric regression technique most suitable to any specific problem. We then examine the more general problem of errors in both variables and provide a new algorithm which performs well in most cases and lacks the clear asymmetry of existing non-parametric methods, which fail to account for errors in both variables.
Clustering economies based on multiple criteria decision making techniques
Directory of Open Access Journals (Sweden)
Mansour Momeni
2011-10-01
Full Text Available One of the primary concerns on many countries is to determine different important factors affecting economic growth. In this paper, we study some factors such as unemployment rate, inflation ratio, population growth, average annual income, etc to cluster different countries. The proposed model of this paper uses analytical hierarchy process (AHP to prioritize the criteria and then uses a K-mean technique to cluster 59 countries based on the ranked criteria into four groups. The first group includes countries with high standards such as Germany and Japan. In the second cluster, there are some developing countries with relatively good economic growth such as Saudi Arabia and Iran. The third cluster belongs to countries with faster rates of growth compared with the countries located in the second group such as China, India and Mexico. Finally, the fourth cluster includes countries with relatively very low rates of growth such as Jordan, Mali, Niger, etc.
Clustering economies based on multiple criteria decision making techniques
2011-01-01
One of the primary concerns on many countries is to determine different important factors affecting economic growth. In this paper, we study some factors such as unemployment rate, inflation ratio, population growth, average annual income, etc to cluster different countries. The proposed model of this paper uses analytical hierarchy process (AHP) to prioritize the criteria and then uses a K-mean technique to cluster 59 countries based on the ranked criteria into four groups. The first group i...
Exploitation of Clustering Techniques in Transactional Healthcare Data
Directory of Open Access Journals (Sweden)
Naeem Ahmad Mahoto
2014-03-01
Full Text Available Healthcare service centres equipped with electronic health systems have improved their resources as well as treatment processes. The dynamic nature of healthcare data of each individual makes it complex and difficult for physicians to manually mediate them; therefore, automatic techniques are essential to manage the quality and standardization of treatment procedures. Exploratory data analysis, patternanalysis and grouping of data is managed using clustering techniques, which work as an unsupervised classification. A number of healthcare applications are developed that use several data mining techniques for classification, clustering and extracting useful information from healthcare data. The challenging issue in this domain is to select adequate data mining algorithm for optimal results. This paper exploits three different clustering algorithms: DBSCAN (Density-Based Clustering, agglomerative hierarchical and k-means in real transactional healthcare data of diabetic patients (taken as case study to analyse their performance in large and dispersed healthcare data. The best solution of cluster sets among the exploited algorithms is evaluated using clustering quality indexes and is selected to identify the possible subgroups of patients having similar treatment patterns
Bootstrap Estimation for Nonparametric Efficiency Estimates
1995-01-01
This paper develops a consistent bootstrap estimation procedure to obtain confidence intervals for nonparametric measures of productive efficiency. Although the methodology is illustrated in terms of technical efficiency measured by output distance functions, the technique can be easily extended to other consistent nonparametric frontier models. Variation in estimated efficiency scores is assumed to result from variation in empirical approximations to the true boundary of the production set. ...
Quantal Response: Nonparametric Modeling
2017-01-01
spline N−spline Fig. 3 Logistic regression 7 Approved for public release; distribution is unlimited. 5. Nonparametric QR Models Nonparametric linear ...stimulus and probability of response. The Generalized Linear Model approach does not make use of the limit distribution but allows arbitrary functional...7. Conclusions and Recommendations 18 8. References 19 Appendix A. The Linear Model 21 Appendix B. The Generalized Linear Model 33 Appendix C. B
Software refactoring at the package level using clustering techniques
Alkhalid, A.
2011-01-01
Enhancing, modifying or adapting the software to new requirements increases the internal software complexity. Software with high level of internal complexity is difficult to maintain. Software refactoring reduces software complexity and hence decreases the maintenance effort. However, software refactoring becomes quite challenging task as the software evolves. The authors use clustering as a pattern recognition technique to assist in software refactoring activities at the package level. The approach presents a computer aided support for identifying ill-structured packages and provides suggestions for software designer to balance between intra-package cohesion and inter-package coupling. A comparative study is conducted applying three different clustering techniques on different software systems. In addition, the application of refactoring at the package level using an adaptive k-nearest neighbour (A-KNN) algorithm is introduced. The authors compared A-KNN technique with the other clustering techniques (viz. single linkage algorithm, complete linkage algorithm and weighted pair-group method using arithmetic averages). The new technique shows competitive performance with lower computational complexity. © 2011 The Institution of Engineering and Technology.
Adaptive Techniques for Clustered N-Body Cosmological Simulations
Menon, Harshitha; Zheng, Gengbin; Jetley, Pritish; Kale, Laxmikant; Quinn, Thomas; Governato, Fabio
2014-01-01
ChaNGa is an N-body cosmology simulation application implemented using Charm++. In this paper, we present the parallel design of ChaNGa and address many challenges arising due to the high dynamic ranges of clustered datasets. We focus on optimizations based on adaptive techniques for scaling to more than 128K cores. We demonstrate strong scaling on up to 512K cores of Blue Waters evolving 12 and 24 billion particles. We also show strong scaling of highly clustered datasets on up to 128K cores.
Non-parametric Bayesian human motion recognition using a single MEMS tri-axial accelerometer.
Ahmed, M Ejaz; Song, Ju Bin
2012-09-27
In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM) and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM) technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
Non-Parametric Bayesian Human Motion Recognition Using a Single MEMS Tri-Axial Accelerometer
Directory of Open Access Journals (Sweden)
M. Ejaz Ahmed
2012-09-01
Full Text Available In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the human motions. The infinite Gaussian mixture model (IGMM and collapsed Gibbs sampler are adopted to cluster the human motions using extracted features. From the experimental results, we show that the unanticipated human motions are detected and recognized with significant accuracy, as compared with the parametric Fuzzy C-Mean (FCM technique, the unsupervised K-means algorithm, and the non-parametric mean-shift method.
Marine data users clustering using data mining technique
Directory of Open Access Journals (Sweden)
Farnaz Ghiasi
2015-09-01
Full Text Available The objective of this research is marine data users clustering using data mining technique. To achieve this objective, marine organizations will enable to know their data and users requirements. In this research, CRISP-DM standard model was used to implement the data mining technique. The required data was extracted from 500 marine data users profile database of Iranian National Institute for Oceanography and Atmospheric Sciences (INIOAS from 1386 to 1393. The TwoStep algorithm was used for clustering. In this research, patterns was discovered between marine data users such as student, organization and scientist and their data request (Data source, Data type, Data set, Parameter and Geographic area using clustering for the first time. The most important clusters are: Student with International data source, Chemistry data type, “World Ocean Database” dataset, Persian Gulf geographic area and Organization with Nitrate parameter. Senior managers of the marine organizations will enable to make correct decisions concerning their existing data. They will direct to planning for better data collection in the future. Also data users will guide with respect to their requests. Finally, the valuable suggestions were offered to improve the performance of marine organizations.
Brain tumor segmentation based on a hybrid clustering technique
Directory of Open Access Journals (Sweden)
Eman Abdel-Maksoud
2015-03-01
This paper presents an efficient image segmentation approach using K-means clustering technique integrated with Fuzzy C-means algorithm. It is followed by thresholding and level set segmentation stages to provide an accurate brain tumor detection. The proposed technique can get benefits of the K-means clustering for image segmentation in the aspects of minimal computation time. In addition, it can get advantages of the Fuzzy C-means in the aspects of accuracy. The performance of the proposed image segmentation approach was evaluated by comparing it with some state of the art segmentation algorithms in case of accuracy, processing time, and performance. The accuracy was evaluated by comparing the results with the ground truth of each processed image. The experimental results clarify the effectiveness of our proposed approach to deal with a higher number of segmentation problems via improving the segmentation quality and accuracy in minimal execution time.
Comparison of Rank Analysis of Covariance and Nonparametric Randomized Blocks Analysis.
Porter, Andrew C.; McSweeney, Maryellen
The relative power of three possible experimental designs under the condition that data is to be analyzed by nonparametric techniques; the comparison of the power of each nonparametric technique to its parametric analogue; and the comparison of relative powers using nonparametric and parametric techniques are discussed. The three nonparametric…
Nonparametric statistical methods
Hollander, Myles; Chicken, Eric
2013-01-01
Praise for the Second Edition"This book should be an essential part of the personal library of every practicing statistician."-Technometrics Thoroughly revised and updated, the new edition of Nonparametric Statistical Methods includes additional modern topics and procedures, more practical data sets, and new problems from real-life situations. The book continues to emphasize the importance of nonparametric methods as a significant branch of modern statistics and equips readers with the conceptual and technical skills necessary to select and apply the appropriate procedures for any given sit
Bayesian nonparametric data analysis
Müller, Peter; Jara, Alejandro; Hanson, Tim
2015-01-01
This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.
Mineral Detection using K-Means Clustering Technique
Directory of Open Access Journals (Sweden)
P. Bangarraju
2014-04-01
Full Text Available This paper is all about a novel algorithm formulated with k-means clustering performed on remote sensing images. The fields of Remote Sensing are very wide and its techniques and applications are used both in the data acquisition method and data processing procedures. It is also a fast developing field with respect to all the above terms. Remote Sensing plays a very important role in understanding the natural and human processes affecting the earth’s minerals. The k-means clustering technique is used for segmentation or feature selection of passive and active imaging and non-imaging Remote Sensing, on airborne or on satellite platforms, from monochromatic to hyperspectral. So here we concentrate on the images taken on or above the surface of the earth which are applied based on the proposed algorithm to detect the minerals like Giacomo that exist on the surface of the earth. Our experimental results demonstrate that our technique can improve the computational speed of the direct k-means algorithm by an order to two orders of magnitude in the total number of distance calculations and the overall time.
Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
Directory of Open Access Journals (Sweden)
Kais Dai
2015-01-01
Full Text Available The socialization of the web has undertaken a new d imension after the emergence of the Online Social Networks (OSN concept. The fact that each I nternet user becomes a potential content creator entails managing a big amount of data. This paper explores the most popular professional OSN: LinkedIn. A scraping technique wa s implemented to get around 5 Million public profiles. The application of natural languag e processing techniques (NLP to classify the educational background and to cluster the professio nal background of the collected profiles led us to provide some insights about this OSN’s users and to evaluate the relationships between educational degrees and professional careers.
Nonparametric Predictive Regression
Ioannis Kasparis; Elena Andreou; Phillips, Peter C.B.
2012-01-01
A unifying framework for inference is developed in predictive regressions where the predictor has unknown integration properties and may be stationary or nonstationary. Two easily implemented nonparametric F-tests are proposed. The test statistics are related to those of Kasparis and Phillips (2012) and are obtained by kernel regression. The limit distribution of these predictive tests holds for a wide range of predictors including stationary as well as non-stationary fractional and near unit...
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
WORMHOLE ATTACK MITIGATION IN MANET: A CLUSTER BASED AVOIDANCE TECHNIQUE
Directory of Open Access Journals (Sweden)
Subhashis Banerjee
2014-01-01
Full Text Available A Mobile Ad-Hoc Network (MANET is a self configuring, infrastructure less network of mobile devices connected by wireless links. Loopholes like wireless medium, lack of a fixed infrastructure, dynamic topology, rapid deployment practices, and the hostile environments in which they may be deployed, make MANET vulnerable to a wide range of security attacks and Wormhole attack is one of them. During this attack a malicious node captures packets from one location in the network, and tunnels them to another colluding malicious node at a distant point, which replays them locally. This paper presents a cluster based Wormhole attack avoidance technique. The concept of hierarchical clustering with a novel hierarchical 32- bit node addressing scheme is used for avoiding the attacking path during the route discovery phase of the DSR protocol, which is considered as the under lying routing protocol. Pinpointing the location of the wormhole nodes in the case of exposed attack is also given by using this method.
Nonparametric statistical inference
Gibbons, Jean Dickinson
2014-01-01
Thoroughly revised and reorganized, the fourth edition presents in-depth coverage of the theory and methods of the most widely used nonparametric procedures in statistical analysis and offers example applications appropriate for all areas of the social, behavioral, and life sciences. The book presents new material on the quantiles, the calculation of exact and simulated power, multiple comparisons, additional goodness-of-fit tests, methods of analysis of count data, and modern computer applications using MINITAB, SAS, and STATXACT. It includes tabular guides for simplified applications of tests and finding P values and confidence interval estimates.
Directory of Open Access Journals (Sweden)
Thenmozhi Srinivasan
2015-01-01
Full Text Available Clusters of high-dimensional data techniques are emerging, according to data noisy and poor quality challenges. This paper has been developed to cluster data using high-dimensional similarity based PCM (SPCM, with ant colony optimization intelligence which is effective in clustering nonspatial data without getting knowledge about cluster number from the user. The PCM becomes similarity based by using mountain method with it. Though this is efficient clustering, it is checked for optimization using ant colony algorithm with swarm intelligence. Thus the scalable clustering technique is obtained and the evaluation results are checked with synthetic datasets.
Directory of Open Access Journals (Sweden)
Wheeler David C
2007-03-01
Full Text Available Abstract Background Spatial cluster detection is an important tool in cancer surveillance to identify areas of elevated risk and to generate hypotheses about cancer etiology. There are many cluster detection methods used in spatial epidemiology to investigate suspicious groupings of cancer occurrences in regional count data and case-control data, where controls are sampled from the at-risk population. Numerous studies in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. The main focus of this paper is an analysis of the spatial distribution of leukemia incidence among children from 0 to 14 years of age in Ohio from 1996–2003 using individual case data from the Ohio Cancer Incidence Surveillance System (OCISS. Specifically, we explore whether there is statistically significant global clustering and if there are statistically significant local clusters of individual leukemia cases in Ohio using numerous published methods of spatial cluster detection, including spatial point process summary methods, a nearest neighbor method, and a local rate scanning method. We use the K function, Cuzick and Edward's method, and the kernel intensity function to test for significant global clustering and the kernel intensity function and Kulldorff's spatial scan statistic in SaTScan to test for significant local clusters. Results We found some evidence, although inconclusive, of significant local clusters in childhood leukemia in Ohio, but no significant overall clustering. The findings from the local cluster detection analyses are not consistent for the different cluster detection techniques, where the spatial scan method in SaTScan does not find statistically significant local clusters, while the kernel intensity function method suggests statistically significant clusters in areas of central, southern
Nonparametric tests for censored data
Bagdonavicus, Vilijandas; Nikulin, Mikhail
2013-01-01
This book concerns testing hypotheses in non-parametric models. Generalizations of many non-parametric tests to the case of censored and truncated data are considered. Most of the test results are proved and real applications are illustrated using examples. Theories and exercises are provided. The incorrect use of many tests applying most statistical software is highlighted and discussed.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
Energy Technology Data Exchange (ETDEWEB)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh; Wang, Shaobu; Mackey, Patrick S.; Hines, Paul; Huang, Zhenyu
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES
Directory of Open Access Journals (Sweden)
D.Kerana Hanirex
2011-03-01
Full Text Available Now a days, Association rule plays an important role. The purchasing of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets(PAFI using clustering. This algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on the imilarity measures between the transactions. Then it finds the frequent itemsets with the transactions in the clusters directly using improved Apriori algorithm which further reduces the number of scans in the database and hence improve the efficiency.
Differences in Pedaling Technique in Cycling: A Cluster Analysis.
Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A
2016-10-01
To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (POVT2) compared with cycling at their maximal power output (POMAX). Twenty athletes performed an incremental cycling test to determine their power output (POMAX and POVT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in POMAX and POVT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at POVT2 and POMAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at POVT2 vs POMAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.
Amrit, Chintan Amrit; Hek, Jeroen
2016-01-01
This research addresses the problem of clustering the results of brainstorm sessions. Going through all ideas and clustering them can be a time consuming task. In this research we design a computer-aided approach that can help with clustering of these results. We have limited ourselves to looking at
Amrit, Chintan; Hek, Jeroen
2016-01-01
This research addresses the problem of clustering the results of brainstorm sessions. Going through all ideas and clustering them can be a time consuming task. In this research we design a computer-aided approach that can help with clustering of these results. We have limited ourselves to looking at
CURRENT STATUS OF NONPARAMETRIC STATISTICS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-02-01
Full Text Available Nonparametric statistics is one of the five points of growth of applied mathematical statistics. Despite the large number of publications on specific issues of nonparametric statistics, the internal structure of this research direction has remained undeveloped. The purpose of this article is to consider its division into regions based on the existing practice of scientific activity determination of nonparametric statistics and classify investigations on nonparametric statistical methods. Nonparametric statistics allows to make statistical inference, in particular, to estimate the characteristics of the distribution and testing statistical hypotheses without, as a rule, weakly proven assumptions about the distribution function of samples included in a particular parametric family. For example, the widespread belief that the statistical data are often have the normal distribution. Meanwhile, analysis of results of observations, in particular, measurement errors, always leads to the same conclusion - in most cases the actual distribution significantly different from normal. Uncritical use of the hypothesis of normality often leads to significant errors, in areas such as rejection of outlying observation results (emissions, the statistical quality control, and in other cases. Therefore, it is advisable to use nonparametric methods, in which the distribution functions of the results of observations are imposed only weak requirements. It is usually assumed only their continuity. On the basis of generalization of numerous studies it can be stated that to date, using nonparametric methods can solve almost the same number of tasks that previously used parametric methods. Certain statements in the literature are incorrect that nonparametric methods have less power, or require larger sample sizes than parametric methods. Note that in the nonparametric statistics, as in mathematical statistics in general, there remain a number of unresolved problems
Non-Parametric Estimation of Correlation Functions
DEFF Research Database (Denmark)
Brincker, Rune; Rytter, Anders; Krenk, Steen
In this paper three methods of non-parametric correlation function estimation are reviewed and evaluated: the direct method, estimation by the Fast Fourier Transform and finally estimation by the Random Decrement technique. The basic ideas of the techniques are reviewed, sources of bias are pointed...... out, and methods to prevent bias are presented. The techniques are evaluated by comparing their speed and accuracy on the simple case of estimating auto-correlation functions for the response of a single degree-of-freedom system loaded with white noise....
Nonparametric statistical methods using R
Kloke, John
2014-01-01
A Practical Guide to Implementing Nonparametric and Rank-Based ProceduresNonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm.The book first gives an overview of the R language and basic statistical c
The k-means clustering technique: General considerations and implementation in Mathematica
Directory of Open Access Journals (Sweden)
Laurence Morissette
2013-02-01
Full Text Available Data clustering techniques are valuable tools for researchers working with large databases of multivariate data. In this tutorial, we present a simple yet powerful one: the k-means clustering technique, through three different algorithms: the Forgy/Lloyd, algorithm, the MacQueen algorithm and the Hartigan and Wong algorithm. We then present an implementation in Mathematica and various examples of the different options available to illustrate the application of the technique.
Preliminary results on nonparametric facial occlusion detection
Directory of Open Access Journals (Sweden)
Daniel LÓPEZ SÁNCHEZ
2016-10-01
Full Text Available The problem of face recognition has been extensively studied in the available literature, however, some aspects of this field require further research. The design and implementation of face recognition systems that can efficiently handle unconstrained conditions (e.g. pose variations, illumination, partial occlusion... is still an area under active research. This work focuses on the design of a new nonparametric occlusion detection technique. In addition, we present some preliminary results that indicate that the proposed technique might be useful to face recognition systems, allowing them to dynamically discard occluded face parts.
Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique
Institute of Scientific and Technical Information of China (English)
无
2001-01-01
Clustering, in data mining, is a useful technique for discoveringinte resting data distributions and patterns in the underlying data, and has many app lication fields, such as statistical data analysis, pattern recognition, image p rocessing, and etc. We combine sampling technique with DBSCAN alg orithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSC A N) algorithms are developed. One algorithm introduces sampling technique inside DBSCAN, and the other uses sampling procedure outside DBSCAN. Experimental resul ts demonstrate that our algorithms are effective and efficient in clustering lar ge-scale spatial databases.
Techniques for Representation of Regional Clusters in Geographical In-formation Systems
Directory of Open Access Journals (Sweden)
Adriana REVEIU
2011-01-01
Full Text Available This paper provides an overview of visualization techniques adapted for regional clusters presentation in Geographic Information Systems. Clusters are groups of companies and insti-tutions co-located in a specific geographic region and linked by interdependencies in providing a related group of products and services. The regional clusters can be visualized by projecting the data into two-dimensional space or using parallel coordinates. Cluster membership is usually represented by different colours or by dividing clusters into several panels of a grille display. Taking into consideration regional clusters requirements and the multilevel administrative division of the Romania’s territory, I used two cartograms: NUTS2- regions and NUTS3- counties, to illustrate the tools for regional clusters representation.
Directory of Open Access Journals (Sweden)
Amreen Khan,
2010-07-01
Full Text Available Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. Clustering aims at representing large datasets by a fewer number of prototypes or clusters. It brings simplicity in modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms, well-known as Swarm Intelligence (SI has recently emerged that meets these requirements and has successfully been applied to a number ofreal world clustering problems. This paper looks into the use ofParticle Swarm Optimization for cluster analysis. The effectiveness of Fuzzy C-means clustering provides enhanced performance and maintains more diversity in the swarm and also allows the particles to be robust to trace the changing environment.
Measuring customer loyalty using an extended RFM and clustering technique
Directory of Open Access Journals (Sweden)
Zohre Zalaghi
2014-05-01
Full Text Available Today, the ability to identify the profitable customers, creating a long-term loyalty in them and expanding the existing relationships are considered as the key and competitive factors for a customer-oriented organization. The prerequisite for having such competitive factors is the presence of a very powerful customer relationship management (CRM. The accurate evaluation of customers’ profitability is considered as one of the fundamental reasons that lead to a successful customer relationship management. RFM is a method that scrutinizes three properties, namely recency, frequency and monetary for each customer and scores customers based on these properties. In this paper, a method is introduced that obtains the behavioral traits of customers using the extended RFM approach and having the information related to the customers of an organization; it then classifies the customers using the K-means algorithm and finally scores the customers in terms of their loyalty in each cluster. In the suggested approach, first the customers’ records will be clustered and then the RFM model items will be specified through selecting the effective properties on the customers’ loyalty rate using the multipurpose genetic algorithm. Next, they will be scored in each cluster based on the effect that they have on the loyalty rate. The influence rate each property has on loyalty is calculated using the Spearman’s correlation coefficient.
Firdausiah Mansur, Andi Besse; Yusof, Norazah
2013-01-01
Clustering on Social Learning Network still not explored widely, especially when the network focuses on e-learning system. Any conventional methods are not really suitable for the e-learning data. SNA requires content analysis, which involves human intervention and need to be carried out manually. Some of the previous clustering techniques need…
Summarizing Relational Data Using Semi-Supervised Genetic Algorithm-Based Clustering Techniques
Directory of Open Access Journals (Sweden)
Rayner Alfred
2010-01-01
Full Text Available Problem statement: In solving a classification problem in relational data mining, traditional methods, for example, the C4.5 and its variants, usually require data transformations from datasets stored in multiple tables into a single table. Unfortunately, we may loss some information when we join tables with a high degree of one-to-many association. Therefore, data transformation becomes a tedious trial-and-error work and the classification result is often not very promising especially when the number of tables and the degree of one-to-many association are large. Approach: We proposed a genetic semi-supervised clustering technique as a means of aggregating data stored in multiple tables to facilitate the task of solving a classification problem in relational database. This algorithm is suitable for classification of datasets with a high degree of one-to-many associations. It can be used in two ways. One is user-controlled clustering, where the user may control the result of clustering by varying the compactness of the spherical cluster. The other is automatic clustering, where a non-overlap clustering strategy is applied. In this study, we use the latter method to dynamically cluster multiple instances, as a means of aggregating them and illustrate the effectiveness of this method using the semi-supervised genetic algorithm-based clustering technique. Results: It was shown in the experimental results that using the reciprocal of Davies-Bouldin Index for cluster dispersion and the reciprocal of Gini Index for cluster purity, as the fitness function in the Genetic Algorithm (GA, finds solutions with much greater accuracy. The results obtained in this study showed that automatic clustering (seeding, by optimizing the cluster dispersion or cluster purity alone using GA, provides one with good results compared to the traditional k-means clustering. However, the best result can be achieved by optimizing the combination values of both the cluster
THE EFFECT OF CLUSTERING TECHNIQUE ON WRITING EXPOSITORY ESSAYS OF EFL STUDENTS
Sabarun Sabarun
2013-01-01
The study is aimed at investigating the effectiveness of using clustering technique in writing expository essays. The aim of the study is to prove whether there is a significant difference between writing using clustering technique and writing without using it on the students’ writing achievement or not. The study belonged to experimental study by applying counterbalance procedure to collect the data. The study was conducted at the fourth semester English department students of Palangka Raya ...
Deprojection technique for galaxy cluster considering the point spread function
Institute of Scientific and Technical Information of China (English)
无
2010-01-01
We present a new method for the analysis of Abell 1835 observed by XMM-Newton.The method is a combination of the Direct Demodulation technique and deprojection.We eliminate the effects of the point spread function(PSF) with the Direct Demodulation technique.We then use a traditional deprojection technique to study the properties of Abell 1835.Compared to only using a deprojection method,the central electron density derived from this method increases by 30%,while the temperature profile is similar.
Classification of protein profiles using fuzzy clustering techniques
DEFF Research Database (Denmark)
Karemore, Gopal; Mullick, Jhinuk B.; Sujatha, R.
2010-01-01
-to-day variation, artifacts due to experimental strategies, inherent uncertainty in pumping procedure which are very common activities during HPLC-LIF experiment. Under these circumstances we demonstrate how fuzzy clustering algorithm like Gath Geva followed by sammon mapping outperform...... PCA mapping in classifying various cancers from healthy spectra with classification rate up to 95 % from 60%. Methods are validated using various clustering indexes and shows promising improvement in developing optical pathology like HPLC-LIF for early detection of various...
An Efficient Clustering Technique for Message Passing between Data Points using Affinity Propagation
Directory of Open Access Journals (Sweden)
D. NAPOLEON,
2011-01-01
Full Text Available A wide range of clustering algorithms is available in literature and still an open area for researcher’s k-means algorithm is one of the basic and most simple partitioning clustering technique is given by Macqueen in 1967. A new clustering algorithm used in this paper is affinity propagation. The number of cluster k has been supplied by the user and the Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time between data point. In this paper we make analysis on cluster algorithm k-means, efficient k-means, and affinity propagation with colon dataset. And the result of affinity ropagation shows much lower error when compare with other algorithm and the average accuracy is good.
Institute of Scientific and Technical Information of China (English)
康春花; 任平; 曾平飞
2015-01-01
Examinations help students learn more efficiently by filling their learning gaps. To achieve this goal, we have to differentiate students who have from those who have not mastered a set of attributes as measured by the test through cognitive diagnostic assessment. K-means cluster analysis, being a nonparametric cognitive diagnosis method requires the Q-matrix only, which reflects the relationship between attributes and items. This does not require the estimation of the parameters, so is independent of sample size, simple to operate, and easy to understand. Previous research use the sum score vectors or capability scores vector as the clustering objects. These methods are only adaptive for dichotomous data. Structural response items are, however, the main type used in examinations, particularly as required in recent reforms. On the basis of previous research, this paper puts forward a method to calculate a capability matrix reflecting the mastery level on skills and is applicable to grade response items. Our study included four parts. First, we introduced the K-means cluster diagnosis method which has been adapted for dichotomous data. Second, we expanded the K-means cluster diagnosis method for grade response data (GRCDM). Third, in Part Two, we investigated the performance of the method introduced using a simulation study. Fourth, we investigated the performance of the method in an empirical study. The simulation study focused on three factors. First, the sample size was set to be 100, 500, and 1000. Second, the percentage of random errors was manipulated to be 5%, 10%, and 20%. Third, it had four hierarchies, as proposed by Leighton. All experimental conditions composed of seven attributes, different items according to hierarchies. Simulation results showed that: (1) GRCDM had a high pattern match ratio (PMR) and high marginal match ratio (MMR). This method was shown to be feasible in cognitive diagnostic assessment. (2) The classification accuracy (MMR and PMR
Comparative Studies of Clustering Techniques for Real-Time Dynamic Model Reduction
Hogan, Emilie; Halappanavar, Mahantesh; Huang, Zhenyu; Lin, Guang; Lu, Shuai; Wang, Shaobu
2015-01-01
Dynamic model reduction in power systems is necessary for improving computational efficiency. Traditional model reduction using linearized models or offline analysis would not be adequate to capture power system dynamic behaviors, especially the new mix of intermittent generation and intelligent consumption makes the power system more dynamic and non-linear. Real-time dynamic model reduction emerges as an important need. This paper explores the use of clustering techniques to analyze real-time phasor measurements to determine generator groups and representative generators for dynamic model reduction. Two clustering techniques -- graph clustering and evolutionary clustering -- are studied in this paper. Various implementations of these techniques are compared and also compared with a previously developed Singular Value Decomposition (SVD)-based dynamic model reduction approach. Various methods exhibit different levels of accuracy when comparing the reduced model simulation against the original model. But some ...
A reliable cluster detection technique using photometric redshifts: introducing the 2TecX algorithm
van Breukelen, Caroline
2009-01-01
We present a new cluster detection algorithm designed for finding high-redshift clusters using optical/infrared imaging data. The algorithm has two main characteristics. First, it utilises each galaxy's full redshift probability function, instead of an estimate of the photometric redshift based on the peak of the probability function and an associated Gaussian error. Second, it identifies cluster candidates through cross-checking the results of two substantially different selection techniques (the name 2TecX representing the cross-check of the two techniques). These are adaptations of the Voronoi Tesselations and Friends-Of-Friends methods. Monte-Carlo simulations of mock catalogues show that cross-checking the cluster candidates found by the two techniques significantly reduces the detection of spurious sources. Furthermore, we examine the selection effects and relative strengths and weaknesses of either method. The simulations also allow us to fine-tune the algorithm's parameters, and define completeness an...
THE EFFECT OF CLUSTERING TECHNIQUE ON WRITING EXPOSITORY ESSAYS OF EFL STUDENTS
Directory of Open Access Journals (Sweden)
Sabarun Sabarun
2013-03-01
Full Text Available The study is aimed at investigating the effectiveness of using clustering technique in writing expository essays. The aim of the study is to prove whether there is a significant difference between writing using clustering technique and writing without using it on the students’ writing achievement or not. The study belonged to experimental study by applying counterbalance procedure to collect the data. The study was conducted at the fourth semester English department students of Palangka Raya State Islamic College of 2012/ 2013 academic year. The number of the sample was 13 students. This study was restricted to two focuses: using clustering technique and without using clustering technique to write composition. Using clustering technique to write essay was one of the pre writing strategies in writing process. To answer the research problem, the t test for correlated samples was applied. The research findings showed that,it was found that the t value was 10.554.It was also found that the df (Degree of freedom of the distribution observed was 13-1= 12. Based on the Table of t value, if df was 12, the 5% of significant level of t value was at 1.782 and the 1% of significant level of t value was at 2.179. It meant that using clustering gave facilitative effect on the students’ essay writing performance. Keywords: reading comprehension, text, scaffolding
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100
Prediction of Adsorption of Cadmium by Hematite Using Fuzzy C-Means Clustering Technique
Directory of Open Access Journals (Sweden)
Sriparna Das
2012-11-01
Full Text Available Clustering is partitioning of data set into subsets (clusters, so that the data in each subset share some common trait. In this paper, an algorithm has been proposed based on Fuzzy C-means clustering technique for prediction of adsorption of cadmium by hematite. The original data elements have been used for clustering the random data set. The random data have been generated within the minimum and maximum value of test data. The proposed algorithm has been applied on random dataset considering the original data set as initial cluster center. A threshold value has been taken to make the boundary around the clustering center. Finally, after execution of algorithm, modified cluster centers have been computed based on each initial cluster center. The modified cluster centers have been treated as predicted data set. The algorithm has been tested in prediction of adsorption of cadmium by hematite. The error has been calculated between the original data and predicted data. It has been observed that the proposed algorithm has given better result than the previous applied methods.
Cluster Studies of Chemisorption Using Total Energy Techniques.
Wander, Adrian
Available from UMI in association with The British Library. Requires signed TDF. Chapter 1 introduces the topic. Chapter 2 contains a discussion of ab initio quantum chemistry techniques and in particular the self consistent Hartree-Fock equations. Section 2.2 discusses the Hartree -Fock equations and their matrix form the Roothaan-Hall equations. Section 2.3 deals with the important question of the choice of a basis set for molecular calculations and in section 2.4 we move on to present a brief review of the GAMESS SCF MO package. Finally, section 2.5 deals with methods of moving beyond HF theory by including electron -electron correlation effects. Chapters 3, 4 and 5 deal with applications of the ab initio method to real systems. Chapter 3 details calculations performed on the formate and methoxy radicals on the Cu(100) surface, while chapter 4 looks at the controversial topic of the low temperature structure of oxygen on Cu(110). Finally, chapter 5 considers the effects of atomic oxygen chemisorption on the Si(100)(2 times 1) reconstructed surface. While the preceding three chapters highlight the virtues of ab initio methods, chapter 6 points out some of their vices and in particular the severe demands they make on computational resources. Alternative semi-empirical techniques are then introduced in the form of the extended Huckel and ASED methods. In particular, we discuss the role of charge self consistency in semi-empirical techniques and show that in contrast to other methods, it has little effect on the quality of results produced using the ASED method. Finally, we conclude in chapter 7 by reviewing the thesis and suggesting possible future developments to this work, both in terms of interesting systems to investigate and new directions in which the theory could be developed. (Abstract shortened with permission of author.).
Non-Parametric Inference in Astrophysics
Wasserman, L H; Nichol, R C; Genovese, C; Jang, W; Connolly, A J; Moore, A W; Schneider, J; Wasserman, Larry; Miller, Christopher J.; Nichol, Robert C.; Genovese, Chris; Jang, Woncheol; Connolly, Andrew J.; Moore, Andrew W.; Schneider, Jeff; group, the PICA
2001-01-01
We discuss non-parametric density estimation and regression for astrophysics problems. In particular, we show how to compute non-parametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss non-parametric Bayesian inference.
Content Based Image Retrieval using Hierarchical and K-Means Clustering Techniques
Directory of Open Access Journals (Sweden)
V.S.V.S. Murthy
2010-03-01
Full Text Available In this paper we present an image retrieval system that takes an image as the input query and retrieves images based on image content. Content Based Image Retrieval is an approach for retrieving semantically-relevant images from an image database based on automatically-derived image features. The unique aspect of the system is the utilization of hierarchical and k-means clustering techniques. The proposed procedure consists of two stages. First, here we are going to filter most of the images in the hierarchical clustering and then apply the clustered images to KMeans, so that we can get better favored image results.
Nonparametric Inference for Periodic Sequences
Sun, Ying
2012-02-01
This article proposes a nonparametric method for estimating the period and values of a periodic sequence when the data are evenly spaced in time. The period is estimated by a "leave-out-one-cycle" version of cross-validation (CV) and complements the periodogram, a widely used tool for period estimation. The CV method is computationally simple and implicitly penalizes multiples of the smallest period, leading to a "virtually" consistent estimator of integer periods. This estimator is investigated both theoretically and by simulation.We also propose a nonparametric test of the null hypothesis that the data have constantmean against the alternative that the sequence of means is periodic. Finally, our methodology is demonstrated on three well-known time series: the sunspots and lynx trapping data, and the El Niño series of sea surface temperatures. © 2012 American Statistical Association and the American Society for Quality.
Nonparametric Econometrics: The np Package
Directory of Open Access Journals (Sweden)
Tristen Hayﬁeld
2008-07-01
Full Text Available We describe the R np package via a series of applications that may be of interest to applied econometricians. The np package implements a variety of nonparametric and semiparametric kernel-based estimators that are popular among econometricians. There are also procedures for nonparametric tests of signiﬁcance and consistent model speciﬁcation tests for parametric mean regression models and parametric quantile regression models, among others. The np package focuses on kernel methods appropriate for the mix of continuous, discrete, and categorical data often found in applied settings. Data-driven methods of bandwidth selection are emphasized throughout, though we caution the user that data-driven bandwidth selection methods can be computationally demanding.
Investigating the cultural patterns of corruption: A nonparametric analysis
Halkos, George; Tzeremes, Nickolaos
2011-01-01
By using a sample of 77 countries our analysis applies several nonparametric techniques in order to reveal the link between national culture and corruption. Based on Hofstede’s cultural dimensions and the corruption perception index, the results reveal that countries with higher levels of corruption tend to have higher power distance and collectivism values in their society.
Dalinina, R.; Petryshyn, V. A.; Lim, D. S.; Braverman, A. J.; Tripati, A. K.
2015-07-01
Microbialites are a product of trapping and binding of sediment by microbial communities, and are considered to be some of the most ancient records of life on Earth. It is a commonly held belief that microbialites are limited to extreme, hypersaline settings. However, more recent studies report their occurrence in a wider range of environments. The goal of this study is to explore whether microbialite-bearing sites share common geochemical properties. We apply statistical techniques to distinguish any common traits in these environments. These techniques ultimately could be used to address questions of microbialite distribution: are microbialites restricted to environments with specific characteristics; or are they more broadly distributed? A dataset containing hydrographic characteristics of several microbialite sites with data on pH, conductivity, alkalinity, and concentrations of several major anions and cations was constructed from previously published studies. In order to group the water samples by their natural similarities and differences, a clustering approach was chosen for analysis. k means clustering with partial distances was applied to the dataset with missing values, and separated the data into two clusters. One of the clusters is formed by samples from atoll Kiritimati (central Pacific Ocean), and the second cluster contains all other observations. Using these two clusters, the missing values were imputed by k nearest neighbor method, producing a complete dataset that can be used for further multivariate analysis. Salinity is not found to be an important variable defining clustering, and although pH defines clustering in this dataset, it is not an important variable for microbialite formation. Clustering and imputation procedures outlined here can be applied to an expanded dataset on microbialite characteristics in order to determine properties associated with microbialite-containing environments.
Directory of Open Access Journals (Sweden)
R. Dalinina
2015-07-01
Full Text Available Microbialites are a product of trapping and binding of sediment by microbial communities, and are considered to be some of the most ancient records of life on Earth. It is a commonly held belief that microbialites are limited to extreme, hypersaline settings. However, more recent studies report their occurrence in a wider range of environments. The goal of this study is to explore whether microbialite-bearing sites share common geochemical properties. We apply statistical techniques to distinguish any common traits in these environments. These techniques ultimately could be used to address questions of microbialite distribution: are microbialites restricted to environments with specific characteristics; or are they more broadly distributed? A dataset containing hydrographic characteristics of several microbialite sites with data on pH, conductivity, alkalinity, and concentrations of several major anions and cations was constructed from previously published studies. In order to group the water samples by their natural similarities and differences, a clustering approach was chosen for analysis. k means clustering with partial distances was applied to the dataset with missing values, and separated the data into two clusters. One of the clusters is formed by samples from atoll Kiritimati (central Pacific Ocean, and the second cluster contains all other observations. Using these two clusters, the missing values were imputed by k nearest neighbor method, producing a complete dataset that can be used for further multivariate analysis. Salinity is not found to be an important variable defining clustering, and although pH defines clustering in this dataset, it is not an important variable for microbialite formation. Clustering and imputation procedures outlined here can be applied to an expanded dataset on microbialite characteristics in order to determine properties associated with microbialite-containing environments.
Directory of Open Access Journals (Sweden)
Simon Benjaminsson
2010-08-01
Full Text Available Non-parametric data-driven analysis techniques can be used to study datasets with few assumptions about the data and underlying experiment. Variations of Independent Component Analysis (ICA have been the methods mostly used on fMRI data, e.g. in finding resting-state networks thought to reflect the connectivity of the brain. Here we present a novel data analysis technique and demonstrate it on resting-state fMRI data. It is a generic method with few underlying assumptions about the data. The results are built from the statistical relations between all input voxels, resulting in a whole-brain analysis on a voxel level. It has good scalability properties and the parallel implementation is capable of handling large datasets and databases. From the mutual information between the activities of the voxels over time, a distance matrix is created for all voxels in the input space. Multidimensional scaling is used to put the voxels in a lower-dimensional space reflecting the dependency relations based on the distance matrix. By performing clustering in this space we can find the strong statistical regularities in the data, which for the resting-state data turns out to be the resting-state networks. The decomposition is performed in the last step of the algorithm and is computationally simple. This opens up for rapid analysis and visualization of the data on different spatial levels, as well as automatically finding a suitable number of decomposition components.
Galaxy Cluster Mass Reconstruction Project: I. Methods and first results on galaxy-based techniques
Old, L; Pearce, F R; Croton, D; Muldrew, S I; Muñoz-Cuartas, J C; Gifford, D; Gray, M E; von der Linden, A; Mamon, G A; Merrifield, M R; Müller, V; Pearson, R J; Ponman, T J; Saro, A; Sepp, T; Sifón, C; Tempel, E; Tundo, E; Wang, Y O; Wojtak, R
2014-01-01
This paper is the first in a series in which we perform an extensive comparison of various galaxy-based cluster mass estimation techniques that utilise the positions, velocities and colours of galaxies. Our primary aim is to test the performance of these cluster mass estimation techniques on a diverse set of models that will increase in complexity. We begin by providing participating methods with data from a simple model that delivers idealised clusters, enabling us to quantify the underlying scatter intrinsic to these mass estimation techniques. The mock catalogue is based on a Halo Occupation Distribution (HOD) model that assumes spherical Navarro, Frenk and White (NFW) haloes truncated at R_200, with no substructure nor colour segregation, and with isotropic, isothermal Maxwellian velocities. We find that, above 10^14 M_solar, recovered cluster masses are correlated with the true underlying cluster mass with an intrinsic scatter of typically a factor of two. Below 10^14 M_solar, the scatter rises as the nu...
Directory of Open Access Journals (Sweden)
Mohammad Gholami
2016-01-01
Full Text Available In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1 the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2 an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.
Directory of Open Access Journals (Sweden)
Arun Vasanaperumal
2015-11-01
Full Text Available There are number of potential applications of Wireless Sensor Networks (WSNs like wild habitat monitoring, forest fire detection, military surveillance etc. All these applications are constrained for power from a stand along battery power source. So it becomes of paramount importance to conserve the energy utilized from this power source. A lot of efforts have gone into this area recently and it remains as one of the hot research areas. In order to improve network lifetime and reduce average power consumption, this study proposes a novel cluster head selection algorithm. Clustering is the preferred architecture when the numbers of nodes are larger because it results in considerable power savings for large networks as compared to other ones like tree or star. Since majority of the applications generally involve more than 30 nodes, clustering has gained widespread importance and is most used network architecture. The optimum number of clusters is first selected based on the number of nodes in the network. When the network is in operation the cluster heads in a cluster are rotated periodically based on the proposed cluster head selection algorithm to increase the network lifetime. Throughout the network single-hop communication methodology is assumed. This work will serve as an encouragement for further advances in the low power techniques for implementing Wireless Sensor Networks (WSNs.
Gholami, Mohammad; Brennan, Robert W
2016-01-06
In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.
Two Applications of Clustering Techniques to Twitter: Community Detection and Issue Extraction
Directory of Open Access Journals (Sweden)
Yong-Hyuk Kim
2013-01-01
Full Text Available Twitter’s recent growth in the number of users has redefined its status from a simple social media service to a mass media. We deal with clustering techniques applied to Twitter network and Twitter trend analysis. When we divide and cluster Twitter network, we can find a group of users with similar inclination, called a “Community.” In this regard, we introduce the Louvain algorithm and advance a partitioned Louvain algorithm as its improved variant. In the result of the experiment based on actual Twitter data, the partitioned Louvain algorithm supplemented the performance decline and shortened the execution time. Also, we use clustering techniques for trend analysis. We use nonnegative matrix factorization (NMF, which is a convenient method to intuitively interpret and extract issues on various time scales. By cross-verifying the results using NFM, we found that it has clear correlation with the actual main issue.
Directory of Open Access Journals (Sweden)
K.Sudharson
2012-03-01
Full Text Available Analyzing and predicting behavior of node can lead to more secure and more appropriate defense mechanism for attackers in the Mobile Adhoc Network. In this work, models for dynamic recommendation based on fuzzy clustering techniques, applicable to nodes that are currently participate in the transmission of Adhoc Network. The approach focuses on both aspects of MANET mining and behavioral mining. Applying fuzzy clustering and mining techniques, the model infers the node's preferences from transmission logs. The fuzzy clustering approach, in this study, provides the possibility of capturing the uncertainty among node's behaviors. The results shown are promising and proved that integrating fuzzy approach provide us with more interesting and useful patterns which consequently making the recommender system more functional and robust.
Melodic pattern discovery by structural analysis via wavelets and clustering techniques
DEFF Research Database (Denmark)
Velarde, Gissel; Meredith, David
We present an automatic method to support melodic pattern discovery by structural analysis of symbolic representations by means of wavelet analysis and clustering techniques. In previous work, we used the method to recognize the parent works of melodic segments, or to classify tunes into tune...... to support human or computer assisted music analysis and teaching....
Hybrid Clustering-GWO-NARX neural network technique in predicting stock price
Das, Debashish; Safa Sadiq, Ali; Mirjalili, Seyedali; Noraziah, A.
2017-09-01
Prediction of stock price is one of the most challenging tasks due to nonlinear nature of the stock data. Though numerous attempts have been made to predict the stock price by applying various techniques, yet the predicted price is not always accurate and even the error rate is high to some extent. Consequently, this paper endeavours to determine an efficient stock prediction strategy by implementing a combinatorial method of Grey Wolf Optimizer (GWO), Clustering and Non Linear Autoregressive Exogenous (NARX) Technique. The study uses stock data from prominent stock market i.e. New York Stock Exchange (NYSE), NASDAQ and emerging stock market i.e. Malaysian Stock Market (Bursa Malaysia), Dhaka Stock Exchange (DSE). It applies K-means clustering algorithm to determine the most promising cluster, then MGWO is used to determine the classification rate and finally the stock price is predicted by applying NARX neural network algorithm. The prediction performance gained through experimentation is compared and assessed to guide the investors in making investment decision. The result through this technique is indeed promising as it has shown almost precise prediction and improved error rate. We have applied the hybrid Clustering-GWO-NARX neural network technique in predicting stock price. We intend to work with the effect of various factors in stock price movement and selection of parameters. We will further investigate the influence of company news either positive or negative in stock price movement. We would be also interested to predict the Stock indices.
The Application of Clustering Techniques to Citation Data. Research Reports Series B No. 6.
Arms, William Y.; Arms, Caroline
This report describes research carried out as part of the Design of Information Systems in the Social Sciences (DISISS) project. Cluster analysis techniques were applied to a machine readable file of bibliographic data in the form of cited journal titles in order to identify groupings which could be used to structure bibliographic files. Practical…
The tree clustering technique and the physical reality of galaxy groups
Directory of Open Access Journals (Sweden)
M.A. Sabry
2012-12-01
Full Text Available In this paper the tree clustering technique (the Euclidean separation distance coefficients is suggested to test how the Hickson compact groups of galaxies (HCGs are really physical groups. The method is applied on groups of 5 members only in Hickson’s catalog.
Nonparametric statistical structuring of knowledge systems using binary feature matches
DEFF Research Database (Denmark)
Mørup, Morten; Glückstad, Fumiko Kano; Herlau, Tue
2014-01-01
statistical support and how this approach generalizes to the structuring and alignment of knowledge systems. We propose a non-parametric Bayesian generative model for structuring binary feature data that does not depend on a specific choice of similarity measure. We jointly model all combinations of binary......Structuring knowledge systems with binary features is often based on imposing a similarity measure and clustering objects according to this similarity. Unfortunately, such analyses can be heavily influenced by the choice of similarity measure. Furthermore, it is unclear at which level clusters have...
Nonparametric regression with filtered data
Linton, Oliver; Nielsen, Jens Perch; Van Keilegom, Ingrid; 10.3150/10-BEJ260
2011-01-01
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
Nonparametric identification of copula structures
Li, Bo
2013-06-01
We propose a unified framework for testing a variety of assumptions commonly made about the structure of copulas, including symmetry, radial symmetry, joint symmetry, associativity and Archimedeanity, and max-stability. Our test is nonparametric and based on the asymptotic distribution of the empirical copula process.We perform simulation experiments to evaluate our test and conclude that our method is reliable and powerful for assessing common assumptions on the structure of copulas, particularly when the sample size is moderately large. We illustrate our testing approach on two datasets. © 2013 American Statistical Association.
The maxBCG technique for finding galaxy clusters in SDSS data
Annis, J.; Kent, S.; Castander, F.; Eisenstein, D.; Gunn, J.; Kim, R.; Lupton, R.; Nichol, R.; Postman, M.; Voges, W.; SDSS Collaboration
1999-12-01
We present a new technique for finding galaxy clusters based on looking for a core of red, early type galaxies in the cluster center. These galaxies are known to have a small dispersion in color out to at least z=0.5. Further, the brightest of the ellipticals have near constant luminosity. In the maxBCG technique, one looks for objects whose appararent magnitudes and colors are consistent with their being brightest cluster galaxies (BCGs). If one presumes that any such object is a BCG, one can estimate a redshift and then search an area a half megaparsec around the galaxy for other galaxies that have the colors of the E/S0 ridgeline. One obtains a good estimate of the redshift by jointly minimizing the difference from the mean restframe brightest cluster galaxy properties while maximizing the number of galaxies in the E/S0 ridgeline. We have run this algoritm on the Abell clusters in the Sloan Digital Sky Survey commisioning data area with known redshifts, and find that the error in the estimated redshift z is only 0.02. This work was supported by the U.S. Department of Energy under contract No. DE-AC02-76CH03000.
Nonparametric Maximum Entropy Estimation on Information Diagrams
Martin, Elliot A; Meinke, Alexander; Děchtěrenko, Filip; Davidsen, Jörn
2016-01-01
Maximum entropy estimation is of broad interest for inferring properties of systems across many different disciplines. In this work, we significantly extend a technique we previously introduced for estimating the maximum entropy of a set of random discrete variables when conditioning on bivariate mutual informations and univariate entropies. Specifically, we show how to apply the concept to continuous random variables and vastly expand the types of information-theoretic quantities one can condition on. This allows us to establish a number of significant advantages of our approach over existing ones. Not only does our method perform favorably in the undersampled regime, where existing methods fail, but it also can be dramatically less computationally expensive as the cardinality of the variables increases. In addition, we propose a nonparametric formulation of connected informations and give an illustrative example showing how this agrees with the existing parametric formulation in cases of interest. We furthe...
Energy Technology Data Exchange (ETDEWEB)
Dong, Feng; Pierpaoli, Elena; Gunn, James E.; Wechsler, Risa H.
2007-10-29
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is {approx} 85% complete and over 90% pure for clusters with masses above 1.0 x 10{sup 14}h{sup -1} M and redshifts up to z = 0.45. The errors of estimated cluster redshifts from maximum likelihood method are shown to be small (typically less that 0.01) over the whole redshift range with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensity of {Delta} = 200, we find the derived cluster richness {Lambda}{sub 200} a roughly linear indicator of its virial mass M{sub 200}, which well recovers the relation between total luminosity and cluster mass of the input simulation.
A three-stage strategy for optimal price offering by a retailer based on clustering techniques
Energy Technology Data Exchange (ETDEWEB)
Mahmoudi-Kohan, N.; Shayesteh, E. [Islamic Azad University (Garmsar Branch), Garmsar (Iran); Moghaddam, M. Parsa; Sheikh-El-Eslami, M.K. [Tarbiat Modares University, Tehran (Iran)
2010-12-15
In this paper, an innovative strategy for optimal price offering to customers for maximizing the profit of a retailer is proposed. This strategy is based on load profile clustering techniques and includes three stages. For the purpose of clustering, an improved weighted fuzzy average K-means is proposed. Also, in this paper a new acceptance function for increasing the profit of the retailer is proposed. The new method is evaluated by implementation on a group of 300 customers of a 20 kV distribution network. (author)
Multiatlas segmentation as nonparametric regression.
Awate, Suyash P; Whitaker, Ross T
2014-09-01
This paper proposes a novel theoretical framework to model and analyze the statistical characteristics of a wide range of segmentation methods that incorporate a database of label maps or atlases; such methods are termed as label fusion or multiatlas segmentation. We model these multiatlas segmentation problems as nonparametric regression problems in the high-dimensional space of image patches. We analyze the nonparametric estimator's convergence behavior that characterizes expected segmentation error as a function of the size of the multiatlas database. We show that this error has an analytic form involving several parameters that are fundamental to the specific segmentation problem (determined by the chosen anatomical structure, imaging modality, registration algorithm, and label-fusion algorithm). We describe how to estimate these parameters and show that several human anatomical structures exhibit the trends modeled analytically. We use these parameter estimates to optimize the regression estimator. We show that the expected error for large database sizes is well predicted by models learned on small databases. Thus, a few expert segmentations can help predict the database sizes required to keep the expected error below a specified tolerance level. Such cost-benefit analysis is crucial for deploying clinical multiatlas segmentation systems.
Nonparametric statistics for social and behavioral sciences
Kraska-MIller, M
2013-01-01
Introduction to Research in Social and Behavioral SciencesBasic Principles of ResearchPlanning for ResearchTypes of Research Designs Sampling ProceduresValidity and Reliability of Measurement InstrumentsSteps of the Research Process Introduction to Nonparametric StatisticsData AnalysisOverview of Nonparametric Statistics and Parametric Statistics Overview of Parametric Statistics Overview of Nonparametric StatisticsImportance of Nonparametric MethodsMeasurement InstrumentsAnalysis of Data to Determine Association and Agreement Pearson Chi-Square Test of Association and IndependenceContingency
Semi-parametric regression: Efficiency gains from modeling the nonparametric part
Yu, Kyusang; Park, Byeong U; 10.3150/10-BEJ296
2011-01-01
It is widely admitted that structured nonparametric modeling that circumvents the curse of dimensionality is important in nonparametric estimation. In this paper we show that the same holds for semi-parametric estimation. We argue that estimation of the parametric component of a semi-parametric model can be improved essentially when more structure is put into the nonparametric part of the model. We illustrate this for the partially linear model, and investigate efficiency gains when the nonparametric part of the model has an additive structure. We present the semi-parametric Fisher information bound for estimating the parametric part of the partially linear additive model and provide semi-parametric efficient estimators for which we use a smooth backfitting technique to deal with the additive nonparametric part. We also present the finite sample performances of the proposed estimators and analyze Boston housing data as an illustration.
Szabo, Zoltan
2010-01-01
The goal of this paper is to extend independent subspace analysis (ISA) to the case of (i) nonparametric, not strictly stationary source dynamics and (ii) unknown source component dimensions. We make use of functional autoregressive (fAR) processes to model the temporal evolution of the hidden sources. An extension of the ISA separation principle--which states that the ISA problem can be solved by traditional independent component analysis (ICA) and clustering of the ICA elements--is derived for the solution of the defined fAR independent process analysis task (fAR-IPA): applying fAR identification we reduce the problem to ISA. A local averaging approach, the Nadaraya-Watson kernel regression technique is adapted to obtain strongly consistent fAR estimation. We extend the Amari-index to different dimensional components and illustrate the efficiency of the fAR-IPA approach by numerical examples.
Directory of Open Access Journals (Sweden)
Shiv Prasad Kori
2013-05-01
Full Text Available Wireless sensor network refers to a group of spatially distributed and dedicated sensors for monitoring and recording the physical conditions of environment like temperature, sound, pollution levels, humidity, wind speed with direction and pressure. Sensors are self powered nodes which also possess limited processing capabilities and the nodes communicate wirelessly through a gateway. The capability of sensing, processing and communication found in sensor networks lead to a vast number of applications of wireless sensor networks in areas such as environmental monitoring, warfare, education, agriculture to name a few. In the present work, the comparative evaluation of communication overhead for the wireless sensor network based of on clustering technique is carried out. It has been be observed that overhead in cluster based protocol is not much dependent upon update time. Simulation a result indicates that cluster based protocol has low communication overheads compared with the BBM based protocol when sink mobility is high
DEFF Research Database (Denmark)
Muhammad, Hanif; Juluri, Raghavendra R.; Chirumamilla, Manohar
2016-01-01
based on cluster beam technique allowing the formation of monocrystalline size-selected silver nanoparticles with a ±5–7% precision of diameter and controllable embedment into poly (methyl methacrylate). It is shown that the soft-landed silver clusters preserve almost spherical shape with a slight......An embedment of metal nanoparticles of well-defined sizes in thin polymer films is of significant interest for a number of practical applications, in particular, for preparing materials with tunable plasmonic properties. In this article, we present a fabrication route for metal–polymer composites...... tendency to flattening upon impact. By controlling the polymer hardness (from viscous to soft state) prior the cluster deposition and annealing conditions after the deposition the degree of immersion of the nanoparticles into polymer can be tuned, thus, making it possible to create composites with either...
Kassomenos, P.; Vardoulakis, S.; Borge, R.; Lumbreras, J.; Papaloukas, C.; Karakitsios, S.
2010-10-01
In this study, we used and compared three different statistical clustering methods: an hierarchical, a non-hierarchical (K-means) and an artificial neural network technique (self-organizing maps (SOM)). These classification methods were applied to a 4-year dataset of 5 days kinematic back trajectories of air masses arriving in Athens, Greece at 12.00 UTC, in three different heights, above the ground. The atmospheric back trajectories were simulated with the HYSPLIT Vesion 4.7 model of National Oceanic and Atmospheric Administration (NOAA). The meteorological data used for the computation of trajectories were obtained from NOAA reanalysis database. A comparison of the three statistical clustering methods through statistical indices was attempted. It was found that all three statistical methods seem to depend to the arrival height of the trajectories, but the degree of dependence differs substantially. Hierarchical clustering showed the highest level of dependence for fast-moving trajectories to the arrival height, followed by SOM. K-means was found to be the least depended clustering technique on the arrival height. The air quality management applications of these results in relation to PM10 concentrations recorded in Athens, Greece, were also discussed. Differences of PM10 concentrations, during certain clusters, were found statistically different (at 95% confidence level) indicating that these clusters appear to be associated with long-range transportation of particulates. This study can improve the interpretation of modelled atmospheric trajectories, leading to a more reliable analysis of synoptic weather circulation patterns and their impacts on urban air quality.
Directory of Open Access Journals (Sweden)
Mohiuddin Ahmed
2015-07-01
Full Text Available There is significant interest in the data mining and network management communities to efficiently analyse huge amounts of network traffic, given the amount of network traffic generated even in small networks. Summarization is a primary data mining task for generating a concise yet informative summary of the given data and it is a research challenge to create summary from network traffic data. Existing clustering based summarization techniques lack the ability to create a suitable summary for further data mining tasks such as anomaly detection and require the summary size as an external input. Additionally, for complex and high dimensional network traffic datasets, there is often no single clustering solution that explains the structure of the given data. In this paper, we investigate the use of multiview clustering to create a meaningful summary using original data instances from network traffic data in an efficient manner. We develop a mathematically sound approach to select the summary size using a sampling technique. We compare our proposed approach with regular clustering based summarization incorporating the summary size calculation method and random approach. We validate our proposed approach using the benchmark network traffic dataset and state-of-theart summary evaluation metrics.
Filtering techniques for the detection of Sunyaev-Zel'dovich clusters in multifrequency CMB maps
Herranz, D; Hobson, M P; Barreiro, R B; Diego-Rodriguez, J M; Martínez-González, E; Lasenby, A N
2002-01-01
The problem of detecting Sunyaev-Zel'dovich (SZ) clusters in multifrequency CMB observations is investigated using a number of filtering techniques. A multifilter approach is introduced, which optimizes the detection of SZ clusters on microwave maps. An alternative method is also investigated, in which maps at different frequencies are combined in an optimal manner so that existing filtering techniques can be applied to the single combined map. The SZ profiles are approximated by the circularly-symmetric template $\\tau (x) = [1 +(x/r_c)^2]^{-\\lambda}$, with $\\lambda \\simeq \\tfrac{1}{2}$ and $x\\equiv |\\vec{x}|$, where the core radius $r_c$ and the overall amplitude of the effect are not fixed a priori, but are determined from the data. The background emission is modelled by a homogeneous and isotropic random field, characterized by a cross-power spectrum $P_{\
Nonparametric Bayesian inference in biostatistics
Müller, Peter
2015-01-01
As chapters in this book demonstrate, BNP has important uses in clinical sciences and inference for issues like unknown partitions in genomics. Nonparametric Bayesian approaches (BNP) play an ever expanding role in biostatistical inference from use in proteomics to clinical trials. Many research problems involve an abundance of data and require flexible and complex probability models beyond the traditional parametric approaches. As this book's expert contributors show, BNP approaches can be the answer. Survival Analysis, in particular survival regression, has traditionally used BNP, but BNP's potential is now very broad. This applies to important tasks like arrangement of patients into clinically meaningful subpopulations and segmenting the genome into functionally distinct regions. This book is designed to both review and introduce application areas for BNP. While existing books provide theoretical foundations, this book connects theory to practice through engaging examples and research questions. Chapters c...
Nonparametric Regression with Common Shocks
Directory of Open Access Journals (Sweden)
Eduardo A. Souza-Rodrigues
2016-09-01
Full Text Available This paper considers a nonparametric regression model for cross-sectional data in the presence of common shocks. Common shocks are allowed to be very general in nature; they do not need to be finite dimensional with a known (small number of factors. I investigate the properties of the Nadaraya-Watson kernel estimator and determine how general the common shocks can be while still obtaining meaningful kernel estimates. Restrictions on the common shocks are necessary because kernel estimators typically manipulate conditional densities, and conditional densities do not necessarily exist in the present case. By appealing to disintegration theory, I provide sufficient conditions for the existence of such conditional densities and show that the estimator converges in probability to the Kolmogorov conditional expectation given the sigma-field generated by the common shocks. I also establish the rate of convergence and the asymptotic distribution of the kernel estimator.
Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters
2012-12-01
are suited for threaded (parallel) execution, by labeling them as kernels using syntax specified by the GPU programming language (e.g., CUDA for an...Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi- GPU Clusters Eric Hayden, Mark Schmalz, William Chapman, Sanjay...Abstract - This paper presents a design for parallel processing of synthetic aperture radar (SAR) data using multiple Graphics Processing Units ( GPUs ). Our
Nonparametric Bayesian Modeling of Complex Networks
DEFF Research Database (Denmark)
Schmidt, Mikkel Nørgaard; Mørup, Morten
2013-01-01
Modeling structure in complex networks using Bayesian nonparametrics makes it possible to specify flexible model structures and infer the adequate model complexity from the observed data. This article provides a gentle introduction to nonparametric Bayesian modeling of complex networks: Using...... for complex networks can be derived and point out relevant literature....
An asymptotically optimal nonparametric adaptive controller
Institute of Scientific and Technical Information of China (English)
郭雷; 谢亮亮
2000-01-01
For discrete-time nonlinear stochastic systems with unknown nonparametric structure, a kernel estimation-based nonparametric adaptive controller is constructed based on truncated certainty equivalence principle. Global stability and asymptotic optimality of the closed-loop systems are established without resorting to any external excitations.
Dong, Feng; Gunn, James E; Wechsler, Risa H
2007-01-01
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is ~85% complete and over 90% pure for clusters with masses above 1.0*10^{14} h^{-1} M_solar and redshifts up to z=0.45. The errors of estimated cluster redshifts from maximum likelihood method are shown to be small (typically less that 0.01) over the whole redshift range with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensi...
Sokolov, Anton; Dmitriev, Egor; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmenten, Marc
2016-04-01
The problem of atmospheric contamination by principal air pollutants was considered in the industrialized coastal region of English Channel in Dunkirk influenced by north European metropolitan areas. MESO-NH nested models were used for the simulation of the local atmospheric dynamics and the online calculation of Lagrangian backward trajectories with 15-minute temporal resolution and the horizontal resolution down to 500 m. The one-month mesoscale numerical simulation was coupled with local pollution measurements of volatile organic components, particulate matter, ozone, sulphur dioxide and nitrogen oxides. Principal atmospheric pathways were determined by clustering technique applied to backward trajectories simulated. Six clusters were obtained which describe local atmospheric dynamics, four winds blowing through the English Channel, one coming from the south, and the biggest cluster with small wind speeds. This last cluster includes mostly sea breeze events. The analysis of meteorological data and pollution measurements allows relating the principal atmospheric pathways with local air contamination events. It was shown that contamination events are mostly connected with a channelling of pollution from local sources and low-turbulent states of the local atmosphere.
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.
1985-01-01
The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
Energy Technology Data Exchange (ETDEWEB)
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S.
1985-08-01
The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references.
Directory of Open Access Journals (Sweden)
Lamiaa F. Ibrahim
2011-01-01
Full Text Available Problem statement: The process of network planning is divided into two sub steps. The first step is determining the location of the Multi Service Access Node (MSAN. The second step is the construction of subscriber network lines from MSAN to subscribers to satisfy optimization criteria and design constraints. Due to the complexity of this process artificial intelligence and clustering techniques have been successfully deployed to solve many problems. The problems of the locations of MSAN, the cabling layout and the computation of optimum cable network layouts have been addressed in this study. The proposed algorithm, Clustering density-Based Spatial of Applications with Noise original, minimal Spanning tree and modified Ant-Colony-Based algorithm (CBSCAN-SPANT, used two clustering algorithms which are density-based and agglomerative clustering algorithm using distances which are shortest paths distance and satisfying the network constraints. This algorithm used wire and wireless technology to serve the subscribers demand and place the switches in a real optimal place. Approach: The density-based Spatial Clustering of Applications with Noise original (DBSCAN algorithm has been modified and a new algorithm (NetPlan algorithm has been proposed by the author in a recent work to solve the first step in the problem of network planning. In the present study, the NetPlan algorithm is modified by introduce the modified Ant-Colony-Based algorithm to find the optimal path between any node and the corresponding MSAN node in the first step of network planning process to determine nodes belonging to each cluster. The second step, in the process of network planning, is also introduced in the present study. For each cluster, the optimal cabling layout from each MSAN to the subscriber premises is determining by introduce the Prime algorithm which construct minimal spanning tree. Results: Experimental results and analysis indicate that the
A Survey On: Content Based Image Retrieval Systems Using Clustering Techniques For Large Data sets
Directory of Open Access Journals (Sweden)
Monika Jain
2011-12-01
Full Text Available Content-based image retrieval (CBIR is a new but widely adopted method for finding images from vastand unannotated image databases. As the network and development of multimedia technologies arebecoming more popular, users are not satisfied with the traditional information retrieval techniques. Sonowadays the content based image retrieval (CBIR are becoming a source of exact and fast retrieval. Inrecent years, a variety of techniques have been developed to improve the performance of CBIR. Dataclustering is an unsupervised method for extraction hidden pattern from huge data sets. With large datasets, there is possibility of high dimensionality. Having both accuracy and efficiency for high dimensionaldata sets with enormous number of samples is a challenging arena. In this paper the clustering techniquesare discussed and analysed. Also, we propose a method HDK that uses more than one clustering techniqueto improve the performance of CBIR.This method makes use of hierachical and divide and conquer KMeansclustering technique with equivalency and compatible relation concepts to improve the performanceof the K-Means for using in high dimensional datasets. It also introduced the feature like color, texture andshape for accurate and effective retrieval system.
Directory of Open Access Journals (Sweden)
A. Meenakshi
2016-08-01
Full Text Available Resource allocation is the task of convenient resources to different uses. In the context of an resources, entire economy, can be assigned by different means, such as markets or central planning. Cloud computing has become a new age technology that has got huge potentials in enterprises and markets. Clouds can make it possible to access applications and associated data from anywhere. The fundamental motive of the resource allocation is to allot the available resource in the most effective manner. In the initial phase, a representative resource usage distribution for a group of nodes with identical resource usage patterns is evaluated as resource bundle which can be easily employed to locate a group of nodes fulfilling a standard criterion. In the document, an innovative clustering-based resource aggregation viz. the Improved Hierarchal Agglomerative Clustering Algorithm (IHAC is elegantly launched to realize the compact illustration of a set of identically behaving nodes for scalability. In the subsequent phase concerned with energetic resource allocation procedure, the hybrid optimization technique is brilliantly brought in. The novel technique is devised for scheduling functions to cloud resources which duly consider both financial and evaluation expenses. The efficiency of the novel Resource allocation system is assessed by means of several parameters such the reliability, reusability and certain other metrics. The optimal path choice is the consequence of the hybrid optimization approach. The new-fangled technique allocates the available resource based on the optimal path.
Combined parametric-nonparametric identification of block-oriented systems
Mzyk, Grzegorz
2014-01-01
This book considers a problem of block-oriented nonlinear dynamic system identification in the presence of random disturbances. This class of systems includes various interconnections of linear dynamic blocks and static nonlinear elements, e.g., Hammerstein system, Wiener system, Wiener-Hammerstein ("sandwich") system and additive NARMAX systems with feedback. Interconnecting signals are not accessible for measurement. The combined parametric-nonparametric algorithms, proposed in the book, can be selected dependently on the prior knowledge of the system and signals. Most of them are based on the decomposition of the complex system identification task into simpler local sub-problems by using non-parametric (kernel or orthogonal) regression estimation. In the parametric stage, the generalized least squares or the instrumental variables technique is commonly applied to cope with correlated excitations. Limit properties of the algorithms have been shown analytically and illustrated in simple experiments.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
-Douglas function nor the Translog function are consistent with the “true” relationship between the inputs and the output in our data set. We solve this problem by using non-parametric regression. This approach delivers reasonable results, which are on average not too different from the results of the parametric......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb...... results—including measures that are of interest of applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
by investigating the relationship between the elasticity of scale and the farm size. We use a balanced panel data set of 371~specialised crop farms for the years 2004-2007. A non-parametric specification test shows that neither the Cobb-Douglas function nor the Translog function are consistent with the "true......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...
Parametric and Non-Parametric System Modelling
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg
1999-01-01
considered. It is shown that adaptive estimation in conditional parametric models can be performed by combining the well known methods of local polynomial regression and recursive least squares with exponential forgetting. The approach used for estimation in conditional parametric models also highlights how....... For this purpose non-parametric methods together with additive models are suggested. Also, a new approach specifically designed to detect non-linearities is introduced. Confidence intervals are constructed by use of bootstrapping. As a link between non-parametric and parametric methods a paper dealing with neural...... the focus is on combinations of parametric and non-parametric methods of regression. This combination can be in terms of additive models where e.g. one or more non-parametric term is added to a linear regression model. It can also be in terms of conditional parametric models where the coefficients...
Samala, Ravi K.; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Helvie, Mark A.
2016-10-01
With IRB approval, digital breast tomosynthesis (DBT) images of human subjects were collected using a GE GEN2 DBT prototype system. Corresponding digital mammograms (DMs) of the same subjects were collected retrospectively from patient files. The data set contained a total of 237 views of DBT and equal number of DM views from 120 human subjects, each included 163 views with microcalcification clusters (MCs) and 74 views without MCs. The data set was separated into training and independent test sets. The pre-processing, object prescreening and segmentation, false positive reduction and clustering strategies for MC detection by three computer-aided detection (CADe) systems designed for DM, DBT, and a planar projection image generated from DBT were analyzed. Receiver operating characteristic (ROC) curves based on features extracted from microcalcifications and free-response ROC (FROC) curves based on scores from MCs were used to quantify the performance of the systems. Jackknife FROC (JAFROC) and non-parametric analysis methods were used to determine the statistical difference between the FROC curves. The difference between the CADDM and CADDBT systems when the false positive rate was estimated from cases without MCs did not reach statistical significance. The study indicates that the large search space in DBT may not be a limiting factor for CADe to achieve similar performance as that observed in DM.
Bayesian nonparametric duration model with censorship
Directory of Open Access Journals (Sweden)
Joseph Hakizamungu
2007-10-01
Full Text Available This paper is concerned with nonparametric i.i.d. durations models censored observations and we establish by a simple and unified approach the general structure of a bayesian nonparametric estimator for a survival function S. For Dirichlet prior distributions, we describe completely the structure of the posterior distribution of the survival function. These results are essentially supported by prior and posterior independence properties.
Non-parametric Bayesian graph models reveal community structure in resting state fMRI
DEFF Research Database (Denmark)
Andersen, Kasper Winther; Madsen, Kristoffer H.; Siebner, Hartwig Roman
2014-01-01
Modeling of resting state functional magnetic resonance imaging (rs-fMRI) data using network models is of increasing interest. It is often desirable to group nodes into clusters to interpret the communication patterns between nodes. In this study we consider three different nonparametric Bayesian...
Comparative assessment of bone pose estimation using Point Cluster Technique and OpenSim.
Lathrop, Rebecca L; Chaudhari, Ajit M W; Siston, Robert A
2011-11-01
Estimating the position of the bones from optical motion capture data is a challenge associated with human movement analysis. Bone pose estimation techniques such as the Point Cluster Technique (PCT) and simulations of movement through software packages such as OpenSim are used to minimize soft tissue artifact and estimate skeletal position; however, using different methods for analysis may produce differing kinematic results which could lead to differences in clinical interpretation such as a misclassification of normal or pathological gait. This study evaluated the differences present in knee joint kinematics as a result of calculating joint angles using various techniques. We calculated knee joint kinematics from experimental gait data using the standard PCT, the least squares approach in OpenSim applied to experimental marker data, and the least squares approach in OpenSim applied to the results of the PCT algorithm. Maximum and resultant RMS differences in knee angles were calculated between all techniques. We observed differences in flexion/extension, varus/valgus, and internal/external rotation angles between all approaches. The largest differences were between the PCT results and all results calculated using OpenSim. The RMS differences averaged nearly 5° for flexion/extension angles with maximum differences exceeding 15°. Average RMS differences were relatively small (techniques appeared to be a constant offset between the PCT and all OpenSim results, which may be due to differences in the definition of anatomical reference frames, scaling of musculoskeletal models, and/or placement of virtual markers within OpenSim. Different methods for data analysis can produce largely different kinematic results, which could lead to the misclassification of normal or pathological gait. Improved techniques to allow non-uniform scaling of generic models to more accurately reflect subject-specific bone geometries and anatomical reference frames may reduce differences
Nonparametric Monitoring for Geotechnical Structures Subject to Long-Term Environmental Change
Directory of Open Access Journals (Sweden)
Hae-Bum Yun
2011-01-01
Full Text Available A nonparametric, data-driven methodology of monitoring for geotechnical structures subject to long-term environmental change is discussed. Avoiding physical assumptions or excessive simplification of the monitored structures, the nonparametric monitoring methodology presented in this paper provides reliable performance-related information particularly when the collection of sensor data is limited. For the validation of the nonparametric methodology, a field case study was performed using a full-scale retaining wall, which had been monitored for three years using three tilt gauges. Using the very limited sensor data, it is demonstrated that important performance-related information, such as drainage performance and sensor damage, could be disentangled from significant daily, seasonal and multiyear environmental variations. Extensive literature review on recent developments of parametric and nonparametric data processing techniques for geotechnical applications is also presented.
Directory of Open Access Journals (Sweden)
Mr. P. Kanagaraju. Me, (Ph. D
2014-03-01
Full Text Available The Elliptic curve cryptography ( ECC a promising and important because it requires less computing power, bandwidth, and also the memory when comparing to other cryptosystems The clustering algorithm using the Integer Linear Programming (ILP and Boolean Satisfiability (SAT solvers. These improvements will secure the application of SAT and ILP techniques in modeling composite engineering problem that is the Clustering Problem in Mobile Ad-Hoc Networks (MANETs. The Clustering Problem in MANETs consists of selecting the most appropriate nodes of a given MANET topology as clusterheads, and ensuring that regular nodes are related to clusterheads such that the lifetime of the network is maximized. In which, discussing SAT/ILP techniques for clustering techniques and ECC El Gamal Threshold Cryptography for the security. Through our implementation, explored the possibility of using ECCEG-TC in MANETs.
CLUSTERING BASED ADAPTIVE IMAGE COMPRESSION SCHEME USING PARTICLE SWARM OPTIMIZATION TECHNIQUE
Directory of Open Access Journals (Sweden)
M.Mohamed Ismail,
2010-10-01
Full Text Available This paper presents an image compression scheme with particle swarm optimization technique for clustering. The PSO technique is a powerful general purpose optimization technique that uses the concept of fitness.It provides a mechanism such that individuals in the swarm communicate and exchange information which is similar to the social behaviour of insects & human beings. Because of the mimicking the social sharing of information ,PSO directs particle to search the solution more efficiently.PSO is like a GA in that the population isinitialized with random potential solutions.The adjustment towards the best individual experience (PBEST and the best social experience (GBEST.Is conceptually similar to the cross over operaton of the GA.However it is unlike a GA in that each potential solution , called a particle is flying through the solution space with a velocity.Moreover the particles and the swarm have memory,which does not exist in the populatiom of GA.This optimization technique is used in Image compression and better results have obtained in terms of PSNR, CR and the visual quality of the image when compared to other existing methods.
Clustering technique-based least square support vector machine for EEG signal classification.
Siuly; Li, Yan; Wen, Peng Paul
2011-12-01
This paper presents a new approach called clustering technique-based least square support vector machine (CT-LS-SVM) for the classification of EEG signals. Decision making is performed in two stages. In the first stage, clustering technique (CT) has been used to extract representative features of EEG data. In the second stage, least square support vector machine (LS-SVM) is applied to the extracted features to classify two-class EEG signals. To demonstrate the effectiveness of the proposed method, several experiments have been conducted on three publicly available benchmark databases, one for epileptic EEG data, one for mental imagery tasks EEG data and another one for motor imagery EEG data. Our proposed approach achieves an average sensitivity, specificity and classification accuracy of 94.92%, 93.44% and 94.18%, respectively, for the epileptic EEG data; 83.98%, 84.37% and 84.17% respectively, for the motor imagery EEG data; and 64.61%, 58.77% and 61.69%, respectively, for the mental imagery tasks EEG data. The performance of the CT-LS-SVM algorithm is compared in terms of classification accuracy and execution (running) time with our previous study where simple random sampling with a least square support vector machine (SRS-LS-SVM) was employed for EEG signal classification. We also compare the proposed method with other existing methods in the literature for the three databases. The experimental results show that the proposed algorithm can produce a better classification rate than the previous reported methods and takes much less execution time compared to the SRS-LS-SVM technique. The research findings in this paper indicate that the proposed approach is very efficient for classification of two-class EEG signals.
Colucci, J E; McWilliam, A
2016-01-01
We present abundances of globular clusters in the Milky Way and Fornax from integrated light spectra. Our goal is to evaluate the consistency of the integrated light analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of 7 clusters from our previous publications and results for 5 new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from integrated light spectra agrees to $\\sim$0.1 dex for globular clusters with metallicities as high as [Fe/H]=$-0.3$, but the abundances measured for more metal rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the integrated light analysis gives results that are most similar to analysis of individual stellar ...
Formation of ordered CoAl alloy clusters by the plasma-gas condensation technique
Toyohiko J., Konno; Saeki, Yamamuro; Kenji, Sumiyama
2001-01-01
CoxAl1-x alloy clusters were synthesized from a mixture of Co and Al metal vapors generated by the sputtering of pure metal targets. We observed that the produced alloy clusters were uniform in size, ranging from approximately 20 nm for Al-rich clusters to 10 nm for Co-rich clusters. For a wide average composition range (x?0.4-0.7), the alloy clusters have the ordered B2 (CsCl-type) structure. In the Co-rich cluster aggregates (x=0.76), the clusters are composed of face-centered-cubic (fcc) C...
Formation of ordered CoAl alloy clusters by the plasma-gas condensation technique
Toyohiko J., Konno; Saeki, Yamamuro; Kenji, Sumiyama
2001-01-01
CoxAl1-x alloy clusters were synthesized from a mixture of Co and Al metal vapors generated by the sputtering of pure metal targets. We observed that the produced alloy clusters were uniform in size, ranging from approximately 20 nm for Al-rich clusters to 10 nm for Co-rich clusters. For a wide average composition range (x?0.4-0.7), the alloy clusters have the ordered B2 (CsCl-type) structure. In the Co-rich cluster aggregates (x=0.76), the clusters are composed of face-centered-cubic (fcc) C...
Energy Technology Data Exchange (ETDEWEB)
Feller, David, E-mail: dfeller@owt.com; Peterson, Kirk A. [Department of Chemistry, Washington State University, Pullman, Washington 99164-4630 (United States); Davidson, Ernest R. [Department of Chemistry, University of Washington, Seattle, Washington 98195-1700 (United States)
2014-09-14
A systematic sequence of configuration interaction and coupled cluster calculations were used to describe selected low-lying singlet and triplet vertically excited states of ethylene with the goal of approaching the all electron, full configuration interaction/complete basis set limit. Included among these is the notoriously difficult, mixed valence/Rydberg {sup 1}B{sub 1u} V state. Techniques included complete active space and iterative natural orbital configuration interaction with large reference spaces which led to variational spaces of 1.8 × 10{sup 9} parameters. Care was taken to avoid unintentionally biasing the results due to the widely recognized sensitivity of the V state to the details of the calculation. The lowest vertical and adiabatic ionization potentials to the {sup 2}B{sub 3u} and {sup 2}B{sub 3} states were also determined. In addition, the heat of formation of twisted ethylene {sup 3}A{sub 1} was obtained from large basis set coupled cluster theory calculations including corrections for core/valence, scalar relativistic and higher order correlation recovery.
GPU peer-to-peer techniques applied to a cluster interconnect
Ammendola, Roberto; Biagioni, Andrea; Bisson, Mauro; Fatica, Massimiliano; Frezza, Ottorino; Cicero, Francesca Lo; Lonardo, Alessandro; Mastrostefano, Enrico; Paolucci, Pier Stanislao; Rossetti, Davide; Simula, Francesco; Tosoratto, Laura; Vicini, Piero
2013-01-01
Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. Besides, the current software implementation, which integrates this feature by minimally extending the RDMA programming model, is discussed, as well as some issues raised while employing it in a higher level API like MPI. Finally, the current limits of the technique are studied by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major......This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulations results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural...... currencies. Results show the nonparametric model generally dominates the others when evaluating in-sample. However, the semiparametric model is best for out-of-sample analysis....
Recent Advances and Trends in Nonparametric Statistics
Akritas, MG
2003-01-01
The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection o
Correlated Non-Parametric Latent Feature Models
Doshi-Velez, Finale
2012-01-01
We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many realworld problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on realworld datasets.
A Censored Nonparametric Software Reliability Model
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
This paper analyses the effct of censoring on the estimation of failure rate, and presents a framework of a censored nonparametric software reliability model. The model is based on nonparametric testing of failure rate monotonically decreasing and weighted kernel failure rate estimation under the constraint of failure rate monotonically decreasing. Not only does the model have the advantages of little assumptions and weak constraints, but also the residual defects number of the software system can be estimated. The numerical experiment and real data analysis show that the model performs well with censored data.
Nonparametric correlation models for portfolio allocation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Casas, Isabel
2013-01-01
This article proposes time-varying nonparametric and semiparametric estimators of the conditional cross-correlation matrix in the context of portfolio allocation. Simulations results show that the nonparametric and semiparametric models are best in DGPs with substantial variability or structural...... breaks in correlations. Only when correlations are constant does the parametric DCC model deliver the best outcome. The methodologies are illustrated by evaluating two interesting portfolios. The first portfolio consists of the equity sector SPDRs and the S&P 500, while the second one contains major...
Directory of Open Access Journals (Sweden)
Ajayi Abiola Toyin
2013-01-01
Full Text Available The genetic variability among 10 accessions of cowpea, Vigna unguiculata (L. Walp was studied by the use of 13 qualitative and 13 quantitative traits. From the results on qualitative traits, dendrogram grouped the 10 accessions into two major clusters, 1 and 2.Cluster 1 had 3 accessions and cluster 2 had 2 sub-clusters (I and II, having 2 accessions in sub-cluster I and 5 accessions in sub-cluster II. The dendrogram revealed two major clusters, 1 and 2, for quantitative data, for the 10 accessions. At distance of 4 and 6, cluster 1 had two sub-clusters (I and II, with sub-cluster I having 5 accessions, sub-cluster II having 4 accessions while cluster 2 had only 1 accession. This study made the observation that identification of the right agro-morphological traits of high discriminating capacity is essential, before embarking on any genetic diversity; as it was revealed that some traits discriminated more efficiently among the accessions than others. A group of accessions, which are NGSA1, NGSA2, NGSA3, NGSA4, NGSA7, NGSA9 and NGSA10, was identified as being different from the others for number of seeds per pod, pod length, plant height, peduncle length, seed weight and number of pods per plant. These accessions may be good for cowpea improvement programs.
Formation of ordered CoAl alloy clusters by the plasma-gas condensation technique
Konno, Toyohiko J.; Yamamuro, Saeki; Sumiyama, Kenji
2001-09-01
CoxAl1-x alloy clusters were synthesized from a mixture of Co and Al metal vapors generated by the sputtering of pure metal targets. We observed that the produced alloy clusters were uniform in size, ranging from approximately 20 nm for Al-rich clusters to 10 nm for Co-rich clusters. For a wide average composition range (x≈0.4-0.7), the alloy clusters have the ordered B2 (CsCl-type) structure. In the Co-rich cluster aggregates (x=0.76), the clusters are composed of face-centered-cubic (fcc) Co and minor CoAl(B2) clusters. In the Al-rich aggregates (x=0.23), the clusters are mainly composed of the fcc-Al phase, although clusters occasionally possess a "core-shell structure" with the CoAl(B2) phase surrounded by an Al-rich amorphous phase. These observations are in general agreement with our prediction based on the equilibrium phase diagram. We also noticed that the average composition depends not only on the relative amount of Co and Al vapors, but also on their absolute amount, and even on the Ar gas flow rate, which promotes mixing and cooling the two vapors. These findings show that the formation of alloy clusters in vapor phase is strongly influenced by the kinetics of cluster formation, and is a competing process between the approach to equilibrium and the quenching of the whole system.
Electric field measurements on Cluster: comparing the double-probe and electron drift techniques
Directory of Open Access Journals (Sweden)
A. I. Eriksson
2006-03-01
Full Text Available The four Cluster satellites each carry two instruments designed for measuring the electric field: a double-probe instrument (EFW and an electron drift instrument (EDI. We compare data from the two instruments in a representative sample of plasma regions. The complementary merits and weaknesses of the two techniques are illustrated. EDI operations are confined to regions of magnetic fields above 30 nT and where wave activity and keV electron fluxes are not too high, while EFW can provide data everywhere, and can go far higher in sampling frequency than EDI. On the other hand, the EDI technique is immune to variations in the low energy plasma, while EFW sometimes detects significant nongeophysical electric fields, particularly in regions with drifting plasma, with ion energy (in eV below the spacecraft potential (in volts. We show that the polar cap is a particularly intricate region for the double-probe technique, where large nongeophysical fields regularly contaminate EFW measurments of the DC electric field. We present a model explaining this in terms of enhanced cold plasma wake effects appearing when the ion flow energy is higher than the thermal energy but below the spacecraft potential multiplied by the ion charge. We suggest that these conditions, which are typical of the polar wind and occur sporadically in other regions containing a significant low energy ion population, cause a large cold plasma wake behind the spacecraft, resulting in spurious electric fields in EFW data. This interpretation is supported by an analysis of the direction of the spurious electric field, and by showing that use of active potential control alleviates the situation.
DCT-Yager FNN: a novel Yager-based fuzzy neural network with the discrete clustering technique.
Singh, A; Quek, C; Cho, S Y
2008-04-01
Earlier clustering techniques such as the modified learning vector quantization (MLVQ) and the fuzzy Kohonen partitioning (FKP) techniques have focused on the derivation of a certain set of parameters so as to define the fuzzy sets in terms of an algebraic function. The fuzzy membership functions thus generated are uniform, normal, and convex. Since any irregular training data is clustered into uniform fuzzy sets (Gaussian, triangular, or trapezoidal), the clustering may not be exact and some amount of information may be lost. In this paper, two clustering techniques using a Kohonen-like self-organizing neural network architecture, namely, the unsupervised discrete clustering technique (UDCT) and the supervised discrete clustering technique (SDCT), are proposed. The UDCT and SDCT algorithms reduce this data loss by introducing nonuniform, normal fuzzy sets that are not necessarily convex. The training data range is divided into discrete points at equal intervals, and the membership value corresponding to each discrete point is generated. Hence, the fuzzy sets obtained contain pairs of values, each pair corresponding to a discrete point and its membership grade. Thus, it can be argued that fuzzy membership functions generated using this kind of a discrete methodology provide a more accurate representation of the actual input data. This fact has been demonstrated by comparing the membership functions generated by the UDCT and SDCT algorithms against those generated by the MLVQ, FKP, and pseudofuzzy Kohonen partitioning (PFKP) algorithms. In addition to these clustering techniques, a novel pattern classifying network called the Yager fuzzy neural network (FNN) is proposed in this paper. This network corresponds completely to the Yager inference rule and exhibits remarkable generalization abilities. A modified version of the pseudo-outer product (POP)-Yager FNN called the modified Yager FNN is introduced that eliminates the drawbacks of the earlier network and yi- elds
Thirty years of nonparametric item response theory
Molenaar, W.
2001-01-01
Relationships between a mathematical measurement model and its real-world applications are discussed. A distinction is made between large data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Nonparametric methods are evaluated fo
A Bayesian Nonparametric Approach to Test Equating
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the trea
Decompounding random sums: A nonparametric approach
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted; Pitts, Susan M.
review a number of applications and consider the nonlinear inverse problem of inferring the cumulative distribution function of the components in the random sum. We review the existing literature on non-parametric approaches to the problem. The models amenable to the analysis are generalized considerably...
Nonparametric confidence intervals for monotone functions
Groeneboom, P.; Jongbloed, G.
2015-01-01
We study nonparametric isotonic confidence intervals for monotone functions. In [Ann. Statist. 29 (2001) 1699–1731], pointwise confidence intervals, based on likelihood ratio tests using the restricted and unrestricted MLE in the current status model, are introduced. We extend the method to the
A Nonparametric Analogy of Analysis of Covariance
Burnett, Thomas D.; Barr, Donald R.
1977-01-01
A nonparametric test of the hypothesis of no treatment effect is suggested for a situation where measures of the severity of the condition treated can be obtained and ranked both pre- and post-treatment. The test allows the pre-treatment rank to be used as a concomitant variable. (Author/JKS)
Panel data specifications in nonparametric kernel regression
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we...
How Are Teachers Teaching? A Nonparametric Approach
De Witte, Kristof; Van Klaveren, Chris
2014-01-01
This paper examines which configuration of teaching activities maximizes student performance. For this purpose a nonparametric efficiency model is formulated that accounts for (1) self-selection of students and teachers in better schools and (2) complementary teaching activities. The analysis distinguishes both individual teaching (i.e., a…
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-01
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = ‑0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.
Araghi, Houshang; Zabihi, Zabiholah; Nayebi, Payman; Ehsani, Mohammad Mahdi
2016-10-01
II-VI semiconductor CdTe was grown on the Si(100) substrate surface by the ionized cluster beam (ICB) technique. In the ICB method, when vapors of solid materials such as CdTe were ejected through a nozzle of a heated crucible into a vacuum region, nanoclusters were created by an adiabatic expansion phenomenon. The clusters thus obtained were partially ionized by electron bombardment and then accelerated onto the silicon substrate at 473 K by high potentials. The cluster size was determined using a retarding field energy analyzer. The results of X-ray diffraction measurements indicate the cubic zinc blende (ZB) crystalline structure of the CdTe thin film on the silicon substrate. The CdTe thin film prepared by the ICB method had high crystalline quality. The microscopic processes involved in the ICB deposition technique, such as impact and coalescence processes, have been studied in detail by molecular dynamics (MD) simulation.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
2012-01-01
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb-Douglas a......Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify a functional form of the production function of which the Cobb...... parameter estimates, but also in biased measures which are derived from the parameters, such as elasticities. Therefore, we propose to use non-parametric econometric methods. First, these can be applied to verify the functional form used in parametric production analysis. Second, they can be directly used...... to estimate production functions without the specification of a functional form. Therefore, they avoid possible misspecification errors due to the use of an unsuitable functional form. In this paper, we use parametric and non-parametric methods to identify the optimal size of Polish crop farms...
Colour image segmentation using unsupervised clustering technique for acute leukemia images
Halim, N. H. Abd; Mashor, M. Y.; Nasir, A. S. Abdul; Mustafa, N.; Hassan, R.
2015-05-01
Colour image segmentation has becoming more popular for computer vision due to its important process in most medical analysis tasks. This paper proposes comparison between different colour components of RGB(red, green, blue) and HSI (hue, saturation, intensity) colour models that will be used in order to segment the acute leukemia images. First, partial contrast stretching is applied on leukemia images to increase the visual aspect of the blast cells. Then, an unsupervised moving k-means clustering algorithm is applied on the various colour components of RGB and HSI colour models for the purpose of segmentation of blast cells from the red blood cells and background regions in leukemia image. Different colour components of RGB and HSI colour models have been analyzed in order to identify the colour component that can give the good segmentation performance. The segmented images are then processed using median filter and region growing technique to reduce noise and smooth the images. The results show that segmentation using saturation component of HSI colour model has proven to be the best in segmenting nucleus of the blast cells in acute leukemia image as compared to the other colour components of RGB and HSI colour models.
Directory of Open Access Journals (Sweden)
Wangren Qiu
2015-01-01
Full Text Available In view of techniques for constructing high-order fuzzy time series models, there are three types which are based on advanced algorithms, computational method, and grouping the fuzzy logical relationships. The last type of models is easy to be understood by the decision maker who does not know anything about fuzzy set theory or advanced algorithms. To deal with forecasting problems, this paper presented novel high-order fuzz time series models denoted as GTS (M, N based on generalized fuzzy logical relationships and automatic clustering. This paper issued the concept of generalized fuzzy logical relationship and an operation for combining the generalized relationships. Then, the procedure of the proposed model was implemented on forecasting enrollment data at the University of Alabama. To show the considerable outperforming results, the proposed approach was also applied to forecasting the Shanghai Stock Exchange Composite Index. Finally, the effects of parameters M and N, the number of order, and concerned principal fuzzy logical relationships, on the forecasting results were also discussed.
A Meliorate Routing of Reactive Protocol with Clustering Technique in MANET
Directory of Open Access Journals (Sweden)
Zainab Khandsakarwala,
2014-05-01
Full Text Available The field of Mobile Adhoc Network (MANET has become very popular because of the deep research done in that area in last few years. MANET has advantage of operating without fixed infrastructure and also it can tolerate many changes in the network topology. The MANET uses different routing protocols for End to End Packet delivery. This paper is subjected to the Reactive routing protocols on the basis of identical environment conditions and evaluates their relative performance with respect to the performance metric Packet delivery ratio, overhead & throughput. In this Reactive routing protocols can spectacularly reduce routing overhead because they do not need to search for and maintain the routes on which there is no data traffic. This property is very invoking in the limited resource. Achieve a good efficient network life and reliability need a variation on the notion of multicasting. Geo-casting is useful for sending messages to nodes in a specified geographical region. This region is called the geo-cast region. For geo-casting in mobile ad hoc networks. The proposed protocol combines any casting with local flooding to implement geo-casting. Thus, Protocol requires two phases for geo-casting. First, it performs any casting from a source to any node in the geo-cast region. Also this Protocol works on large MANET, and to achieve high accuracy and optimize output. To perform geo-cast region we use a proposed clustering technique in Large MANET.
Clustering analysis of seismicity and aftershock identification.
Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry
2008-07-01
We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004)10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.
Stephenson, Leigh T; Moody, Michael P; Liddicoat, Peter V; Ringer, Simon P
2007-12-01
Nanoscale atomic clusters in atom probe tomographic data are not universally defined but instead are characterized by the clustering algorithm used and the parameter values controlling the algorithmic process. A new core-linkage clustering algorithm is developed, combining fundamental elements of the conventional maximum separation method with density-based analyses. A key improvement to the algorithm is the independence of algorithmic parameters inherently unified in previous techniques, enabling a more accurate analysis to be applied across a wider range of material systems. Further, an objective procedure for the selection of parameters based on approximating the data with a model of complete spatial randomness is developed and applied. The use of higher nearest neighbor distributions is highlighted to give insight into the nature of the clustering phenomena present in a system and to generalize the clustering algorithms used to analyze it. Maximum separation, density-based scanning, and the core linkage algorithm, developed within this study, were separately applied to the investigation of fine solute clustering of solute atoms in an Al-1.9Zn-1.7Mg (at.%) at two distinct states of early phase decomposition and the results of these analyses were evaluated.
Nonparametric tests for pathwise properties of semimartingales
Cont, Rama; 10.3150/10-BEJ293
2011-01-01
We propose two nonparametric tests for investigating the pathwise properties of a signal modeled as the sum of a L\\'{e}vy process and a Brownian semimartingale. Using a nonparametric threshold estimator for the continuous component of the quadratic variation, we design a test for the presence of a continuous martingale component in the process and a test for establishing whether the jumps have finite or infinite variation, based on observations on a discrete-time grid. We evaluate the performance of our tests using simulations of various stochastic models and use the tests to investigate the fine structure of the DM/USD exchange rate fluctuations and SPX futures prices. In both cases, our tests reveal the presence of a non-zero Brownian component and a finite variation jump component.
Nonparametric Transient Classification using Adaptive Wavelets
Varughese, Melvin M; Stephanou, Michael; Bassett, Bruce A
2015-01-01
Classifying transients based on multi band light curves is a challenging but crucial problem in the era of GAIA and LSST since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology - a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data in addition to the potential non-representativity of the training set. The ranked classifier is simple and quick to implement while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind ...
A Bayesian nonparametric meta-analysis model.
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G
2015-03-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall effect size, such models may be adequate, but for prediction, they surely are not if the effect-size distribution exhibits non-normal behavior. To address this issue, we propose a Bayesian nonparametric meta-analysis model, which can describe a wider range of effect-size distributions, including unimodal symmetric distributions, as well as skewed and more multimodal distributions. We demonstrate our model through the analysis of real meta-analytic data arising from behavioral-genetic research. We compare the predictive performance of the Bayesian nonparametric model against various conventional and more modern normal fixed-effects and random-effects models.
Nonparametric Bayes analysis of social science data
Kunihama, Tsuyoshi
Social science data often contain complex characteristics that standard statistical methods fail to capture. Social surveys assign many questions to respondents, which often consist of mixed-scale variables. Each of the variables can follow a complex distribution outside parametric families and associations among variables may have more complicated structures than standard linear dependence. Therefore, it is not straightforward to develop a statistical model which can approximate structures well in the social science data. In addition, many social surveys have collected data over time and therefore we need to incorporate dynamic dependence into the models. Also, it is standard to observe massive number of missing values in the social science data. To address these challenging problems, this thesis develops flexible nonparametric Bayesian methods for the analysis of social science data. Chapter 1 briefly explains backgrounds and motivations of the projects in the following chapters. Chapter 2 develops a nonparametric Bayesian modeling of temporal dependence in large sparse contingency tables, relying on a probabilistic factorization of the joint pmf. Chapter 3 proposes nonparametric Bayes inference on conditional independence with conditional mutual information used as a measure of the strength of conditional dependence. Chapter 4 proposes a novel Bayesian density estimation method in social surveys with complex designs where there is a gap between sample and population. We correct for the bias by adjusting mixture weights in Bayesian mixture models. Chapter 5 develops a nonparametric model for mixed-scale longitudinal surveys, in which various types of variables can be induced through latent continuous variables and dynamic latent factors lead to flexibly time-varying associations among variables.
Bayesian nonparametric estimation for Quantum Homodyne Tomography
Naulet, Zacharie; Barat, Eric
2016-01-01
We estimate the quantum state of a light beam from results of quantum homodyne tomography noisy measurements performed on identically prepared quantum systems. We propose two Bayesian nonparametric approaches. The first approach is based on mixture models and is illustrated through simulation examples. The second approach is based on random basis expansions. We study the theoretical performance of the second approach by quantifying the rate of contraction of the posterior distribution around ...
NONPARAMETRIC ESTIMATION OF CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS
Directory of Open Access Journals (Sweden)
Orlov A. I.
2015-10-01
Full Text Available The article is devoted to the nonparametric point and interval estimation of the characteristics of the probabilistic distribution (the expectation, median, variance, standard deviation, variation coefficient of the sample results. Sample values are regarded as the implementation of independent and identically distributed random variables with an arbitrary distribution function having the desired number of moments. Nonparametric analysis procedures are compared with the parametric procedures, based on the assumption that the sample values have a normal distribution. Point estimators are constructed in the obvious way - using sample analogs of the theoretical characteristics. Interval estimators are based on asymptotic normality of sample moments and functions from them. Nonparametric asymptotic confidence intervals are obtained through the use of special output technology of the asymptotic relations of Applied Statistics. In the first step this technology uses the multidimensional central limit theorem, applied to the sums of vectors whose coordinates are the degrees of initial random variables. The second step is the conversion limit multivariate normal vector to obtain the interest of researcher vector. At the same considerations we have used linearization and discarded infinitesimal quantities. The third step - a rigorous justification of the results on the asymptotic standard for mathematical and statistical reasoning level. It is usually necessary to use the necessary and sufficient conditions for the inheritance of convergence. This article contains 10 numerical examples. Initial data - information about an operating time of 50 cutting tools to the limit state. Using the methods developed on the assumption of normal distribution, it can lead to noticeably distorted conclusions in a situation where the normality hypothesis failed. Practical recommendations are: for the analysis of real data we should use nonparametric confidence limits
Multivariate nonparametric regression and visualization with R and applications to finance
Klemelä, Jussi
2014-01-01
A modern approach to statistical learning and its applications through visualization methods With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generatingmechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
portfolio optimization based on nonparametric estimation methods
Directory of Open Access Journals (Sweden)
mahsa ghandehari
2017-03-01
Full Text Available One of the major issues investors are facing with in capital markets is decision making about select an appropriate stock exchange for investing and selecting an optimal portfolio. This process is done through the risk and expected return assessment. On the other hand in portfolio selection problem if the assets expected returns are normally distributed, variance and standard deviation are used as a risk measure. But, the expected returns on assets are not necessarily normal and sometimes have dramatic differences from normal distribution. This paper with the introduction of conditional value at risk ( CVaR, as a measure of risk in a nonparametric framework, for a given expected return, offers the optimal portfolio and this method is compared with the linear programming method. The data used in this study consists of monthly returns of 15 companies selected from the top 50 companies in Tehran Stock Exchange during the winter of 1392 which is considered from April of 1388 to June of 1393. The results of this study show the superiority of nonparametric method over the linear programming method and the nonparametric method is much faster than the linear programming method.
Agasisti, Tommaso
2011-01-01
The objective of this paper is an efficiency analysis concerning higher education systems in European countries. Data have been extracted from OECD data-sets (Education at a Glance, several years), using a non-parametric technique--data envelopment analysis--to calculate efficiency scores. This paper represents the first attempt to conduct such an…
Do Former College Athletes Earn More at Work? A Nonparametric Assessment
Henderson, Daniel J.; Olbrecht, Alexandre; Polachek, Solomon W.
2006-01-01
This paper investigates how students' collegiate athletic participation affects their subsequent labor market success. By using newly developed techniques in nonparametric regression, it shows that on average former college athletes earn a wage premium. However, the premium is not uniform, but skewed so that more than half the athletes actually…
Multi-Directional Non-Parametric Analysis of Agricultural Efficiency
DEFF Research Database (Denmark)
Balezentis, Tomas
This thesis seeks to develop methodologies for assessment of agricultural efficiency and employ them to Lithuanian family farms. In particular, we focus on three particular objectives throughout the research: (i) to perform a fully non-parametric analysis of efficiency effects, (ii) to extend...... relative to labour, intermediate consumption and land (in some cases land was not treated as a discretionary input). These findings call for further research on relationships among financial structure, investment decisions, and efficiency in Lithuanian family farms. Application of different techniques...... of stochasticity associated with Lithuanian family farm performance. The former technique showed that the farms differed in terms of the mean values and variance of the efficiency scores over time with some clear patterns prevailing throughout the whole research period. The fuzzy Free Disposal Hull showed...
Introduction to nonparametric statistics for the biological sciences using R
MacFarland, Thomas W
2016-01-01
This book contains a rich set of tools for nonparametric analyses, and the purpose of this supplemental text is to provide guidance to students and professional researchers on how R is used for nonparametric data analysis in the biological sciences: To introduce when nonparametric approaches to data analysis are appropriate To introduce the leading nonparametric tests commonly used in biostatistics and how R is used to generate appropriate statistics for each test To introduce common figures typically associated with nonparametric data analysis and how R is used to generate appropriate figures in support of each data set The book focuses on how R is used to distinguish between data that could be classified as nonparametric as opposed to data that could be classified as parametric, with both approaches to data classification covered extensively. Following an introductory lesson on nonparametric statistics for the biological sciences, the book is organized into eight self-contained lessons on various analyses a...
Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes
Zhou, Wei-Xing; Sornette, Didier
We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.
Energy Technology Data Exchange (ETDEWEB)
Koutsourelakis, P
2007-05-03
Clustering represents one of the most common statistical procedures and a standard tool for pattern discovery and dimension reduction. Most often the objects to be clustered are described by a set of measurements or observables e.g. the coordinates of the vectors, the attributes of people. In a lot of cases however the available observations appear in the form of links or connections (e.g. communication or transaction networks). This data contains valuable information that can in general be exploited in order to discover groups and better understand the structure of the dataset. Since in most real-world datasets, several of these links are missing, it is also useful to develop procedures that can predict those unobserved connections. In this report we address the problem of unsupervised group discovery in relational datasets. A fundamental issue in all clustering problems is that the actual number of clusters is unknown a priori. In most cases this is addressed by running the model several times assuming a different number of clusters each time and selecting the value that provides the best fit based on some criterion (ie Bayes factor in the case of Bayesian techniques). It is easily understood that it would be preferable to develop techniques that are able to number of clusters is essentially learned from that data along with the rest of model parameters. For that purpose, we adopt a nonparametric Bayesian framework which provides a very flexible modeling environment in which the size of the model i.e. the number of clusters, can adapt to the available data and readily accommodate outliers. The latter is particularly important since several groups of interest might consist of a small number of members and would most likely be smeared out by traditional modeling techniques. Finally, the proposed framework combines all the advantages of standard Bayesian techniques such as integration of prior knowledge in a principled manner, seamless accommodation of missing data
Directory of Open Access Journals (Sweden)
Khodadadi Mostafa
2014-01-01
Full Text Available Wheat is an important staple in human nutrition and improvement of its grain quality characters will have high impact on population's health. The objectives of this study were assessing variation of some grain quality characteristics in the Iranian wheat genotypes and identify the best type of data and clustering method for grouping genotypes. In this study 30 spring wheat genotypes were cultivated through randomized complete block design with three replications in 2009 and 2010 years. High significant difference among genotypes for all traits except for Sulfate, K, Br and Cl content, also deference among two years mean for all traits were no significant. Meanwhile there were significant interaction between year and genotype for all traits except Sulfate and F content. Mean values for crude protein, Zn, Fe and Ca in Mahdavi, Falat, Star, Sistan genotypes were the highest. The Ca and Br content showed the highest and the lowest broadcast heritability respectively. In this study indicated that the Root Mean Square Standard Deviation is efficient than R Squared and R Squared efficient than Semi Partial R Squared criteria for determining the best clustering technique. Also Ward method and canonical scores identified as the best clustering method and data type for grouping genotypes, respectively. Genotypes were grouped into six completely separate clusters and Roshan, Niknejad and Star genotypes from the fourth, fifth and sixth clusters had high grain quality characters in overall.
Lasue, Jeremie; Wiens, Roger; Stepinski, Tom; Forni, Olivier; Clegg, Samuel; Maurice, Sylvestre; Chemcam Team
2011-02-01
ChemCam is a remote laser-induced breakdown spectroscopy (LIBS) instrument that will arrive on Mars in 2012, on-board the Mars Science Laboratory Rover. The LIBS technique is crucial to accurately identify samples and quantify elemental abundances at various distances from the rover. In this study, we compare different linear and nonlinear multivariate techniques to visualize and discriminate clusters in two dimensions (2D) from the data obtained with ChemCam. We have used principal components analysis (PCA) and independent components analysis (ICA) for the linear tools and compared them with the nonlinear Sammon's map projection technique. We demonstrate that the Sammon's map gives the best 2D representation of the data set, with optimization values from 2.8% to 4.3% (0% is a perfect representation), together with an entropy value of 0.81 for the purity of the clustering analysis. The linear 2D projections result in three (ICA) and five times (PCA) more stress, and their clustering purity is more than twice higher with entropy values about 1.8. We show that the Sammon's map algorithm is faster and gives a slightly better representation of the data set if the initial conditions are taken from the ICA projection rather than the PCA projection. We conclude that the nonlinear Sammon's map projection is the best technique for combining data visualization and clustering assessment of the ChemCam LIBS data in 2D. PCA and ICA projections on more dimensions would improve on these numbers at the cost of the intuitive interpretation of the 2D projection by a human operator.
Energy Technology Data Exchange (ETDEWEB)
Halim, Zakiah Abd [Universiti Teknikal Malaysia Melaka (Malaysia); Jamaludin, Nordin; Junaidi, Syarif [Faculty of Engineering and Built, Universiti Kebangsaan Malaysia, Bangi (Malaysia); Yahya, Syed Yusainee Syed [Universiti Teknologi MARA, Shah Alam (Malaysia)
2015-04-15
Current steel tubes inspection techniques are invasive, and the interpretation and evaluation of inspection results are manually done by skilled personnel. Part A of this work details the methodology involved in the newly developed non-invasive, non-destructive tube inspection technique based on the integration of vibration impact (VI) and acoustic emission (AE) systems known as the vibration impact acoustic emission (VIAE) technique. AE signals have been introduced into a series of ASTM A179 seamless steel tubes using the impact hammer. Specifically, a good steel tube as the reference tube and four steel tubes with through-hole artificial defect at different locations were used in this study. The AEs propagation was captured using a high frequency sensor of AE systems. The present study explores the cluster analysis approach based on autoregressive (AR) coefficients to automatically interpret the AE signals. The results from the cluster analysis were graphically illustrated using a dendrogram that demonstrated the arrangement of the natural clusters of AE signals. The AR algorithm appears to be the more effective method in classifying the AE signals into natural groups. This approach has successfully classified AE signals for quick and confident interpretation of defects in carbon steel tubes.
A nonparametric and diversified portfolio model
Shirazi, Yasaman Izadparast; Sabiruzzaman, Md.; Hamzah, Nor Aishah
2014-07-01
Traditional portfolio models, like mean-variance (MV) suffer from estimation error and lack of diversity. Alternatives, like mean-entropy (ME) or mean-variance-entropy (MVE) portfolio models focus independently on the issue of either a proper risk measure or the diversity. In this paper, we propose an asset allocation model that compromise between risk of historical data and future uncertainty. In the new model, entropy is presented as a nonparametric risk measure as well as an index of diversity. Our empirical evaluation with a variety of performance measures shows that this model has better out-of-sample performances and lower portfolio turnover than its competitors.
Lottery spending: a non-parametric analysis.
Garibaldi, Skip; Frisoli, Kayla; Ke, Li; Lim, Melody
2015-01-01
We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Lottery spending: a non-parametric analysis.
Directory of Open Access Journals (Sweden)
Skip Garibaldi
Full Text Available We analyze the spending of individuals in the United States on lottery tickets in an average month, as reported in surveys. We view these surveys as sampling from an unknown distribution, and we use non-parametric methods to compare properties of this distribution for various demographic groups, as well as claims that some properties of this distribution are constant across surveys. We find that the observed higher spending by Hispanic lottery players can be attributed to differences in education levels, and we dispute previous claims that the top 10% of lottery players consistently account for 50% of lottery sales.
Nonparametric inferences for kurtosis and conditional kurtosis
Institute of Scientific and Technical Information of China (English)
XIE Xiao-heng; HE You-hua
2009-01-01
Under the assumption of strictly stationary process, this paper proposes a nonparametric model to test the kurtosis and conditional kurtosis for risk time series. We apply this method to the daily returns of S&P500 index and the Shanghai Composite Index, and simulate GARCH data for verifying the efficiency of the presented model. Our results indicate that the risk series distribution is heavily tailed, but the historical information can make its future distribution light-tailed. However the far future distribution's tails are little affected by the historical data.
Parametric versus non-parametric simulation
Dupeux, Bérénice; Buysse, Jeroen
2014-01-01
Most of ex-ante impact assessment policy models have been based on a parametric approach. We develop a novel non-parametric approach, called Inverse DEA. We use non parametric efficiency analysis for determining the farm’s technology and behaviour. Then, we compare the parametric approach and the Inverse DEA models to a known data generating process. We use a bio-economic model as a data generating process reflecting a real world situation where often non-linear relationships exist. Results s...
Chen, S; Mulgrew, B; Grant, P M
1993-01-01
The application of a radial basis function network to digital communications channel equalization is examined. It is shown that the radial basis function network has an identical structure to the optimal Bayesian symbol-decision equalizer solution and, therefore, can be employed to implement the Bayesian equalizer. The training of a radial basis function network to realize the Bayesian equalization solution can be achieved efficiently using a simple and robust supervised clustering algorithm. During data transmission a decision-directed version of the clustering algorithm enables the radial basis function network to track a slowly time-varying environment. Moreover, the clustering scheme provides an automatic compensation for nonlinear channel and equipment distortion. Computer simulations are included to illustrate the analytical results.
Using non-parametric methods in econometric production analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
Econometric estimation of production functions is one of the most common methods in applied economic production analysis. These studies usually apply parametric estimation techniques, which obligate the researcher to specify the functional form of the production function. Most often, the Cobb......-Douglas or the Translog production function is used. However, the specification of a functional form for the production function involves the risk of specifying a functional form that is not similar to the “true” relationship between the inputs and the output. This misspecification might result in biased estimation...... results—including measures that are of interest of applied economists, such as elasticities. Therefore, we propose to use nonparametric econometric methods. First, they can be applied to verify the functional form used in parametric estimations of production functions. Second, they can be directly used...
Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.
Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z
2017-03-01
A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.
Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques
Institute of Scientific and Technical Information of China (English)
Peng Li; Bin Wang; Wei Jin
2012-01-01
As high quality descriptors of web page semantics,social annotations or tags have been used for web document clustering and achieved promising results.However,most web pages have few tags (less than 10).This sparsity seriously limits the usage of tags for clustering.In this work,we propose a user-related tag expansion method to overcome this problem,which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge.Unfortunately,simply adding tags may cause topic drift,i.e.,the dominant topic(s) of the original document may be changed.To tackle this problem,we have designed a novel generative model called Folk-LDA,which jointly models original and expanded tags as independent observations.ExPerimental results show that 1) our user-related tag expansion method can be effectively applied to over 90％ tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion,especially for those topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods,which indicates that tags could be a better resource for the clustering task.
Hearty, Aine P; Gibney, Michael J
2009-02-01
The aims of the present study were to examine and compare dietary patterns in adults using cluster and factor analyses and to examine the format of the dietary variables on the pattern solutions (i.e. expressed as grams/day (g/d) of each food group or as the percentage contribution to total energy intake). Food intake data were derived from the North/South Ireland Food Consumption Survey 1997-9, which was a randomised cross-sectional study of 7 d recorded food and nutrient intakes of a representative sample of 1379 Irish adults aged 18-64 years. Cluster analysis was performed using the k-means algorithm and principal component analysis (PCA) was used to extract dietary factors. Food data were reduced to thirty-three food groups. For cluster analysis, the most suitable format of the food-group variable was found to be the percentage contribution to energy intake, which produced six clusters: 'Traditional Irish'; 'Continental'; 'Unhealthy foods'; 'Light-meal foods & low-fat milk'; 'Healthy foods'; 'Wholemeal bread & desserts'. For PCA, food groups in the format of g/d were found to be the most suitable format, and this revealed four dietary patterns: 'Unhealthy foods & high alcohol'; 'Traditional Irish'; 'Healthy foods'; 'Sweet convenience foods & low alcohol'. In summary, cluster and PCA identified similar dietary patterns when presented with the same dataset. However, the two dietary pattern methods required a different format of the food-group variable, and the most appropriate format of the input variable should be considered in future studies.
Directory of Open Access Journals (Sweden)
Jorge Alfredo Ardila-Rey
2015-10-01
Full Text Available Both in industrial as in controlled environments, such as high-voltage laboratories, pulses from multiple sources, including partial discharges (PD and electrical noise can be superimposed. These circumstances can modify and alter the results of PD measurements and, what is more, they can lead to misinterpretation. The spectral power clustering technique (SPCT allows separating PD sources and electrical noise through the two-dimensional representation (power ratio map or PR map of the relative spectral power in two intervals, high and low frequency, calculated for each pulse captured with broadband sensors. This method allows to clearly distinguishing each of the effects of noise and PD, making it easy discrimination of all sources. In this paper, the separation ability of the SPCT clustering technique when using a Rogowski coil for PD measurements is evaluated. Different parameters were studied in order to establish which of them could help for improving the manual selection of the separation intervals, thus enabling a better separation of clusters. The signal processing can be performed during the measurements or in a further analysis.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
Institute of Scientific and Technical Information of China (English)
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressway. By using the historical traffic data collected from the detectors in Beijing expressways, a specically designed database was developed via the processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series,collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
Takamizawa, Hisashi; Itoh, Hiroto; Nishiyama, Yutaka
2016-10-01
In order to understand neutron irradiation embrittlement in high fluence regions, statistical analysis using the Bayesian nonparametric (BNP) method was performed for the Japanese surveillance and material test reactor irradiation database. The BNP method is essentially expressed as an infinite summation of normal distributions, with input data being subdivided into clusters with identical statistical parameters, such as mean and standard deviation, for each cluster to estimate shifts in ductile-to-brittle transition temperature (DBTT). The clusters typically depend on chemical compositions, irradiation conditions, and the irradiation embrittlement. Specific variables contributing to the irradiation embrittlement include the content of Cu, Ni, P, Si, and Mn in the pressure vessel steels, neutron flux, neutron fluence, and irradiation temperatures. It was found that the measured shifts of DBTT correlated well with the calculated ones. Data associated with the same materials were subdivided into the same clusters even if neutron fluences were increased.
Morrison, G E; Ledlow, M J; Keel, W C; Hill, J M; Voges, W; Herter, T L
2002-01-01
Radio observations were used to detect the `active' galaxy population within rich clusters of galaxies in a non-biased manner that is not plagued by dust extinction or the K-correction. We present wide-field radio, optical (imaging and spectroscopy), and ROSAT All-Sky Survey (RASS) X-ray data for a sample of 30 very rich Abell (R > 2) cluster with z 2E22 W/Hz) galaxy population within these extremely rich clusters for galaxies with M_R 5 M_sun/yr) and active galactic nuclei (AGN) populations contained within each cluster. Archival and newly acquired redshifts were used to verify cluster membership for most (~95%) of the optical identifications. Thus we can identify all the starbursting galaxies within these clusters, regardless of the level of dust obscuration that would affect these galaxies being identified from their optical signature. Cluster sample selection, observations, and data reduction techniques for all wavelengths are discussed.
Wang, Yanfei; Wu, Rong; Cho, Kathleen R; Shedden, Kerby A; Barder, Timothy J; Lubman, David M
2006-01-01
A two-dimensional liquid mapping method was used to map the protein expression of eight ovarian serous carcinoma cell lines and three immortalized ovarian surface epithelial cell lines. Maps were produced using pI as the separation parameter in the first dimension and hydrophobicity based upon reversed-phase HPLC separation in the second dimension. The method can be reproducibly used to produce protein expression maps over a pH range from 4.0 to 8.5. A dynamic programming method was used to correct for minor shifts in peaks during the HPLC gradient between sample runs. The resulting corrected maps can then be compared using hierarchical clustering to produce dendrograms indicating the relationship between different cell lines. It was found that several of the ovarian surface epithelial cell lines clustered together, whereas specific groups of serous carcinoma cell lines clustered with each other. Although there is limited information on the current biology of these cell lines, it was shown that the protein expression of certain cell lines is closely related to each other. Other cell lines, including one ovarian clear cell carcinoma cell line, two endometrioid carcinoma cell lines, and three breast epithelial cell lines, were also mapped for comparison to show that their protein profiles cluster differently than the serous samples and to study how they cluster relative to each other. In addition, comparisons can be made between proteins differentially expressed between cell lines that may serve as markers of ovarian serous carcinomas. The automation of the method allows reproducible comparison of many samples, and the use of differential analysis limits the number of proteins that might require further analysis by mass spectrometry techniques.
The Utility of Nonparametric Transformations for Imputation of Survey Data
Directory of Open Access Journals (Sweden)
Robbins Michael W.
2014-12-01
Full Text Available Missing values present a prevalent problem in the analysis of establishment survey data. Multivariate imputation algorithms (which are used to fill in missing observations tend to have the common limitation that imputations for continuous variables are sampled from Gaussian distributions. This limitation is addressed here through the use of robust marginal transformations. Specifically, kernel-density and empirical distribution-type transformations are discussed and are shown to have favorable properties when used for imputation of complex survey data. Although such techniques have wide applicability (i.e., they may be easily applied in conjunction with a wide array of imputation techniques, the proposed methodology is applied here with an algorithm for imputation in the USDA’s Agricultural Resource Management Survey. Data analysis and simulation results are used to illustrate the specific advantages of the robust methods when compared to the fully parametric techniques and to other relevant techniques such as predictive mean matching. To summarize, transformations based upon parametric densities are shown to distort several data characteristics in circumstances where the parametric model is ill fit; however, no circumstances are found in which the transformations based upon parametric models outperform the nonparametric transformations. As a result, the transformation based upon the empirical distribution (which is the most computationally efficient is recommended over the other transformation procedures in practice.
Deriving semantic structure from category fluency: clustering techniques and their pitfalls.
Voorspoels, Wouter; Storms, Gert; Longenecker, Julia; Verheyen, Steven; Weinberger, Daniel R; Elvevåg, Brita
2014-06-01
Assessing verbal output in category fluency tasks provides a sensitive indicator of cortical dysfunction. The most common metrics are the overall number of words produced and the number of errors. Two main observations have been made about the structure of the output, first that there is a temporal component to it with words being generated in spurts, and second that the clustering pattern may reflect a search for meanings such that the 'clustering' is attributable to the activation of a specific semantic field in memory. A number of sophisticated approaches to examining the structure of this clustering have been developed, and a core theme is that the similarity relations between category members will reveal the mental semantic structure of the category underlying an individual's responses, which can then be visualized by a number of algorithms, such as MDS, hierarchical clustering, ADDTREE, ADCLUS or SVD. Such approaches have been applied to a variety of neurological and psychiatric populations, and the general conclusion has been that the clinical condition systematically distorts the semantic structure in the patients, as compared to the healthy controls. In the present paper we explore this approach to understanding semantic structure using category fluency data. On the basis of a large pool of patients with schizophrenia (n = 204) and healthy control participants (n = 204), we find that the methods are problematic and unreliable to the extent that it is not possible to conclude that any putative difference reflects a systematic difference between the semantic representations in patients and controls. Moreover, taking into account the unreliability of the methods, we find that the most probable conclusion to be made is that no difference in underlying semantic representation exists. The consequences of these findings to understanding semantic structure, and the use of category fluency data, in cortical dysfunction are discussed.
Paraficz, D; Richard, J; Morandi, A; Limousin, M; Jullo, E
2012-01-01
We present a new detailed parametric strong lensing mass reconstruction of the Bullet Cluster (1E 0657-56) at z=0.296, based on new WFC3 and ACS HST imaging and VLT/FORS2 spectroscopy. The strong lensing constraints undergone deep revision, there are 14 (6 new and 8 previously known) multiply imaged systems, of which 3 have spectroscopically confirmed redshifts (including 2 newly measured). The reconstructed mass distribution includes explicitly for the first time the combination of 3 mass components: i) the intra-cluster gas mass derived from X-ray observation, ii) the cluster galaxies modeled by their Fundamental Plane (elliptical) and Tully-Fisher (spiral) scaling relations and iii) dark matter. The best model has an average rms value of 0.158" between the predicted and measured image positions for the 14 multiple images considered. The derived mass model confirms the spacial offset between the X-ray gas and dark matter peaks. The galaxy halos to total mass fraction is found to be f_s=11+/-5% for a total m...
Routing Technique Based on Clustering for Data Duplication Prevention in Wireless Sensor Network
Boseung Kim; HuiBin Lim; Yongtae Shin
2009-01-01
Wireless Sensor Networks is important to node’s energy consumption for long activity of sensor nodes because nodes that compose sensor network are small size, and battery capacity is limited. For energy consumption decrease of sensor nodes, sensor network’s routing technique is divided by flat routing and hierarchical routing technique. Specially, hierarchical routing technique is energy-efficient routing protocol to pare down energy consumption of whole sensor nodes and to scatter energy con...
Nonparametric dark energy reconstruction from supernova data.
Holsclaw, Tracy; Alam, Ujjaini; Sansó, Bruno; Lee, Herbert; Heitmann, Katrin; Habib, Salman; Higdon, David
2010-12-10
Understanding the origin of the accelerated expansion of the Universe poses one of the greatest challenges in physics today. Lacking a compelling fundamental theory to test, observational efforts are targeted at a better characterization of the underlying cause. If a new form of mass-energy, dark energy, is driving the acceleration, the redshift evolution of the equation of state parameter w(z) will hold essential clues as to its origin. To best exploit data from observations it is necessary to develop a robust and accurate reconstruction approach, with controlled errors, for w(z). We introduce a new, nonparametric method for solving the associated statistical inverse problem based on Gaussian process modeling and Markov chain Monte Carlo sampling. Applying this method to recent supernova measurements, we reconstruct the continuous history of w out to redshift z=1.5.
Local Component Analysis for Nonparametric Bayes Classifier
Khademi, Mahmoud; safayani, Meharn
2010-01-01
The decision boundaries of Bayes classifier are optimal because they lead to maximum probability of correct decision. It means if we knew the prior probabilities and the class-conditional densities, we could design a classifier which gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as window width. Moreover, these methods suffer from curse of dimensionality of the feature space and small sample size problem which severely restricts their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify the data with co...
Nonparametric k-nearest-neighbor entropy estimator.
Lombardi, Damiano; Pant, Sanjay
2016-01-01
A nonparametric k-nearest-neighbor-based entropy estimator is proposed. It improves on the classical Kozachenko-Leonenko estimator by considering nonuniform probability densities in the region of k-nearest neighbors around each sample point. It aims to improve the classical estimators in three situations: first, when the dimensionality of the random variable is large; second, when near-functional relationships leading to high correlation between components of the random variable are present; and third, when the marginal variances of random variable components vary significantly with respect to each other. Heuristics on the error of the proposed and classical estimators are presented. Finally, the proposed estimator is tested for a variety of distributions in successively increasing dimensions and in the presence of a near-functional relationship. Its performance is compared with a classical estimator, and a significant improvement is demonstrated.
Nonparametric estimation of location and scale parameters
Potgieter, C.J.
2012-12-01
Two random variables X and Y belong to the same location-scale family if there are constants μ and σ such that Y and μ+σX have the same distribution. In this paper we consider non-parametric estimation of the parameters μ and σ under minimal assumptions regarding the form of the distribution functions of X and Y. We discuss an approach to the estimation problem that is based on asymptotic likelihood considerations. Our results enable us to provide a methodology that can be implemented easily and which yields estimators that are often near optimal when compared to fully parametric methods. We evaluate the performance of the estimators in a series of Monte Carlo simulations. © 2012 Elsevier B.V. All rights reserved.
Nonparametric estimation of employee stock options
Institute of Scientific and Technical Information of China (English)
FU Qiang; LIU Li-an; LIU Qian
2006-01-01
We proposed a new model to price employee stock options (ESOs). The model is based on nonparametric statistical methods with market data. It incorporates the kernel estimator and employs a three-step method to modify BlackScholes formula. The model overcomes the limits of Black-Scholes formula in handling option prices with varied volatility. It disposes the effects of ESOs self-characteristics such as non-tradability, the longer term for expiration, the early exercise feature, the restriction on shorting selling and the employee's risk aversion on risk neutral pricing condition, and can be applied to ESOs valuation with the explanatory variable in no matter the certainty case or random case.
On Parametric (and Non-Parametric Variation
Directory of Open Access Journals (Sweden)
Neil Smith
2009-11-01
Full Text Available This article raises the issue of the correct characterization of ‘Parametric Variation’ in syntax and phonology. After specifying their theoretical commitments, the authors outline the relevant parts of the Principles–and–Parameters framework, and draw a three-way distinction among Universal Principles, Parameters, and Accidents. The core of the contribution then consists of an attempt to provide identity criteria for parametric, as opposed to non-parametric, variation. Parametric choices must be antecedently known, and it is suggested that they must also satisfy seven individually necessary and jointly sufficient criteria. These are that they be cognitively represented, systematic, dependent on the input, deterministic, discrete, mutually exclusive, and irreversible.
Directory of Open Access Journals (Sweden)
C. Parthasarathy
2013-07-01
Full Text Available Cloud computing is a highly discussed topic in the technical and economic world, and many of the big players of the software industry have entered the development of cloud services. Several companies’ and organizations wants to explore the possibilities and benefits of incorporating such cloud computing services in their business, as well as the possibilities to offer own cloud services. We are going to mine the un-compressed image from the cloud and use k-means clustering grouping the uncompressed image and compress it with Lempel-ziv-welch coding technique so that the un-compressed images becomes error-free compression and spatial redundancies.
Nonparametric inference of network structure and dynamics
Peixoto, Tiago P.
The network structure of complex systems determine their function and serve as evidence for the evolutionary mechanisms that lie behind them. Despite considerable effort in recent years, it remains an open challenge to formulate general descriptions of the large-scale structure of network systems, and how to reliably extract such information from data. Although many approaches have been proposed, few methods attempt to gauge the statistical significance of the uncovered structures, and hence the majority cannot reliably separate actual structure from stochastic fluctuations. Due to the sheer size and high-dimensionality of many networks, this represents a major limitation that prevents meaningful interpretations of the results obtained with such nonstatistical methods. In this talk, I will show how these issues can be tackled in a principled and efficient fashion by formulating appropriate generative models of network structure that can have their parameters inferred from data. By employing a Bayesian description of such models, the inference can be performed in a nonparametric fashion, that does not require any a priori knowledge or ad hoc assumptions about the data. I will show how this approach can be used to perform model comparison, and how hierarchical models yield the most appropriate trade-off between model complexity and quality of fit based on the statistical evidence present in the data. I will also show how this general approach can be elegantly extended to networks with edge attributes, that are embedded in latent spaces, and that change in time. The latter is obtained via a fully dynamic generative network model, based on arbitrary-order Markov chains, that can also be inferred in a nonparametric fashion. Throughout the talk I will illustrate the application of the methods with many empirical networks such as the internet at the autonomous systems level, the global airport network, the network of actors and films, social networks, citations among
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
Nonparametric Bayesian inference for multidimensional compound Poisson processes
S. Gugushvili; F. van der Meulen; P. Spreij
2015-01-01
Given a sample from a discretely observed multidimensional compound Poisson process, we study the problem of nonparametric estimation of its jump size density r0 and intensity λ0. We take a nonparametric Bayesian approach to the problem and determine posterior contraction rates in this context, whic
Link, J M; Alimonti, G; Anjos, J C; Arena, V; Barberis, S; Bediaga, I; Benussi, L; Bianco, S; Boca, G; Bonomi, G; Boschini, M; Butler, J N; Carrillo, S; Casimiro, E; Castromonte, C; Cawlfield, C; Cerutti, A; Cheung, H W K; Chiodini, G; Cho, K; Chung, Y S; Cinquini, L; Cuautle, E; Cumalat, J P; D'Angelo, P; Davenport, T F; De Miranda, J M; Di Corato, M; Dini, P; Dos Reis, A C; Edera, L; Engh, D; Erba, S; Fabbri, F L; Frisullo, V; Gaines, I; Garbincius, P H; Gardner, R; Garren, L A; Gianini, G; Gottschalk, E; Göbel, C; Handler, T; Hernández, H; Hosack, M; Inzani, P; Johns, W E; Kang, J S; Kasper, P H; Kim, D Y; Ko, B R; Kreymer, A E; Kryemadhi, A; Kutschke, R; Kwak, J W; Lee, K B; Leveraro, F; Liguori, G; Lopes-Pegna, D; Luiggi, E; López, A M; Machado, A A; Magnin, J; Malvezzi, S; Massafferri, A; Menasce, D; Merlo, M M; Mezzadri, M; Mitchell, R; Moroni, L; Méndez, H; Nehring, M; O'Reilly, B; Otalora, J; Pantea, D; Paris, A; Park, H; Pedrini, D; Pepe, I M; Polycarpo, E; Pontoglio, C; Prelz, F; Quinones, J; Rahimi, A; Ramírez, J E; Ratti, S P; Reyes, M; Riccardi, C; Rovere, M; Sala, S; Segoni, I; Sheaff, M; Sheldon, P D; Stenson, K; Sánchez-Hernández, A; Uribe, C; Vaandering, E W; Vitulo, P; Vázquez, F; Wang, M; Webster, M; Wilson, J R; Wiss, J; Yager, P M; Zallo, A; Zhang, Y
2007-01-01
Using a large sample of \\dpkkpi{} decays collected by the FOCUS photoproduction experiment at Fermilab, we present the first non-parametric analysis of the \\kpi{} amplitudes in \\dpkkpi{} decay. The technique is similar to the technique used for our non-parametric measurements of the \\krzmndk{} form factors. Although these results are in rough agreement with those of E687, we observe a wider S-wave contribution for the \\ksw{} contribution than the standard, PDG \\cite{pdg} Breit-Wigner parameterization. We have some weaker evidence for the existence of a new, D-wave component at low values of the $K^- \\pi^+$ mass.
Walker, S. A.; Sanders, J. S.; Fabian, A. C.
2016-09-01
The unrivalled spatial resolution of the Chandra X-ray observatory has allowed many breakthroughs to be made in high-energy astrophysics. Here we explore applications of Gaussian gradient magnitude (GGM) filtering to X-ray data, which dramatically improves the clarity of surface brightness edges in X-ray observations, and maps gradients in X-ray surface brightness over a range of spatial scales. In galaxy clusters, we find that this method is able to reveal remarkable substructure behind the cold fronts in Abell 2142 and Abell 496, possibly the result of Kelvin-Helmholtz instabilities. In Abell 2319 and Abell 3667, we demonstrate that the GGM filter can provide a straightforward way of mapping variations in the widths and jump ratios along the lengths of cold fronts. We present results from our ongoing programme of analysing the Chandra and XMM-Newton archives with the GGM filter. In the Perseus cluster, we identify a previously unseen edge around 850 kpc from the core to the east, lying outside a known large-scale cold front, which is possibly a bow shock. In MKW 3s we find an unusual `V' shape surface brightness enhancement starting at the cluster core, which may be linked to the AGN jet. In the Crab nebula a new, moving feature in the outer part of the torus is identified which moves across the plane of the sky at a speed of ˜0.1c, and lies much further from the central pulsar than the previous motions seen by Chandra.
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
Directory of Open Access Journals (Sweden)
Alejandro Jara
2011-04-01
Full Text Available Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
POLYMER COMPOSITE FILMS WITH SIZE-SELECTED METAL NANOPARTICLES FABRICATED BY CLUSTER BEAM TECHNIQUE
DEFF Research Database (Denmark)
Ceynowa, F. A.; Chirumamilla, Manohar; Popok, Vladimir
2017-01-01
after the deposition. The degree of immersion can be controlled by the annealing temperature and time. Together with control of cluster coverage the described approach represents an efficient method for the synthesis of thin polymer composite layers with either partially or fully embedded metal NPs......, in particular, for the use of phenomenon of localized surface plasmon resonance (LSPR). Unfortunately, it is found that the thermal annealing used in the production process can lead to quenching of plasmonic properties in the case of copper. To solve this problem, it is suggested to treat the samples with ozone...
Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor
2017-05-01
Outlier detection has been used extensively in data analysis to detect anomalous observation in data and has important application in fraud detection and robust analysis. In this paper, we propose a method in detecting multiple outliers for circular variables in linear functional relationship model. Using the residual values of the Caires and Wyatt model, we applied the hierarchical clustering procedure. With the use of tree diagram, we illustrate the graphical approach of the detection of outlier. A simulation study is done to verify the accuracy of the proposed method. Also, an illustration to a real data set is given to show its practical applicability.
Non-parametric and least squares Langley plot methods
Directory of Open Access Journals (Sweden)
P. W. Kiedron
2015-04-01
Full Text Available Langley plots are used to calibrate sun radiometers primarily for the measurement of the aerosol component of the atmosphere that attenuates (scatters and absorbs incoming direct solar radiation. In principle, the calibration of a sun radiometer is a straightforward application of the Bouguer–Lambert–Beer law V=V>/i>0e−τ ·m, where a plot of ln (V voltage vs. m air mass yields a straight line with intercept ln (V0. This ln (V0 subsequently can be used to solve for τ for any measurement of V and calculation of m. This calibration works well on some high mountain sites, but the application of the Langley plot calibration technique is more complicated at other, more interesting, locales. This paper is concerned with ferreting out calibrations at difficult sites and examining and comparing a number of conventional and non-conventional methods for obtaining successful Langley plots. The eleven techniques discussed indicate that both least squares and various non-parametric techniques produce satisfactory calibrations with no significant differences among them when the time series of ln (V0's are smoothed and interpolated with median and mean moving window filters.
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
施沛德; 王海燕; 张利华
2000-01-01
For regression analysis, some useful Information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in literat黵e, but the optimal rates of global convergence have not been obtained yet. Because of the possible Information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression f unction based on right-censored response data, and proves, under some regularity condi-tions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtai
2nd Conference of the International Society for Nonparametric Statistics
Manteiga, Wenceslao; Romo, Juan
2016-01-01
This volume collects selected, peer-reviewed contributions from the 2nd Conference of the International Society for Nonparametric Statistics (ISNPS), held in Cádiz (Spain) between June 11–16 2014, and sponsored by the American Statistical Association, the Institute of Mathematical Statistics, the Bernoulli Society for Mathematical Statistics and Probability, the Journal of Nonparametric Statistics and Universidad Carlos III de Madrid. The 15 articles are a representative sample of the 336 contributed papers presented at the conference. They cover topics such as high-dimensional data modelling, inference for stochastic processes and for dependent data, nonparametric and goodness-of-fit testing, nonparametric curve estimation, object-oriented data analysis, and semiparametric inference. The aim of the ISNPS 2014 conference was to bring together recent advances and trends in several areas of nonparametric statistics in order to facilitate the exchange of research ideas, promote collaboration among researchers...
Walker, S A; Fabian, A C
2016-01-01
The unrivalled spatial resolution of the Chandra X-ray observatory has allowed many breakthroughs to be made in high energy astrophysics. Here we explore applications of Gaussian Gradient Magnitude (GGM) filtering to X-ray data, which dramatically improves the clarity of surface brightness edges in X-ray observations, and maps gradients in X-ray surface brightness over a range of spatial scales. In galaxy clusters, we find that this method is able to reveal remarkable substructure behind the cold fronts in Abell 2142 and Abell 496, possibly the result of Kelvin Helmholtz instabilities. In Abell 2319 and Abell 3667, we demonstrate that the GGM filter can provide a straightforward way of mapping variations in the widths and jump ratios along the lengths of cold fronts. We present results from our ongoing programme of analysing the Chandra and XMM-Newton archives with the GGM filter. In the Perseus cluster we identify a previously unseen edge around 850 kpc from the core to the east, lying outside a known large ...
Solute-Vacancy Clustering In Al-Mg-Si Alloys Studied By Muon Spin Relaxation Technique
Directory of Open Access Journals (Sweden)
Nishimura K.
2015-06-01
Full Text Available Zero-field muon spin relaxation experiments were carried out with Al-1.6%Mg2Si, Al-0.5%Mg, and Al-0.5%Si alloys. Observed relaxation spectra were compared with the calculated relaxation functions based on the Monte Carlo simulation to extract the dipolar width (Δ, trapping (νt, and detrapping rates (νd, with the initially trapped muon fraction (P0. The fitting analysis has elucidated that the muon trapping rates depended on the heat treatment and solute concentrations. The dissolved Mg in Al dominated the νt at lower temperatures below 120 K, therefore the similar temperature variations of νt were observed with the samples mixed with Mg. The νt around 200 K remarkably reflected the heat treatment effect on the samples, and the largest νt value was found with the sample annealed at 100°C among Al-1.6%Mg2Si alloys. The as-quenched Al-0.5%Si sample showed significant νt values between 80 and 280 K relating with Si-vacancy clusters, but such clusters disappeared with the natural aged Al-0.5%Si sample.
Saad, Walid; Poor, H Vincent; Başar, Tamer; Song, Ju Bin
2012-01-01
This paper introduces a novel approach that enables a number of cognitive radio devices that are observing the availability pattern of a number of primary users(PUs), to cooperate and use \\emph{Bayesian nonparametric} techniques to estimate the distributions of the PUs' activity pattern, assumed to be completely unknown. In the proposed model, each cognitive node may have its own individual view on each PU's distribution, and, hence, seeks to find partners having a correlated perception. To address this problem, a coalitional game is formulated between the cognitive devices and an algorithm for cooperative coalition formation is proposed. It is shown that the proposed coalition formation algorithm allows the cognitive nodes that are experiencing a similar behavior from some PUs to self-organize into disjoint, independent coalitions. Inside each coalition, the cooperative cognitive nodes use a combination of Bayesian nonparametric models such as the Dirichlet process and statistical goodness of fit techniques ...
Lawson, Andrew B
2002-01-01
Research has generated a number of advances in methods for spatial cluster modelling in recent years, particularly in the area of Bayesian cluster modelling. Along with these advances has come an explosion of interest in the potential applications of this work, especially in epidemiology and genome research. In one integrated volume, this book reviews the state-of-the-art in spatial clustering and spatial cluster modelling, bringing together research and applications previously scattered throughout the literature. It begins with an overview of the field, then presents a series of chapters that illuminate the nature and purpose of cluster modelling within different application areas, including astrophysics, epidemiology, ecology, and imaging. The focus then shifts to methods, with discussions on point and object process modelling, perfect sampling of cluster processes, partitioning in space and space-time, spatial and spatio-temporal process modelling, nonparametric methods for clustering, and spatio-temporal ...
Ultra-Wideband Geo-Regioning: A Novel Clustering and Localization Technique
Directory of Open Access Journals (Sweden)
Armin Wittneben
2007-12-01
Full Text Available Ultra-wideband (UWB technology enables a high temporal resolution of the propagation channel. Consequently, a channel impulse response between transmitter and receiver can be interpreted as signature for their relative positions. If the position of the receiver is known, the channel impulse response indicates the position of the transmitter and vice versa. This work introduces UWB geo-regioning as a clustering and localization method based on channel impulse response fingerprinting, develops a theoretical framework for performance analysis, and evaluates this approach by means of performance results based on measured channel impulse responses. Complexity issues are discussed and performance dependencies on signal-to-noise ratio, a priori knowledge, observation window, and system bandwidth are investigated.
Using Modified Partitioning Around Medoids Clustering Technique in Mobile Network Planning
Directory of Open Access Journals (Sweden)
Lamiaa Fattouh Ibrahim
2012-11-01
Full Text Available Optimization mobile radio network planning is a very complex task, as many aspects must be taken into account. Deciding upon the optimum placement for the base stations (BSs to achieve best services while reducing the cost is a complex task requiring vast computational resource. This paper introduces the spatial clustering to solve the Mobile Networking Planning problem. It addresses antenna placement problem or the cell planning problem, involves locating and configuring infrastructure for mobile networks by modified the original Partitioning Around Medoids PAM algorithm. M-PAM (Modified Partitioning Around Medoids has been proposed to satisfy the requirements and constraints. Implementation of this algorithm to a real case study is presented. Experimental results and analysis indicate that the M-PAM algorithm is effective in case of heavy load distribution, and leads to minimum number of base stations, which directly affected onto the cost of planning the network.
Fault-Tolerant Technique in the Cluster Computation of the Digital Watershed Model
Institute of Scientific and Technical Information of China (English)
SHANG Yizi; WU Baosheng; LI Tiejian; FANG Shenguang
2007-01-01
This paper describes a parallel computing platform using the existing facilities for the digital watershed model. In this paper, distributed multi-layered structure is applied to the computer cluster system, and the MPI-2 is adopted as a mature parallel programming standard. An agent is introduced which makes it possible to be multi-level fault-tolerant in software development. The communication protocol based on checkpointing and rollback recovery mechanism can realize the transaction reprocessing. Compared with conventional platform, the new system is able to make better use of the computing resource. Experimental results show the speedup ratio of the platform is almost 4 times as that of the conventional one, which demonstrates the high efficiency and good performance of the new approach.
Asymmetry Effects in Chinese Stock Markets Volatility: A Generalized Additive Nonparametric Approach
Hou, Ai Jun
2007-01-01
The unique characteristics of the Chinese stock markets make it difficult to assume a particular distribution for innovations in returns and the specification form of the volatility process when modeling return volatility with the parametric GARCH family models. This paper therefore applies a generalized additive nonparametric smoothing technique to examine the volatility of the Chinese stock markets. The empirical results indicate that an asymmetric effect of negative news exists in the Chin...
The properties and mechanism of long-term memory in nonparametric volatility
Li, Handong; Cao, Shi-Nan; Wang, Yan
2010-08-01
Recent empirical literature documents the presence of long-term memory in return volatility. But the mechanism of the existence of long-term memory is still unclear. In this paper, we investigate the origin and properties of long-term memory with nonparametric volatility, using high-frequency time series data of the Chinese Shanghai Composite Stock Price Index. We perform Detrended Fluctuation Analysis (DFA) on three different nonparametric volatility estimators with different sampling frequencies. For the same volatility series, the Hurst exponents reduce as the sampling time interval increases, but they are still larger than 1/2, which means that no matter how the interval changes, it still cannot change the existence of long memory. RRV presents a relatively stable property on long-term memory and is less influenced by sampling frequency. RV and RBV have some evolutionary trends depending on time intervals, which indicating that the jump component has no significant impact on the long-term memory property. This suggests that the presence of long-term memory in nonparametric volatility can be contributed to the integrated variance component. Considering the impact of microstructure noise, RBV and RRV still present long-term memory under various time intervals. We can infer that the presence of long-term memory in realized volatility is not affected by market microstructure noise. Our findings imply that the long-term memory phenomenon is an inherent characteristic of the data generating process, not a result of microstructure noise or volatility clustering.
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.
Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben
2017-06-06
Modeling complex time-course patterns is a challenging issue in microarray study due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature in the generalized correlation analysis could be an useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health.
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data
DEFF Research Database (Denmark)
Tan, Qihua; Thomassen, Mads; Burton, Mark
2017-01-01
Modeling complex time-course patterns is a challenging issue in microarray study due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering...... the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature in the generalized correlation analysis could be an useful and efficient tool for analyzing microarray...... time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health....
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data
DEFF Research Database (Denmark)
Tan, Qihua; Thomassen, Mads; Burton, Mark
2017-01-01
Modeling complex time-course patterns is a challenging issue in microarray study due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering...... the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature in the generalized correlation analysis could be an useful and efficient tool for analyzing microarray...
Nonparametric methods in actigraphy: An update
Directory of Open Access Journals (Sweden)
Bruno S.B. Gonçalves
2014-09-01
Full Text Available Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm results for each time interval. Simulated data showed that (1 synchronization analysis depends on sample size, and (2 fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson׳s when using the ISM variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization.
Bayesian nonparametric adaptive control using Gaussian processes.
Chowdhary, Girish; Kingravi, Hassan A; How, Jonathan P; Vela, Patricio A
2015-03-01
Most current model reference adaptive control (MRAC) methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element are fixed a priori, often through expert judgment. An example of such an adaptive element is radial basis function networks (RBFNs), with RBF centers preallocated based on the expected operating domain. If the system operates outside of the expected operating domain, this adaptive element can become noneffective in capturing and canceling the uncertainty, thus rendering the adaptive controller only semiglobal in nature. This paper investigates a Gaussian process-based Bayesian MRAC architecture (GP-MRAC), which leverages the power and flexibility of GP Bayesian nonparametric models of uncertainty. The GP-MRAC does not require the centers to be preallocated, can inherently handle measurement noise, and enables MRAC to handle a broader set of uncertainties, including those that are defined as distributions over functions. We use stochastic stability arguments to show that GP-MRAC guarantees good closed-loop performance with no prior domain knowledge of the uncertainty. Online implementable GP inference methods are compared in numerical simulations against RBFN-MRAC with preallocated centers and are shown to provide better tracking and improved long-term learning.
Nonparametric methods in actigraphy: An update
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-01-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability and intradaily variability, to describe rest activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using an IVm variable, while the variable IV60 was not identified. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson׳s when using the ISM variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
Nonparametric Detection of Geometric Structures Over Networks
Zou, Shaofeng; Liang, Yingbin; Poor, H. Vincent
2017-10-01
Nonparametric detection of existence of an anomalous structure over a network is investigated. Nodes corresponding to the anomalous structure (if one exists) receive samples generated by a distribution q, which is different from a distribution p generating samples for other nodes. If an anomalous structure does not exist, all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown. The goal is to design statistically consistent tests with probability of errors converging to zero as the network size becomes asymptotically large. Kernel-based tests are proposed based on maximum mean discrepancy that measures the distance between mean embeddings of distributions into a reproducing kernel Hilbert space. Detection of an anomalous interval over a line network is first studied. Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent. It is also shown that certain necessary conditions must hold to guarantee any test to be universally consistent. Comparison of sufficient and necessary conditions yields that the proposed test is order-level optimal and nearly optimal respectively in terms of minimum and maximum sizes of candidate anomalous intervals. Generalization of the results to other networks is further developed. Numerical results are provided to demonstrate the performance of the proposed tests.
Institute of Scientific and Technical Information of China (English)
张正旺; 李爱平; 刘雪梅; 谢楠
2013-01-01
视主轴-刀柄结合面的双面接触部为一个可以用系统状态变量描述的非线性单元,基于非参数化方法结合有限元分析技术对主轴-刀柄结合面的非线性动态特征参数进行了识别.在HyperMesh中建立主轴-刀柄系统的有限元分析模型,利用Radioss求解器对该模型进行非线性瞬态响应分析,得到结合面处位移、速度、加速度的瞬态响应,再根据牛顿第二运动定律推导得出结合面处非线性接触力的时间历程数据,基于最小二乘法采用切比雪夫多项式对样本数据进行回归分析,得到主轴-刀柄结合面之间非线性接触力的解析表达模型,并验证了该模型的正确性.%Based on nonparametric identification technique and finite element method,a nonlinear dynamic parameter identification method for spindle-holder interface is presented,in which the double contact parts between spindle and holder is considered as a nonlinear element descripting by system state variables.A finite element model of spindle holder system is constructed by HyperMesh,and its nonlinear transient response analysis is conducted using Radioss solver.With Newton's second law of motion,the time history data of nonlinear contact force of the interface are then deduced.Least square estimates with the Chebyshev polynomial basis are employed to approximate the sample data,and an analytic model of the nonlinear contact force of the interface is obtained.
Ettl, Florian; Testori, Christoph; Weiser, Christoph; Fleischhackl, Sabine; Mayer-Stickler, Monika; Herkner, Harald; Schreiber, Wolfgang; Fleischhackl, Roman
2011-06-01
The first-aid training necessary for obtaining a drivers license in Austria has a regulated and predefined curriculum but has been targeted for the implementation of a new course structure with less theoretical input, repetitive training in cardiopulmonary resuscitation (CPR) and structured presentations using innovative media. The standard and a new course design were compared with a prospective, participant- and observer-blinded, cluster-randomized controlled study. Six months after the initial training, we evaluated the confidence of the 66 participants in their skills, CPR effectiveness parameters and correctness of their actions. The median self-confidence was significantly higher in the interventional group [IG, visual analogue scale (VAS:"0" not-confident at all,"100" highly confident):57] than in the control group (CG, VAS:41). The mean chest compression rate in the IG (98/min) was closer to the recommended 100 bpm than in the CG (110/min). The time to the first chest compression (IG:25s, CG:36s) and time to first defibrillator shock (IG:86s, CG:92s) were significantly shorter in the IG. Furthermore, the IG participants were safer in their handling of the defibrillator and started with countermeasures against developing shock more often. The management of an unconscious person and of heavy bleeding did not show a difference between the two groups even after shortening the lecture time. Motivation and self-confidence as well as skill retention after six months were shown to be dependent on the teaching methods and the time for practical training. Courses may be reorganized and content rescheduled, even within predefined curricula, to improve course outcomes. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Investigation of Cu(In,Ga)Se{sub 2} using Monte Carlo and the cluster expansion technique
Energy Technology Data Exchange (ETDEWEB)
Ludwig, Christian D.R.; Gruhn, Thomas; Felser, Claudia [Institute of Inorganic and Analytical Chemistry, Johannes Gutenberg-University, Mainz (Germany); Windeln, Johannes [IBM Germany, Mgr. Technology Center ISC EMEA, Mainz (Germany)
2010-07-01
CIGS based solar cells are among the most promising thin-film techniques for cheap, yet efficient modules. They have been investigated for many years, but the full potential of CIGS cells has not yet been exhausted and many effects are not understood. For instance, the band gap of the absorber material Cu(In,Ga)Se{sub 2} varies with Ga content. The question why solar cells with high Ga content have low efficiencies, despite the fact that the band gap should have the optimum value, is still unanswered. We are using Monte Carlo simulations in combination with a cluster expansion to investigate the homogeneity of the In-Ga distribution as a possible cause of the low efficiency of cells with high Ga content. The cluster expansion is created by a fit to ab initio electronic structure energies. The results we found are crucial for the processing of solar cells, shed light on structural properties and give hints on how to significantly improve solar cell performance. Above the transition temperature from the separated to the mixed phase, we observe different sizes of the In and Ga domains for a given temperature. The In domains in the Ga-rich compound are smaller and less abundant than the Ga domains in the In-rich compound. This translates into the Ga-rich material being less homogeneous.
Energy Technology Data Exchange (ETDEWEB)
Bevelhimer, Mark S. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Adams, Marshall [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Fortner, Allison M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Greeley, Jr, Mark Stephen [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Brandt, Craig C. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
2014-01-01
The effect of coal ash exposure on fish health in freshwater communities is largely unknown. Given the large number of possible pathways of effects (e.g., toxicological effect of exposure to multiple metals, physical effects from ash exposure, and food web effects), measurement of only a few health metrics is not likely to give a complete picture. The authors measured a suite of 20 health metrics from 1100+ fish collected from 5 sites (3 affected and 2 reference) near a coal ash spill in east Tennessee over a 4.5-yr period. The metrics represented a wide range of physiological and energetic responses and were evaluated simultaneously using 2 multivariate techniques. Results from both hierarchical clustering and canonical discriminant analyses suggested that for most speciesXseason combinations, the suite of fish health indicators varied more among years than between spill and reference sites within a year. In a few cases, spill sites from early years in the investigation stood alone or clustered together separate from reference sites and later year spill sites. Outlier groups of fish with relatively unique health profiles were most often from spill sites, suggesting that some response to the ash exposure may have occurred. Results from the 2 multivariate methods suggest that any change in the health status of fish at the spill sites was small and appears to have diminished since the first 2 to 3 yr after the spill.
Le Bihan, Nicolas; Margerin, Ludovic
2009-07-01
In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity of waves transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using compound Poisson processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.
Bihan, Nicolas Le
2009-01-01
In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using Compound Poisson Processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.
Nonparametric Kernel Smoothing Methods. The sm library in Xlisp-Stat
Directory of Open Access Journals (Sweden)
Luca Scrucca
2001-06-01
Full Text Available In this paper we describe the Xlisp-Stat version of the sm library, a software for applying nonparametric kernel smoothing methods. The original version of the sm library was written by Bowman and Azzalini in S-Plus, and it is documented in their book Applied Smoothing Techniques for Data Analysis (1997. This is also the main reference for a complete description of the statistical methods implemented. The sm library provides kernel smoothing methods for obtaining nonparametric estimates of density functions and regression curves for different data structures. Smoothing techniques may be employed as a descriptive graphical tool for exploratory data analysis. Furthermore, they can also serve for inferential purposes as, for instance, when a nonparametric estimate is used for checking a proposed parametric model. The Xlisp-Stat version includes some extensions to the original sm library, mainly in the area of local likelihood estimation for generalized linear models. The Xlisp-Stat version of the sm library has been written following an object-oriented approach. This should allow experienced Xlisp-Stat users to implement easily their own methods and new research ideas into the built-in prototypes.
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
Institute of Scientific and Technical Information of China (English)
LIN; Lu
2006-01-01
This paper reports a robust kernel estimation for fixed design nonparametric regression models.A Stahel-Donoho kernel estimation is introduced,in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points.Based on a local approximation,a computational technique is given to approximate to the incomputable depths of the errors.As a result the new estimator is computationally efficient.The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as the asymptotic normality and convergence in the mean squared error.Unlike the depth-weighted estimator for parametric regression models,this depth-weighted nonparametric estimator has a simple variance structure and then we can compare its efficiency with the original one.Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.
A Bayesian nonparametric approach to reconstruction and prediction of random dynamical systems
Merkatas, Christos; Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2017-06-01
We propose a Bayesian nonparametric mixture model for the reconstruction and prediction from observed time series data, of discretized stochastic dynamical systems, based on Markov Chain Monte Carlo methods. Our results can be used by researchers in physical modeling interested in a fast and accurate estimation of low dimensional stochastic models when the size of the observed time series is small and the noise process (perhaps) is non-Gaussian. The inference procedure is demonstrated specifically in the case of polynomial maps of an arbitrary degree and when a Geometric Stick Breaking mixture process prior over the space of densities, is applied to the additive errors. Our method is parsimonious compared to Bayesian nonparametric techniques based on Dirichlet process mixtures, flexible and general. Simulations based on synthetic time series are presented.
Directory of Open Access Journals (Sweden)
Md Anisur Rahman
2015-04-01
Full Text Available In this paper we present two clustering techniques called ModEx and Seed-Detective. ModEx is a modified version of an existing clustering technique called Ex-Detective. It addresses some limitations of Ex-Detective. Seed-Detective is a combination of ModEx and Simple K-Means. Seed-Detective uses ModEx to produce a set of high quality initial seeds that are then given as input to K-Means for producing the final clusters. The high quality initial seeds are expected to produce high quality clusters through K-Means. The performances of Seed-Detective and ModEx are compared with the performances of Ex-Detective, PAM, Simple K-Means (SK, Basic Farthest Point Heuristic (BFPH and New Farthest Point Heuristic (NFPH. We use three cluster evaluation criteria namely F-measure, Entropy and Purity and four natural datasets that we obtain from the UCI Machine learning repository. In the datasets our proposed techniques perform better than the existing techniques in terms of F-measure, Entropy and Purity. The sign test results suggest a statistical significance of the superiority of Seed-Detective (and ModEx over the existing techniques.
Directory of Open Access Journals (Sweden)
Ashraf Afifi
2012-07-01
Full Text Available In this paper we present a hybrid approach based on combining fuzzy k-means clustering, seed region growing, and sensitivity and specificity algorithms to measure gray (GM and white matter (WM tissue. The proposed algorithm uses intensity and anatomic information for segmenting of MRIs into different tissue classes, especially GM and WM. It starts by partitioning the image into different clusters using fuzzy k-means clustering. The centers of these clusters are the input to the region growing (SRG method for creating the closed regions. The outputs of SRG technique are fed to sensitivity and specificity algorithm to merge the similar regions in one segment. The proposed algorithm is applied to challenging applications: gray matter/white matter segmentation in magnetic resonance image (MRI datasets. The experimental results show that the proposed technique produces accurate and stable results.
Nonparametric Bayesian drift estimation for multidimensional stochastic differential equations
Gugushvili, S.; Spreij, P.
2014-01-01
We consider nonparametric Bayesian estimation of the drift coefficient of a multidimensional stochastic differential equation from discrete-time observations on the solution of this equation. Under suitable regularity conditions, we establish posterior consistency in this context.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
markdownabstract__Abstract__ We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of
A non-parametric approach to investigating fish population dynamics
National Research Council Canada - National Science Library
Cook, R.M; Fryer, R.J
2001-01-01
.... Using a non-parametric model for the stock-recruitment relationship it is possible to avoid defining specific functions relating recruitment to stock size while also providing a natural framework to model process error...
Non-parametric approach to the study of phenotypic stability.
Ferreira, D F; Fernandes, S B; Bruzi, A T; Ramalho, M A P
2016-02-19
The aim of this study was to undertake the theoretical derivations of non-parametric methods, which use linear regressions based on rank order, for stability analyses. These methods were extension different parametric methods used for stability analyses and the result was compared with a standard non-parametric method. Intensive computational methods (e.g., bootstrap and permutation) were applied, and data from the plant-breeding program of the Biology Department of UFLA (Minas Gerais, Brazil) were used to illustrate and compare the tests. The non-parametric stability methods were effective for the evaluation of phenotypic stability. In the presence of variance heterogeneity, the non-parametric methods exhibited greater power of discrimination when determining the phenotypic stability of genotypes.
An Extension of the Fuzzy Possibilistic Clustering Algorithm Using Type-2 Fuzzy Logic Techniques
Directory of Open Access Journals (Sweden)
Elid Rubio
2017-01-01
Full Text Available In this work an extension of the Fuzzy Possibilistic C-Means (FPCM algorithm using Type-2 Fuzzy Logic Techniques is presented, and this is done in order to improve the efficiency of FPCM algorithm. With the purpose of observing the performance of the proposal against the Interval Type-2 Fuzzy C-Means algorithm, several experiments were made using both algorithms with well-known datasets, such as Wine, WDBC, Iris Flower, Ionosphere, Abalone, and Cover type. In addition some experiments were performed using another set of test images to observe the behavior of both of the above-mentioned algorithms in image preprocessing. Some comparisons are performed between the proposed algorithm and the Interval Type-2 Fuzzy C-Means (IT2FCM algorithm to observe if the proposed approach has better performance than this algorithm.
Systematic clustering of transcription start site landscapes
DEFF Research Database (Denmark)
Zhao, Xiaobei; Valen, Eivind; Parker, Brian J
2011-01-01
developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that in addition to the previous broad...
Nonparametric Bayesian Modeling for Automated Database Schema Matching
Energy Technology Data Exchange (ETDEWEB)
Ferragut, Erik M [ORNL; Laska, Jason A [ORNL
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
PV power forecast using a nonparametric PV model
Almeida, Marcelo Pinho; Perpiñan Lamigueiro, Oscar; Narvarte Fernández, Luis
2015-01-01
Forecasting the AC power output of a PV plant accurately is important both for plant owners and electric system operators. Two main categories of PV modeling are available: the parametric and the nonparametric. In this paper, a methodology using a nonparametric PV model is proposed, using as inputs several forecasts of meteorological variables from a Numerical Weather Forecast model, and actual AC power measurements of PV plants. The methodology was built upon the R environment and uses Quant...
Biological parametric mapping with robust and non-parametric statistics.
Yang, Xue; Beason-Held, Lori; Resnick, Susan M; Landman, Bennett A
2011-07-15
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, regions of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrices. Recently, biological parametric mapping has extended the widely popular statistical parametric mapping approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and non-parametric regression in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provide a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities. Copyright © 2011 Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
K. Selvi
2014-12-01
Full Text Available Pattern recognition, envisaging supervised and unsupervised method, optimization, associative memory and control process are some of the diversified troubles that can be resolved by artificial neural networks. Problem identified: Of late, discovering the required information in massive quantity of data is the challenging tasks. The model of similarity evaluation is the central element in accomplishing a perceptive of variables and perception that encourage behavior and mediate concern. This study proposes Artificial Neural Networks algorithms to resolve similarity measures. In order to apply singular value decomposition the frequency of word pair is established in the given document. (1 Tokenization: The splitting up of a stream of text into words, phrases, signs, or other significant parts is called tokenization. (2 Stop words: Preceding or succeeding to processing natural language data, the words that are segregated is called stop words. (3 Porter stemming: The main utilization of this algorithm is as part of a phrase normalization development that is characteristically completed while setting up in rank recovery technique. (4 WordNet: The compilation of lexical data base for the English language is called as WordNet Based on Artificial Neural Networks, the core part of this study work extends n-gram proposed algorithm. All the phonemes, syllables, letters, words or base pair corresponds in accordance to the application. Future work extends the application of this same similarity measures in various other neural network algorithms to accomplish improved results.
Directory of Open Access Journals (Sweden)
Thomas Lefèvre
Full Text Available Cost containment policies and the need to satisfy patients' health needs and care expectations provide major challenges to healthcare systems. Identification of homogeneous groups in terms of healthcare utilisation could lead to a better understanding of how to adjust healthcare provision to society and patient needs.This study used data from the third wave of the SIRS cohort study, a representative, population-based, socio-epidemiological study set up in 2005 in the Paris metropolitan area, France. The data were analysed using a cross-sectional design. In 2010, 3000 individuals were interviewed in their homes. Non-conventional multivariate clustering techniques were used to determine homogeneous user groups in data. Multinomial models assessed a wide range of potential associations between user characteristics and their pattern of healthcare utilisation.We identified four distinct patterns of healthcare use. Patterns of consumption and the socio-demographic characteristics of users differed qualitatively and quantitatively between these four profiles. Extensive and intensive use by older, wealthier and unhealthier people contrasted with narrow and parsimonious use by younger, socially deprived people and immigrants. Rare, intermittent use by young healthy men contrasted with regular targeted use by healthy and wealthy women.The use of an original technique of massive multivariate analysis allowed us to characterise different types of healthcare users, both in terms of resource utilisation and socio-demographic variables. This method would merit replication in different populations and healthcare systems.
Vale, A
2007-01-01
We revisit the longstanding question of whether first brightest cluster galaxies are statistically drawn from the same distribution as other cluster galaxies or are "special", using the new non-parametric, empirically based model presented in Vale&Ostriker (2006) for associating galaxy luminosity with halo/subhalo masses. We introduce scatter in galaxy luminosity at fixed halo mass into this model, building a conditional luminosity function (CLF) by considering two possible models: a simple lognormal and a model based on the distribution of concentration in haloes of a given mass. We show that this model naturally allows an identification of halo/subhalo systems with groups and clusters of galaxies, giving rise to a clear central/satellite galaxy distinction. We then use these results to build up the dependence of brightest cluster galaxy (BCG) magnitudes on cluster luminosity, focusing on two statistical indicators, the dispersion in BCG magnitude and the magnitude difference between first and second bri...
Thong Kieu, Duy; Kepic, Anton
2015-04-01
Geophysical inversion produces very useful images of earth parameters; however, inversion results usually suffer from inherent non-uniqueness: many subsurface models with different structures and parameters can explain the measurements. To reduce the ambiguity, extra information about the earth's structure and physical properties is needed. This prior information can be extracted from geological principles, prior petrophysical information from well logs, and complementary information from other geophysical methods. Any technique used to constrain inversion should be able to integrate the prior information and to guide updating inversion process in terms of the geological model. In this research, we have adopted fuzzy c-means (FCM) clustering technique for this purpose. FCM is a clustering method that allows us to divide the model of physical parameters into a few clusters of representative values that also may relate to geological units based on the similarity of the geophysical properties. This exploits the fact that in many geological environments the earth is comprised of a few distinctive rock units with different physical properties. Therefore FCM can provide a platform to constrain geophysical inversion, and should tend to produce models that are geologically meaningful. FCM was incorporated in both separate and co-operative inversion processing of seismic and magnetotelluric (MT) data with petrophysical constraints. Using petrophysical information through FCM assists the inversion to build a reliable earth model. In this algorithm, FCM plays a role of guider; it uses the prior information to drive the model update process, and also forming an earth model filled with rocks units rather than smooth transitions when the boundary is in doubt. Where petrophysical information from well logs or core measurement is not locally available the cluster petrophysics may be solved for in inversion as well if some knowledge of how many distinctive geological exist. A
Nonparametric analysis of the time structure of seismicity in a geographic region
Directory of Open Access Journals (Sweden)
A. Quintela-del-Río
2002-06-01
Full Text Available As an alternative to traditional parametric approaches, we suggest nonparametric methods for analyzing temporal data on earthquake occurrences. In particular, the kernel method for estimating the hazard function and the intensity function are presented. One novelty of our approaches is that we take into account the possible dependence of the data to estimate the distribution of time intervals between earthquakes, which has not been considered in most statistics studies on seismicity. Kernel estimation of hazard function has been used to study the occurrence process of cluster centers (main shocks. Kernel intensity estimation, on the other hand, has helped to describe the occurrence process of cluster members (aftershocks. Similar studies in two geographic areas of Spain (Granada and Galicia have been carried out to illustrate the estimation methods suggested.
Clustering via Kernel Decomposition
DEFF Research Database (Denmark)
Have, Anna Szynkowiak; Girolami, Mark A.; Larsen, Jan
2006-01-01
Methods for spectral clustering have been proposed recently which rely on the eigenvalue decomposition of an affinity matrix. In this work it is proposed that the affinity matrix is created based on the elements of a non-parametric density estimator. This matrix is then decomposed to obtain...... posterior probabilities of class membership using an appropriate form of nonnegative matrix factorization. The troublesome selection of hyperparameters such as kernel width and number of clusters can be obtained using standard cross-validation methods as is demonstrated on a number of diverse data sets....
DEFF Research Database (Denmark)
Warming, S; Ebbehøj, N E; Wiese, N;
2008-01-01
The aim of this study was to evaluate the effect of a transfer technique education programme (TT) alone or in combination with physical fitness training (TTPT) compared with a control group, who followed their usual routine. Eleven clinical hospital wards were cluster randomised to either...... intervention (six wards) or to control (five wards). The intervention cluster was individually randomised to TT (55 nurses) and TTPT (50 nurses), control (76 nurses). The transfer technique programme was a 4-d course of train-the-trainers to teach transfer technique to their colleagues. The physical training...... consisted of supervised physical fitness training 1 h twice per week for 8 weeks. Implementing transfer technique alone or in combination with physical fitness training among a hospital nursing staff did not, when compared to a control group, show any statistical differences according to self-reported low...
Asymptotic theory of nonparametric regression estimates with censored data
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
For regression analysis, some useful information may have been lost when the responses are right censored. To estimate nonparametric functions, several estimates based on censored data have been proposed and their consistency and convergence rates have been studied in literature, but the optimal rates of global convergence have not been obtained yet. Because of the possible information loss, one may think that it is impossible for an estimate based on censored data to achieve the optimal rates of global convergence for nonparametric regression, which were established by Stone based on complete data. This paper constructs a regression spline estimate of a general nonparametric regression function based on right_censored response data, and proves, under some regularity conditions, that this estimate achieves the optimal rates of global convergence for nonparametric regression. Since the parameters for the nonparametric regression estimate have to be chosen based on a data driven criterion, we also obtain the asymptotic optimality of AIC, AICC, GCV, Cp and FPE criteria in the process of selecting the parameters.
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and......We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs...... rejects both the Cobb-Douglas and the Translog functional form, while a recently developed nonparametric kernel regression method with a fully nonparametric panel data specification delivers plausible results. On average, the nonparametric regression results are similar to results that are obtained from...
Ishii, Hideyuki; Nagano, Yasuharu; Ida, Hirofumi; Fukubayashi, Toru; Maruyama, Takeo
2011-07-07
The differences between the assessments performed with and without the point cluster technique (PCT) for knee joint motions during the high-risk movements associated with non-contact anterior cruciate ligament (ACL) injuries have not been reported. This study aims to examine the differences between PCT and non-PCT assessments for knee joint angles and moments during shuttle run cutting. Fourteen high school athletes performed a maximal effort shuttle run cutting task. Motion data were collected by an 8-camera motion analysis system at 200 Hz, and ground reaction force data were recorded using a force plate at 1000 Hz. In both PCT and non-PCT approaches, the knee joint angles were calculated using Euler angle rotations, and the knee joint moments were obtained by solving the Newton-Euler equations using an inverse dynamics technique. For the extension/flexion angle, good agreement was measured between PCT and non-PCT assessments. The abduction angle obtained in the non-PCT assessment was smaller than that obtained with the PCT. An internal rotation angle was obtained in the PCT assessment, whereas a small external rotation angle was obtained in the non-PCT assessment. For the knee joint moments, good agreement between PCT and non-PCT assessments was observed for all the components. The differences in the knee joint angles were attributed in part to the differences in the position of the medial femoral epicondyle. The results suggest that the ACL injury risk during shuttle run cutting is estimated lower in the non-PCT assessment than in the PCT assessment.
Directory of Open Access Journals (Sweden)
S. Grimald
2012-03-01
Full Text Available Knowledge of the inner magnetospheric current system (intensity, boundaries, evolution is one of the key elements for the understanding of the whole magnetospheric current system. In particular, the calculation of the current density and the study of the changes in the ring current is an active field of research as it is a good proxy for the magnetic activity. The curlometer technique allows the current density to be calculated from the magnetic field measured at four different positions inside a given current sheet using the Maxwell-Ampere's law. In 2009, the CLUSTER perigee pass was located at about 2 R_{E} allowing a study of the ring current deep inside the inner magnetosphere, where the pressure gradient is expected to invert direction. In this paper, we use the curlometer in such an orbit. As the method has never been used so deep inside the inner magnetosphere, this study is a test of the curlometer in a part of the magnetosphere where the magnetic field is very high (about 4000 nT and changes over small distances (ΔB = 1nT in 1000 km. To do so, the curlometer has been applied to calculate the current density from measured and modelled magnetic fields and for different sizes of the tetrahedron. The results show that the current density cannot be calculated using the curlometer technique at low altitude perigee passes, but that the method may be accurate in a [3 R_{E}; 5 R_{E}] or a [6 R_{E}; 8.3 R_{E}] L-shell range. It also demonstrates that the parameters used to estimate the accuracy of the method are necessary, but not sufficient conditions.
Rosa, Pedro J; Morais, Diogo; Gamito, Pedro; Oliveira, Jorge; Saraiva, Tomaz
2016-03-01
Immersive virtual reality is thought to be advantageous by leading to higher levels of presence. However, and despite users getting actively involved in immersive three-dimensional virtual environments that incorporate sound and motion, there are individual factors, such as age, video game knowledge, and the predisposition to immersion, that may be associated with the quality of virtual reality experience. Moreover, one particular concern for users engaged in immersive virtual reality environments (VREs) is the possibility of side effects, such as cybersickness. The literature suggests that at least 60% of virtual reality users report having felt symptoms of cybersickness, which reduces the quality of the virtual reality experience. The aim of this study was thus to profile the right user to be involved in a VRE through head-mounted display. To examine which user characteristics are associated with the most effective virtual reality experience (lower cybersickness), a multiple correspondence analysis combined with cluster analysis technique was performed. Results revealed three distinct profiles, showing that the PC gamer profile is more associated with higher levels of virtual reality effectiveness, that is, higher predisposition to be immersed and reduced cybersickness symptoms in the VRE than console gamer and nongamer. These findings can be a useful orientation in clinical practice and future research as they help identify which users are more predisposed to benefit from immersive VREs.
Barbui, M.; Hagel, K.; Gauthier, J.; Wuenschel, S.; Goldberg, V. Z.; Zheng, H.; Giuliani, G.; Rapisarda, G.; Kim, E.-J.; Liu, X.; Natowitz, J. B.; Desouza, R. T.; Hudan, S.; Fang, D.
2015-10-01
Searching for alpha cluster states analogous to the 12C Hoyle state in heavier alpha-conjugate nuclei can provide tests of the existence of alpha condensates in nuclear matter. Such states are predicted for 16O, 20Ne, 24Mg, etc. at excitation energies slightly above the decay threshold. The Thick Target Inverse Kinematics (TTIK) technique can be successfully used to study the breakup of excited self-conjugate nuclei into many alpha particles. The reaction 20Ne + α at 11 and 13 AMeV was studied at Cyclotron Institute at Texas A&M University. Here the TTIK method was used to study both single α-particle emission and multiple α-particle decays. Due to the limited statistics, only events with alpha multiplicity up to three were analyzed. The analysis of the three α-particle emission data allowed the identification of the Hoyle state and other 12C excited states decaying into three alpha particles. The results will be shown and compared with other data available in the literature. Another experiment is planned in August 2015 to study the system 28Si + α at 15 AMeV. Preliminary results will be shown. Supported by the U.S. DOE and the Robert A. Welch Foundation, Grant No. A0330.
Bibi, Humera; Alam, Khan; Bibi, Samina
2016-11-01
Discrimination of aerosol types is essential over the Indo-Gangetic plain (IGP) because several aerosol types originate from different sources having different atmospheric impacts. In this paper, we analyzed a seasonal discrimination of aerosol types by multiple clustering techniques using AERosol RObotic NETwork (AERONET) datasets for the period 2007-2013 over Karachi, Lahore, Jaipur and Kanpur. We discriminated the aerosols into three major types; dust, biomass burning and urban/industrial. The discrimination was carried out by analyzing different aerosol optical properties such as Aerosol Optical Depth (AOD), Angstrom Exponent (AE), Extinction Angstrom Exponent (EAE), Abortion Angstrom Exponent (AAE), Single Scattering Albedo (SSA) and Real Refractive Index (RRI) and their interrelationship to investigate the dominant aerosol types and to examine the variation in their seasonal distribution. The results revealed that during summer and pre-monsoon, dust aerosols were dominant while during winter and post-monsoon prevailing aerosols were biomass burning and urban industrial, and the mixed type of aerosols were present in all seasons. These types of aerosol discriminated from AERONET were in good agreement with CALIPSO (the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation) measurement.
Institute of Scientific and Technical Information of China (English)
H.SAGHI; M.J.KETABDARI; S.BOOSHI
2012-01-01
A two-dimensional (2D) numerical model is developed for the wave simulation and propagation in a wave flume.The fluid flow is assumed to be viscous and incompressible,and the Navier-Stokes and continuity equations are used as the governing equations.The standard κ-ε model is used to model the turbulent flow.The NavierStokes equations are discretized using the staggered grid finite difference method and solved by the simplified marker and cell (SMAC) method. Waves are generated and propagated using a piston type wave maker. An open boundary condition is used at the end of the numerical flume.Some standard tests,such as the lid-driven cavity,the constant unidirectional velocity field,the shearing flow,and the dam-break on the dry bed,are performed to valid the model.To demonstrate the capability and accuracy of the present method,the results of generated waves are compared with available wave theories.Finally,the clustering technique (CT) is used for the mesh generation,and the best condition is suggested.
Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling.
Karsch, Kevin; Liu, Ce; Kang, Sing Bing
2014-11-01
We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large data set containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state-of-the-art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Directory of Open Access Journals (Sweden)
Saerom Park
Full Text Available Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models.
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs....... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
Nonparametric estimation of a convex bathtub-shaped hazard function.
Jankowski, Hanna K; Wellner, Jon A
2009-11-01
In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n(2/5) at points x(0) where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.
Nonparametric Inference for the Cosmic Microwave Background
Genovese, C R; Nichol, R C; Arjunwadkar, M; Wasserman, L; Genovese, Christopher R.; Miller, Christopher J.; Nichol, Robert C.; Arjunwadkar, Mihir; Wasserman, Larry
2004-01-01
The Cosmic Microwave Background (CMB), which permeates the entire Universe, is the radiation left over from just 380,000 years after the Big Bang. On very large scales, the CMB radiation field is smooth and isotropic, but the existence of structure in the Universe - stars, galaxies, clusters of galaxies - suggests that the field should fluctuate on smaller scales. Recent observations, from the Cosmic Microwave Background Explorer to the Wilkinson Microwave Anisotropy Project, have strikingly confirmed this prediction. CMB fluctuations provide clues to the Universe's structure and composition shortly after the Big Bang that are critical for testing cosmological models. For example, CMB data can be used to determine what portion of the Universe is composed of ordinary matter versus the mysterious dark matter and dark energy. To this end, cosmologists usually summarize the fluctuations by the power spectrum, which gives the variance as a function of angular frequency. The spectrum's shape, and in particular the ...
Srivastava, Ashok, N.; Akella, Ram; Diev, Vesselin; Kumaresan, Sakthi Preethi; McIntosh, Dawn M.; Pontikakis, Emmanuel D.; Xu, Zuobing; Zhang, Yi
2006-01-01
This paper describes the results of a significant research and development effort conducted at NASA Ames Research Center to develop new text mining techniques to discover anomalies in free-text reports regarding system health and safety of two aerospace systems. We discuss two problems of significant importance in the aviation industry. The first problem is that of automatic anomaly discovery about an aerospace system through the analysis of tens of thousands of free-text problem reports that are written about the system. The second problem that we address is that of automatic discovery of recurring anomalies, i.e., anomalies that may be described m different ways by different authors, at varying times and under varying conditions, but that are truly about the same part of the system. The intent of recurring anomaly identification is to determine project or system weakness or high-risk issues. The discovery of recurring anomalies is a key goal in building safe, reliable, and cost-effective aerospace systems. We address the anomaly discovery problem on thousands of free-text reports using two strategies: (1) as an unsupervised learning problem where an algorithm takes free-text reports as input and automatically groups them into different bins, where each bin corresponds to a different unknown anomaly category; and (2) as a supervised learning problem where the algorithm classifies the free-text reports into one of a number of known anomaly categories. We then discuss the application of these methods to the problem of discovering recurring anomalies. In fact the special nature of recurring anomalies (very small cluster sizes) requires incorporating new methods and measures to enhance the original approach for anomaly detection. ?& pant 0-
Non-parametric partitioning of SAR images
Delyon, G.; Galland, F.; Réfrégier, Ph.
2006-09-01
We describe and analyse a generalization of a parametric segmentation technique adapted to Gamma distributed SAR images to a simple non parametric noise model. The partition is obtained by minimizing the stochastic complexity of a quantized version on Q levels of the SAR image and lead to a criterion without parameters to be tuned by the user. We analyse the reliability of the proposed approach on synthetic images. The quality of the obtained partition will be studied for different possible strategies. In particular, one will discuss the reliability of the proposed optimization procedure. Finally, we will precisely study the performance of the proposed approach in comparison with the statistical parametric technique adapted to Gamma noise. These studies will be led by analyzing the number of misclassified pixels, the standard Hausdorff distance and the number of estimated regions.
Nonparametric Cointegration Analysis of Fractional Systems With Unknown Integration Orders
DEFF Research Database (Denmark)
Nielsen, Morten Ørregaard
2009-01-01
In this paper a nonparametric variance ratio testing approach is proposed for determining the number of cointegrating relations in fractionally integrated systems. The test statistic is easily calculated without prior knowledge of the integration order of the data, the strength of the cointegrating...
Non-parametric analysis of rating transition and default data
DEFF Research Database (Denmark)
Fledelius, Peter; Lando, David; Perch Nielsen, Jens
2004-01-01
We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move b...... but that this dependence vanishes after 2-3 years....
A non-parametric model for the cosmic velocity field
Branchini, E; Teodoro, L; Frenk, CS; Schmoldt, [No Value; Efstathiou, G; White, SDM; Saunders, W; Sutherland, W; Rowan-Robinson, M; Keeble, O; Tadros, H; Maddox, S; Oliver, S
1999-01-01
We present a self-consistent non-parametric model of the local cosmic velocity field derived from the distribution of IRAS galaxies in the PSCz redshift survey. The survey has been analysed using two independent methods, both based on the assumptions of gravitational instability and linear biasing.
Influence of test and person characteristics on nonparametric appropriateness measurement
Meijer, Rob R.; Molenaar, Ivo W.; Sijtsma, Klaas
1994-01-01
Appropriateness measurement in nonparametric item response theory modeling is affected by the reliability of the items, the test length, the type of aberrant response behavior, and the percentage of aberrant persons in the group. The percentage of simulees defined a priori as aberrant responders tha
Influence of Test and Person Characteristics on Nonparametric Appropriateness Measurement
Meijer, Rob R; Molenaar, Ivo W; Sijtsma, Klaas
1994-01-01
Appropriateness measurement in nonparametric item response theory modeling is affected by the reliability of the items, the test length, the type of aberrant response behavior, and the percentage of aberrant persons in the group. The percentage of simulees defined a priori as aberrant responders tha
Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects
Qian, Minghui; Hu, Ridong; Chen, Jianwei
2016-01-01
Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes the nonlinear factors into account base on the spatial dynamic panel…
Uniform Consistency for Nonparametric Estimators in Null Recurrent Time Series
DEFF Research Database (Denmark)
Gao, Jiti; Kanaya, Shin; Li, Degui
2015-01-01
This paper establishes uniform consistency results for nonparametric kernel density and regression estimators when time series regressors concerned are nonstationary null recurrent Markov chains. Under suitable regularity conditions, we derive uniform convergence rates of the estimators. Our...... results can be viewed as a nonstationary extension of some well-known uniform consistency results for stationary time series....
Non-parametric Bayesian inference for inhomogeneous Markov point processes
DEFF Research Database (Denmark)
Berthelsen, Kasper Klitgaard; Møller, Jesper
With reference to a specific data set, we consider how to perform a flexible non-parametric Bayesian analysis of an inhomogeneous point pattern modelled by a Markov point process, with a location dependent first order term and pairwise interaction only. A priori we assume that the first order term...
Coverage Accuracy of Confidence Intervals in Nonparametric Regression
Institute of Scientific and Technical Information of China (English)
Song-xi Chen; Yong-song Qin
2003-01-01
Point-wise confidence intervals for a nonparametric regression function with random design points are considered. The confidence intervals are those based on the traditional normal approximation and the empirical likelihood. Their coverage accuracy is assessed by developing the Edgeworth expansions for the coverage probabilities. It is shown that the empirical likelihood confidence intervals are Bartlett correctable.
Homothetic Efficiency and Test Power: A Non-Parametric Approach
J. Heufer (Jan); P. Hjertstrand (Per)
2015-01-01
markdownabstract__Abstract__ We provide a nonparametric revealed preference approach to demand analysis based on homothetic efficiency. Homotheticity is a useful restriction but data rarely satisfies testable conditions. To overcome this we provide a way to estimate homothetic efficiency of consump
Non-parametric analysis of rating transition and default data
DEFF Research Database (Denmark)
Fledelius, Peter; Lando, David; Perch Nielsen, Jens
2004-01-01
We demonstrate the use of non-parametric intensity estimation - including construction of pointwise confidence sets - for analyzing rating transition data. We find that transition intensities away from the class studied here for illustration strongly depend on the direction of the previous move...
Effect on Prediction when Modeling Covariates in Bayesian Nonparametric Models.
Cruz-Marcelo, Alejandro; Rosner, Gary L; Müller, Peter; Stewart, Clinton F
2013-04-01
In biomedical research, it is often of interest to characterize biologic processes giving rise to observations and to make predictions of future observations. Bayesian nonparametric methods provide a means for carrying out Bayesian inference making as few assumptions about restrictive parametric models as possible. There are several proposals in the literature for extending Bayesian nonparametric models to include dependence on covariates. Limited attention, however, has been directed to the following two aspects. In this article, we examine the effect on fitting and predictive performance of incorporating covariates in a class of Bayesian nonparametric models by one of two primary ways: either in the weights or in the locations of a discrete random probability measure. We show that different strategies for incorporating continuous covariates in Bayesian nonparametric models can result in big differences when used for prediction, even though they lead to otherwise similar posterior inferences. When one needs the predictive density, as in optimal design, and this density is a mixture, it is better to make the weights depend on the covariates. We demonstrate these points via a simulated data example and in an application in which one wants to determine the optimal dose of an anticancer drug used in pediatric oncology.
A Level Set Analysis and A Nonparametric Regression on S&P 500 Daily Return
Directory of Open Access Journals (Sweden)
Yipeng Yang
2016-02-01
Full Text Available In this paper, a level set analysis is proposed which aims to analyze the S&P 500 return with a certain magnitude. It is found that the process of large jumps/drops of return tend to have negative serial correlation, and volatility clustering phenomenon can be easily seen. Then, a nonparametric analysis is performed and new patterns are discovered. An ARCH model is constructed based on the patterns we discovered and it is capable of manifesting the volatility skew in option pricing. A comparison of our model with the GARCH(1,1 model is carried out. The explanation of the validity on our model through prospect theory is provided, and, as a novelty, we linked the volatility skew phenomenon to the prospect theory in behavioral finance.
Directory of Open Access Journals (Sweden)
Akhtar R. Siddique
2000-03-01
Full Text Available This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices. This paper develops a filtering-based framework of non-parametric estimation of parameters of a diffusion process from the conditional moments of discrete observations of the process. This method is implemented for interest rate data in the Eurodollar and long term bond markets. The resulting estimates are then used to form non-parametric univariate and bivariate interest rate models and compute prices for the short term Eurodollar interest rate futures options and long term discount bonds. The bivariate model produces prices substantially closer to the market prices.
Ford, Eric B.; Fabrycky, Daniel C.; Steffen, Jason H.; Carter, Joshua A.; Fressin, Francois; Holman, Matthew Jon; Lissauer, Jack J.; Moorhead, Althea V.; Morehead, Robert C.; Ragozzine, Darin; Rowe, Jason F.; Welsh, William F.; Allen, Christopher; Batalha, Natalie M.; Borucki, William J.
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timingn variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data se...
Fast pixel-based optical proximity correction based on nonparametric kernel regression
Ma, Xu; Wu, Bingliang; Song, Zhiyang; Jiang, Shangliang; Li, Yanqiu
2014-10-01
Optical proximity correction (OPC) is a resolution enhancement technique extensively used in the semiconductor industry to improve the resolution and pattern fidelity of optical lithography. In pixel-based OPC (PBOPC), the layout is divided into small pixels, which are then iteratively modified until the simulated print image on the wafer matches the desired pattern. However, the increasing complexity and size of modern integrated circuits make PBOPC techniques quite computationally intensive. This paper focuses on developing a practical and efficient PBOPC algorithm based on a nonparametric kernel regression, a well-known technique in machine learning. Specifically, we estimate the OPC patterns based on the geometric characteristics of the original layout corresponding to the same region and a series of training examples. Experimental results on metal layers show that our proposed approach significantly improves the speed of a current professional PBOPC software by a factor of 2 to 3, and may further reduce the mask complexity.
Akhmadaliev, S Z; Ambrosini, G; Amorim, A; Anderson, K; Andrieux, M L; Aubert, Bernard; Augé, E; Badaud, F; Baisin, L; Barreiro, F; Battistoni, G; Bazan, A; Bazizi, K; Belymam, A; Benchekroun, D; Berglund, S R; Berset, J C; Blanchot, G; Bogush, A A; Bohm, C; Boldea, V; Bonivento, W; Bosman, M; Bouhemaid, N; Breton, D; Brette, P; Bromberg, C; Budagov, Yu A; Burdin, S V; Calôba, L P; Camarena, F; Camin, D V; Canton, B; Caprini, M; Carvalho, J; Casado, M P; Castillo, M V; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Chadelas, R; Chalifour, M; Chekhtman, A; Chevalley, J L; Chirikov-Zorin, I E; Chlachidze, G; Citterio, M; Cleland, W E; Clément, C; Cobal, M; Cogswell, F; Colas, Jacques; Collot, J; Cologna, S; Constantinescu, S; Costa, G; Costanzo, D; Crouau, M; Daudon, F; David, J; David, M; Davidek, T; Dawson, J; De, K; de La Taille, C; Del Peso, J; Del Prete, T; de Saintignon, P; Di Girolamo, B; Dinkespiler, B; Dita, S; Dodd, J; Dolejsi, J; Dolezal, Z; Downing, R; Dugne, J J; Dzahini, D; Efthymiopoulos, I; Errede, D; Errede, S; Evans, H; Eynard, G; Fassi, F; Fassnacht, P; Ferrari, A; Ferrer, A; Flaminio, Vincenzo; Fournier, D; Fumagalli, G; Gallas, E; Gaspar, M; Giakoumopoulou, V; Gianotti, F; Gildemeister, O; Giokaris, N; Glagolev, V; Glebov, V Yu; Gomes, A; González, V; González de la Hoz, S; Grabskii, V; Graugès-Pous, E; Grenier, P; Hakopian, H H; Haney, M; Hébrard, C; Henriques, A; Hervás, L; Higón, E; Holmgren, Sven Olof; Hostachy, J Y; Hoummada, A; Huston, J; Imbault, D; Ivanyushenkov, Yu M; Jézéquel, S; Johansson, E K; Jon-And, K; Jones, R; Juste, A; Kakurin, S; Karyukhin, A N; Khokhlov, Yu A; Khubua, J I; Klioukhine, V I; Kolachev, G M; Kopikov, S V; Kostrikov, M E; Kozlov, V; Krivkova, P; Kukhtin, V V; Kulagin, M; Kulchitskii, Yu A; Kuzmin, M V; Labarga, L; Laborie, G; Lacour, D; Laforge, B; Lami, S; Lapin, V; Le Dortz, O; Lefebvre, M; Le Flour, T; Leitner, R; Leltchouk, M; Li, J; Liablin, M V; Linossier, O; Lissauer, D; Lobkowicz, F; Lokajícek, M; Lomakin, Yu F; López-Amengual, J M; Lund-Jensen, B; Maio, A; Makowiecki, D S; Malyukov, S N; Mandelli, L; Mansoulié, B; Mapelli, Livio P; Marin, C P; Marrocchesi, P S; Marroquim, F; Martin, P; Maslennikov, A L; Massol, N; Mataix, L; Mazzanti, M; Mazzoni, E; Merritt, F S; Michel, B; Miller, R; Minashvili, I A; Miralles, L; Mnatzakanian, E A; Monnier, E; Montarou, G; Mornacchi, Giuseppe; Moynot, M; Muanza, G S; Nayman, P; Némécek, S; Nessi, Marzio; Nicoleau, S; Niculescu, M; Noppe, J M; Onofre, A; Pallin, D; Pantea, D; Paoletti, R; Park, I C; Parrour, G; Parsons, J; Pereira, A; Perini, L; Perlas, J A; Perrodo, P; Pilcher, J E; Pinhão, J; Plothow-Besch, Hartmute; Poggioli, Luc; Poirot, S; Price, L; Protopopov, Yu; Proudfoot, J; Puzo, P; Radeka, V; Rahm, David Charles; Reinmuth, G; Renzoni, G; Rescia, S; Resconi, S; Richards, R; Richer, J P; Roda, C; Rodier, S; Roldán, J; Romance, J B; Romanov, V; Romero, P; Rossel, F; Rusakovitch, N A; Sala, P; Sanchis, E; Sanders, H; Santoni, C; Santos, J; Sauvage, D; Sauvage, G; Sawyer, L; Says, L P; Schaffer, A C; Schwemling, P; Schwindling, J; Seguin-Moreau, N; Seidl, W; Seixas, J M; Selldén, B; Seman, M; Semenov, A; Serin, L; Shaldaev, E; Shochet, M J; Sidorov, V; Silva, J; Simaitis, V J; Simion, S; Sissakian, A N; Snopkov, R; Söderqvist, J; Solodkov, A A; Soloviev, A; Soloviev, I V; Sonderegger, P; Soustruznik, K; Spanó, F; Spiwoks, R; Stanek, R; Starchenko, E A; Stavina, P; Stephens, R; Suk, M; Surkov, A; Sykora, I; Takai, H; Tang, F; Tardell, S; Tartarelli, F; Tas, P; Teiger, J; Thaler, J; Thion, J; Tikhonov, Yu A; Tisserant, S; Tokar, S; Topilin, N D; Trka, Z; Turcotte, M; Valkár, S; Varanda, M J; Vartapetian, A H; Vazeille, F; Vichou, I; Vinogradov, V; Vorozhtsov, S B; Vuillemin, V; White, A; Wielers, M; Wingerter-Seez, I; Wolters, H; Yamdagni, N; Yosef, C; Zaitsev, A; Zitoun, R; Zolnierowski, Y
2002-01-01
This paper discusses hadron energy reconstruction for the ATLAS barrel prototype combined calorimeter (consisting of a lead-liquid argon electromagnetic part and an iron-scintillator hadronic part) in the framework of the nonparametrical method. The nonparametrical method utilizes only the known e/h ratios and the electron calibration constants and does not require the determination of any parameters by a minimization technique. Thus, this technique lends itself to an easy use in a first level trigger. The reconstructed mean values of the hadron energies are within +or-1% of the true values and the fractional energy resolution is [(58+or-3)%/ square root E+(2.5+or-0.3)%](+)(1.7+or-0.2)/E. The value of the e/h ratio obtained for the electromagnetic compartment of the combined calorimeter is 1.74+or-0.04 and agrees with the prediction that e/h >1.66 for this electromagnetic calorimeter. Results of a study of the longitudinal hadronic shower development are also presented. The data have been taken in the H8 beam...
Kim, Junmo; Fisher, John W; Yezzi, Anthony; Cetin, Müjdat; Willsky, Alan S
2005-10-01
In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Futhermore, our method, which does not require any training, performs as good as methods based on training.
Silva-Villa, E
2010-01-01
It is generally assumed that a large fraction of stars are initially born in clusters. However, a large fraction of these disrupt on short timescales and the stars end up belonging to the field. Understanding this process is of paramount importance if we wish to constrain the star formation histories of external galaxies using star clusters. We attempt to understand the relation between field stars and star clusters by simultaneously studying both in a number of nearby galaxies. As a pilot study, we present results for the late-type spiral NGC 4395 using HST/ACS and HST/WFPC2 images. Different detection criteria were used to distinguish point sources (star candidates) and extended objects (star cluster candidates). Using a synthetic CMD method, we estimated the star formation history. Using simple stellar population model fitting, we calculated the mass and age of the cluster candidates. The field star formation rate appears to have been roughly constant, or to have possibly increased by up to about a factor ...
Directory of Open Access Journals (Sweden)
D. Das
2014-04-01
Full Text Available Climate projections simulated by Global Climate Models (GCM are often used for assessing the impacts of climate change. However, the relatively coarse resolutions of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods are often performed for regional climate projections. Statistical downscaling (SD is based on the understanding that the regional climate is influenced by two factors – the large scale climatic state and the regional or local features. A transfer function approach of SD involves learning a regression model which relates these features (predictors to a climatic variable of interest (predictand based on the past observations. However, often a single regression model is not sufficient to describe complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on Dirichlet Process (DP, for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lends to domain relevant interpretation. Applications to synthetic data demonstrate the value of the new approach and preliminary results related to feature selection for statistical downscaling shows our method can lead to new insights.
Non-parametric mass reconstruction of A1689 from strong lensing data with SLAP
Diego-Rodriguez, J M; Protopapas, P; Tegmark, M; Benítez, N; Broadhurst, T J
2004-01-01
We present the mass distribution in the central area of the cluster A1689 by fitting over 100 multiply lensed images with the non-parametric Strong Lensing Analysis Package (SLAP, Diego et al. 2004). The surface mass distribution is obtained in a robust way finding a total mass of 0.25E15 M_sun/h within a 70'' circle radius from the central peak. Our reconstructed density profile fits well an NFW profile with small perturbations due to substructure and is compatible with the more model dependent analysis of Broadhurst et al. (2004a) based on the same data. Our estimated mass does not rely on any prior information about the distribution of dark matter in the cluster. The peak of the mass distribution falls very close to the central cD and there is substructure near the center suggesting that the cluster is not fully relaxed. We also examine the effect on the recovered mass when we include the uncertainties in the redshift of the sources and in the original shape of the sources. Using simulations designed to mi...
Directory of Open Access Journals (Sweden)
Saswati Mukherjee
2010-11-01
Full Text Available Due to application of WSN in mission critical areas, secured message communication is veryimportant. We have attempted to present a methodology that ensures secured communicationamong nodes in a hierarchical Cluster Based WSN. Our scheme works when member sensornodes move from one Cluster Head (CH to another. The proposed scheme is based on Key Redistributionduring node mobility and development of an Authentication Model to checkwhether the new node in a cluster is an intruder. We have carried out extensive simulationexperiments, which demonstrate the efficacy of the proposed scheme. The experiments suggestthat the number of message transmission in creases linearly with the number of mobile nodesduring key-redistribution when a node moves from one CH to another. We have seen that thedetection efficiency of the Authentication Model is 0.9 to 1 when tunable threshold value is 0.02and sensor nodes are sufficiently mobile.
Nonparametric inference procedures for multistate life table analysis.
Dow, M M
1985-01-01
Recent generalizations of the classical single state life table procedures to the multistate case provide the means to analyze simultaneously the mobility and mortality experience of 1 or more cohorts. This paper examines fairly general nonparametric combinatorial matrix procedures, known as quadratic assignment, as an analysis technic of various transitional patterns commonly generated by cohorts over the life cycle course. To some degree, the output from a multistate life table analysis suggests inference procedures. In his discussion of multstate life table construction features, the author focuses on the matrix formulation of the problem. He then presents several examples of the proposed nonparametric procedures. Data for the mobility and life expectancies at birth matrices come from the 458 member Cayo Santiago rhesus monkey colony. The author's matrix combinatorial approach to hypotheses testing may prove to be a useful inferential strategy in several multidimensional demographic areas.
Non-parametric estimation of Fisher information from real data
Shemesh, Omri Har; Miñano, Borja; Hoekstra, Alfons G; Sloot, Peter M A
2015-01-01
The Fisher Information matrix is a widely used measure for applications ranging from statistical inference, information geometry, experiment design, to the study of criticality in biological systems. Yet there is no commonly accepted non-parametric algorithm to estimate it from real data. In this rapid communication we show how to accurately estimate the Fisher information in a nonparametric way. We also develop a numerical procedure to minimize the errors by choosing the interval of the finite difference scheme necessary to compute the derivatives in the definition of the Fisher information. Our method uses the recently published "Density Estimation using Field Theory" algorithm to compute the probability density functions for continuous densities. We use the Fisher information of the normal distribution to validate our method and as an example we compute the temperature component of the Fisher Information Matrix in the two dimensional Ising model and show that it obeys the expected relation to the heat capa...
International Conference on Robust Rank-Based and Nonparametric Methods
McKean, Joseph
2016-01-01
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Nonparametric instrumental regression with non-convex constraints
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Estimation of Stochastic Volatility Models by Nonparametric Filtering
DEFF Research Database (Denmark)
Kanaya, Shin; Kristensen, Dennis
2016-01-01
/estimated volatility process replacing the latent process. Our estimation strategy is applicable to both parametric and nonparametric stochastic volatility models, and can handle both jumps and market microstructure noise. The resulting estimators of the stochastic volatility model will carry additional biases......A two-step estimation method of stochastic volatility models is proposed: In the first step, we nonparametrically estimate the (unobserved) instantaneous volatility process. In the second step, standard estimation methods for fully observed diffusion processes are employed, but with the filtered...... and variances due to the first-step estimation, but under regularity conditions we show that these vanish asymptotically and our estimators inherit the asymptotic properties of the infeasible estimators based on observations of the volatility process. A simulation study examines the finite-sample properties...
Nonparametric Regression Estimation for Multivariate Null Recurrent Processes
Directory of Open Access Journals (Sweden)
Biqing Cai
2015-04-01
Full Text Available This paper discusses nonparametric kernel regression with the regressor being a \\(d\\-dimensional \\(\\beta\\-null recurrent process in presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \\(\\sqrt{n(Th^{d}}\\, where \\(n(T\\ is the number of regenerations for a \\(\\beta\\-null recurrent process and the limiting distribution (with proper normalization is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity of the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than the linear model.
Right-Censored Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Dursun Aydın
2016-11-01
Full Text Available This paper introduces the operating of the selection criteria for right-censored nonparametric regression using smoothing spline. In order to transform the response variable into a variable that contains the right-censorship, we used the KaplanMeier weights proposed by [1], and [2]. The major problem in smoothing spline method is to determine a smoothing parameter to obtain nonparametric estimates of the regression function. In this study, the mentioned parameter is chosen based on censored data by means of the criteria such as improved Akaike information criterion (AICc, Bayesian (or Schwarz information criterion (BIC and generalized crossvalidation (GCV. For this purpose, a Monte-Carlo simulation study is carried out to illustrate which selection criterion gives the best estimation for censored data.
Poverty and life cycle effects: A nonparametric analysis for Germany
Stich, Andreas
1996-01-01
Most empirical studies on poverty consider the extent of poverty either for the entire society or for separate groups like elderly people.However, these papers do not show what the situation looks like for persons of a certain age. In this paper poverty measures depending on age are derived using the joint density of income and age. The density is nonparametrically estimated by weighted Gaussian kernel density estimation. Applying the conditional density of income to several poverty measures ...
Nonparametric estimation of Fisher information from real data
Har-Shemesh, Omri; Quax, Rick; Miñano, Borja; Hoekstra, Alfons G.; Sloot, Peter M. A.
2016-02-01
The Fisher information matrix (FIM) is a widely used measure for applications including statistical inference, information geometry, experiment design, and the study of criticality in biological systems. The FIM is defined for a parametric family of probability distributions and its estimation from data follows one of two paths: either the distribution is assumed to be known and the parameters are estimated from the data or the parameters are known and the distribution is estimated from the data. We consider the latter case which is applicable, for example, to experiments where the parameters are controlled by the experimenter and a complicated relation exists between the input parameters and the resulting distribution of the data. Since we assume that the distribution is unknown, we use a nonparametric density estimation on the data and then compute the FIM directly from that estimate using a finite-difference approximation to estimate the derivatives in its definition. The accuracy of the estimate depends on both the method of nonparametric estimation and the difference Δ θ between the densities used in the finite-difference formula. We develop an approach for choosing the optimal parameter difference Δ θ based on large deviations theory and compare two nonparametric density estimation methods, the Gaussian kernel density estimator and a novel density estimation using field theory method. We also compare these two methods to a recently published approach that circumvents the need for density estimation by estimating a nonparametric f divergence and using it to approximate the FIM. We use the Fisher information of the normal distribution to validate our method and as a more involved example we compute the temperature component of the FIM in the two-dimensional Ising model and show that it obeys the expected relation to the heat capacity and therefore peaks at the phase transition at the correct critical temperature.
ANALYSIS OF TIED DATA: AN ALTERNATIVE NON-PARAMETRIC APPROACH
Directory of Open Access Journals (Sweden)
I. C. A. OYEKA
2012-02-01
Full Text Available This paper presents a non-parametric statistical method of analyzing two-sample data that makes provision for the possibility of ties in the data. A test statistic is developed and shown to be free of the effect of any possible ties in the data. An illustrative example is provided and the method is shown to compare favourably with its competitor; the Mann-Whitney test and is more powerful than the latter when there are ties.
Nonparametric test for detecting change in distribution with panel data
Pommeret, Denys; Ghattas, Badih
2011-01-01
This paper considers the problem of comparing two processes with panel data. A nonparametric test is proposed for detecting a monotone change in the link between the two process distributions. The test statistic is of CUSUM type, based on the empirical distribution functions. The asymptotic distribution of the proposed statistic is derived and its finite sample property is examined by bootstrap procedures through Monte Carlo simulations.
A Bayesian nonparametric method for prediction in EST analysis
Directory of Open Access Journals (Sweden)
Prünster Igor
2007-09-01
Full Text Available Abstract Background Expressed sequence tags (ESTs analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b the number of new unique genes to be observed in a future sample; c the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
Fusion of Hard and Soft Information in Nonparametric Density Estimation
2015-06-10
estimation exploiting, in concert, hard and soft information. Although our development, theoretical and numerical, makes no distinction based on sample...Fusion of Hard and Soft Information in Nonparametric Density Estimation∗ Johannes O. Royset Roger J-B Wets Department of Operations Research...univariate density estimation in situations when the sample ( hard information) is supplemented by “soft” information about the random phenomenon. These
Nonparametric estimation for hazard rate monotonously decreasing system
Institute of Scientific and Technical Information of China (English)
Han Fengyan; Li Weisong
2005-01-01
Estimation of density and hazard rate is very important to the reliability analysis of a system. In order to estimate the density and hazard rate of a hazard rate monotonously decreasing system, a new nonparametric estimator is put forward. The estimator is based on the kernel function method and optimum algorithm. Numerical experiment shows that the method is accurate enough and can be used in many cases.
Non-parametric versus parametric methods in environmental sciences
Directory of Open Access Journals (Sweden)
Muhammad Riaz
2016-01-01
Full Text Available This current report intends to highlight the importance of considering background assumptions required for the analysis of real datasets in different disciplines. We will provide comparative discussion of parametric methods (that depends on distributional assumptions (like normality relative to non-parametric methods (that are free from many distributional assumptions. We have chosen a real dataset from environmental sciences (one of the application areas. The findings may be extended to the other disciplines following the same spirit.
Everitt, Brian S; Leese, Morven; Stahl, Daniel
2011-01-01
Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.Real life examples are used throughout to demons
Gieren, W P; Gomes, M J; Gieren, Wolfgang P.; Fouque, Pascal; Gomez, Matias
1997-01-01
We have obtained the radii and distances of 16 galactic Cepheids supposed to be members in open clusters or associations using the new optical and near-infrared calibrations of the surface brightness (Barnes-Evans) method given by Fouque & Gieren (1997). We discuss in detail possible systematic errors in our infrared solutions and conclude that the typical total uncertainty of the infrared distance and radius of a Cepheid is about 3 percent in both infrared solutions, provided that the data are of excellent quality and that the amplitude of the color curve used in the solution is larger than ~0.3 mag. We compare the adopted infrared distances of the Cepheid variables to the ZAMS-fitting distances of their supposed host clusters and associations and find an unweighted mean value of the distance ratio of 1.02 +- 0.04. A detailed discussion of the individual Cepheids shows that the uncertainty of the ZAMS-fitting distances varies considerably from cluster to cluster. We find clear evidence that four Cepheids...
Silva-Villa, E.; Larsen, S.S.
2010-01-01
Context. It is generally assumed that a large fraction of stars are initially born in clusters. However, a large fraction of these disrupt on short timescales and the stars end up belonging to the field. Understanding this process is of paramount importance if we wish to constrain the star formation
Lee, Yu-Fang; Kelterer, Anne-Marie; Matisz, Gergely; Kunsági-Máté, Sándor; Chung, Chao-Yu; Lee, Yuan-Pern
2017-04-01
We recorded infrared (IR) spectra in the CH- and OH-stretching regions of size-selected clusters of methanol (M) with one water molecule (W), represented as MnW, n = 1-4, in a pulsed supersonic jet using the photoionization/IR-depletion technique. Vacuum ultraviolet emission at 118 nm served as the source of ionization in a time-of-flight mass spectrometer to detect clusters MnW as protonated forms Mn-1WH+. The variations in intensities of Mn-1WH+ were monitored as the wavelength of the IR laser light was tuned across the range 2700-3800 cm-1. IR spectra of size-selected clusters were obtained on processing of the observed action spectra of the related cluster-ions according to a mechanism that takes into account the production and loss of each cluster due to IR photodissociation. Spectra of methanol-water clusters in the OH region show significant variations as the number of methanol molecules increases, whereas those in the CH region are similar for all clusters. Scaled harmonic vibrational wavenumbers and relative IR intensities predicted with the M06-2X/aug-cc-pVTZ method for the methanol-water clusters are consistent with our experimental results. For dimers, absorption bands of a structure WM with H2O as a hydrogen-bond donor were observed at 3570, 3682, and 3722 cm-1, whereas weak bands of MW with methanol as a hydrogen-bond donor were observed at 3611 and 3753 cm-1. For M2W, the free OH band of H2O was observed at 3721 cm-1, whereas a broad feature was deconvoluted to three bands near 3425, 3472, and 3536 cm-1, corresponding to the three hydrogen-bonded OH-stretching modes in a cyclic structure. For M3W, the free OH shifted to 3715 cm-1, and the hydrogen-bonded OH-stretching bands became much broader, with a weak feature near 3179 cm-1 corresponding to the symmetric OH-stretching mode of a cyclic structure. For M4W, the observed spectrum agrees unsatisfactorily with predictions for the most stable cyclic structure, indicating significant contributions from
Clustering of complex shaped data sets via Kohonen maps and mathematical morphology
Ferreira Costa, Jose A.; de Andrade Netto, Marcio L.
2001-03-01
Clustering is the process of discovering groups within the data, based on similarities, with a minimal, if any, knowledge of their structure. The self-organizing (or Kohonen) map (SOM) is one of the best known neural network algorithms. It has been widely studied as a software tool for visualization of high-dimensional data. Important features include information compression while preserving topological and metric relationship of the primary data items. Although Kohonen maps had been applied for clustering data, usually the researcher sets the number of neurons equal to the expected number of clusters, or manually segments a two-dimensional map using some a-priori knowledge of the data. This paper proposes techniques for automatic partitioning and labeling SOM networks in clusters of neurons that may be used to represent the data clusters. Mathematical morphology operations, such as watershed, are performed on the U-matrix, which is a neuron-distance image. The direct application of watershed leads to an oversegmented image. It is used markers to identify significant clusters and homotopy modification to suppress the others. Markers are automatically found by performing a multilevel scan of connected regions of the U-matrix. Each cluster of neurons is a sub-graph that defines, in the input space, complex and non-parametric geometries which approximately describes the shape of the clusters. The process of map partitioning is extended recursively. Each cluster of neurons gives rise to a new map, which are trained with the subset of data that were classified to it. The algorithm produces dynamically a hierarchical tree of maps, which explains the cluster's structure in levels of granularity. The distributed and multiple prototypes cluster representation enables the discoveries of clusters even in the case when we have two or more non-separable pattern classes.
Comparative Study of Parametric and Non-parametric Approaches in Fault Detection and Isolation
DEFF Research Database (Denmark)
Katebi, S.D.; Blanke, M.; Katebi, M.R.
This report describes a comparative study between two approaches to fault detection and isolation in dynamic systems. The first approach uses a parametric model of the system. The main components of such techniques are residual and signature generation for processing and analyzing. The second...... approach is non-parametric in the sense that the signature analysis is only dependent on the frequency or time domain information extracted directly from the input-output signals. Based on these approaches, two different fault monitoring schemes are developed where the feature extraction and fault decision...... algorithms employed are adopted from the template matching in pattern recognition. Extensive simulation studies are performed to demonstrate satisfactory performance of the proposed techniques. The advantages and disadvantages of each approach are discussed and analyzed....
Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution
He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun
2016-05-01
Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved in non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The proposed optimal wavelength set is selected by the method which can make the measurement signals sensitive to wavelength and decrease the degree of the ill-condition of coefficient matrix of linear systems effectively to enhance the anti-interference ability of retrieval results. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the experimental measurement ASD over Harbin in China is recovered reasonably. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique in non-parametric estimation of ASD.
Non-parametric transformation for data correlation and integration: From theory to practice
Energy Technology Data Exchange (ETDEWEB)
Datta-Gupta, A.; Xue, Guoping; Lee, Sang Heon [Texas A& M Univ., College Station, TX (United States)
1997-08-01
The purpose of this paper is two-fold. First, we introduce the use of non-parametric transformations for correlating petrophysical data during reservoir characterization. Such transformations are completely data driven and do not require a priori functional relationship between response and predictor variables which is the case with traditional multiple regression. The transformations are very general, computationally efficient and can easily handle mixed data types for example, continuous variables such as porosity, permeability and categorical variables such as rock type, lithofacies. The power of the non-parametric transformation techniques for data correlation has been illustrated through synthetic and field examples. Second, we utilize these transformations to propose a two-stage approach for data integration during heterogeneity characterization. The principal advantages of our approach over traditional cokriging or cosimulation methods are: (1) it does not require a linear relationship between primary and secondary data, (2) it exploits the secondary information to its fullest potential by maximizing the correlation between the primary and secondary data, (3) it can be easily applied to cases where several types of secondary or soft data are involved, and (4) it significantly reduces variance function calculations and thus, greatly facilitates non-Gaussian cosimulation. We demonstrate the data integration procedure using synthetic and field examples. The field example involves estimation of pore-footage distribution using well data and multiple seismic attributes.
Montiel, Ariadna; Sendra, Irene; Escamilla-Rivera, Celia; Salzano, Vincenzo
2014-01-01
In this work we present a nonparametric approach, which works on minimal assumptions, to reconstruct the cosmic expansion of the Universe. We propose to combine a locally weighted scatterplot smoothing method and a simulation-extrapolation method. The first one (Loess) is a nonparametric approach that allows to obtain smoothed curves with no prior knowledge of the functional relationship between variables nor of the cosmological quantities. The second one (Simex) takes into account the effect of measurement errors on a variable via a simulation process. For the reconstructions we use as raw data the Union2.1 Type Ia Supernovae compilation, as well as recent Hubble parameter measurements. This work aims to illustrate the approach, which turns out to be a self-sufficient technique in the sense we do not have to choose anything by hand. We examine the details of the method, among them the amount of observational data needed to perform the locally weighted fit which will define the robustness of our reconstructio...
MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO
Energy Technology Data Exchange (ETDEWEB)
Jardel, John R.; Gebhardt, Karl [Department of Astronomy, The University of Texas, 2515 Speedway, Stop C1400, Austin, TX 78712-1205 (United States); Fabricius, Maximilian H.; Williams, Michael J. [Max-Planck Institut fuer extraterrestrische Physik, Giessenbachstrasse, D-85741 Garching bei Muenchen (Germany); Drory, Niv, E-mail: jardel@astro.as.utexas.edu [Instituto de Astronomia, Universidad Nacional Autonoma de Mexico, Avenida Universidad 3000, Ciudad Universitaria, C.P. 04510 Mexico D.F. (Mexico)
2013-02-15
We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Our non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 {<=} r {<=} 700 pc. The profile for r {>=} 20 pc is well fit by a power law with slope {alpha} = -1.0 {+-} 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.
Wagstaff, Kiri L.
2012-03-01
matrices—cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validations methods, and visualization or data reduction techniques such as principal components analysis (PCA),multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
苏里格气田丛式井组快速钻井技术%Faster Drilling Technique of Cluster Wells in Su Lige Gas Field
Institute of Scientific and Technical Information of China (English)
欧阳勇; 吴学升; 高云文; 黄占盈; 白明娜
2011-01-01
The unique feature of "low permeable sublayer, low porosity" in Su Iige leads to the low productivity , small well spacing, dense well network of single well, and it is located in the desert, the ecology environment is weak, which is more suitable to be developed with the cluster wells, but the long building cycle time, high drilling cost and something else elements influence the popularization of cluster wells. To achieve the strategic objective of low cost development in Su Lige gas field, it is very important to improve the drilling speed of cluster well. Aimed at the problem of cluster well drilling process in Su Lige, according to the technique research of the amount optimization of well cluster, construction of well cut plane, optimization of PDC drill bit and optimization of make up of string, a faster drilling technique scheme of cluster wells is generated and received the preferable effect in the well site.%苏里格气田“低孔、低渗”的特点决定了其单井产量低、井距小、井网密,而其地处沙漠,生态环境脆弱,更适宜采用丛式井组开发.但从式井施工周期长、钻井成本高等因素影响了从式井组的推广应用.为实现苏里气田低成本开发的战略目标,提高丛式井钻井速度就显得尤为重要.针对苏里格丛式井钻井过程存在的难题,通过井组数优化、井身剖面设计、PDC钻头的优选以及钻具组合的优化等技术研究,形成了一套苏里格气田丛式井组快速钻井的的技术方案,在现场实施中取得了较好的效果.
DEFF Research Database (Denmark)
Azri, Suhaibah; Ujang, Uznir; Rahman, Alias Abdul
2014-01-01
D geospatial data clustering to be used in the construction of 3D R-Tree and respectively could reduce the overlapping among nodes. The proposed method is tested on 3D urban dataset for the application of urban infill development. By using several cases of data updating operations such as building...... infill, building demolition and building modification, the proposed method indicates that the percentage of overlapping coverage among nodes is reduced compared with other existing approaches....
Shaban, M; Urban, B; El Saadi, A; Faisal, M
2010-08-01
The limited water resources of Egypt lead to widespread water-stress. Consequently, the use of marginal water sources, such as agricultural drainage waters, provides one of the national feasible solutions to the problem. However, the marginal quality of the drainage waters may restrict their use. The objective of this research is to develop a tool for planning and managing the reuse of agricultural drainage water for irrigation in the Nile Delta. This is achieved by classifying the pollution levels of drainage water into several categories using a statistical clustering approach that may ensure simple but accurate information about the pollution levels and water characteristics at any point within the drainage system. The derived clusters are then visualized by using a Geographical Information System (GIS) to draw thematic maps based on the entire Nile Delta, thus making GIS as a decision support system. The obtained maps may assist the decision makers in managing and controlling pollution in the Nile Delta regions. The clustering process also provides an effective overview of those spots in the Nile Delta where intensified monitoring activities are required. Consequently, the obtained results make a major contribution to the assessment and redesign of the Egyptian national water quality monitoring network.
Implicit Priors in Galaxy Cluster Mass and Scaling Relation Determinations
Mantz, A.; Allen, S. W.
2011-01-01
Deriving the total masses of galaxy clusters from observations of the intracluster medium (ICM) generally requires some prior information, in addition to the assumptions of hydrostatic equilibrium and spherical symmetry. Often, this information takes the form of particular parametrized functions used to describe the cluster gas density and temperature profiles. In this paper, we investigate the implicit priors on hydrostatic masses that result from this fully parametric approach, and the implications of such priors for scaling relations formed from those masses. We show that the application of such fully parametric models of the ICM naturally imposes a prior on the slopes of the derived scaling relations, favoring the self-similar model, and argue that this prior may be influential in practice. In contrast, this bias does not exist for techniques which adopt an explicit prior on the form of the mass profile but describe the ICM non-parametrically. Constraints on the slope of the cluster mass-temperature relation in the literature show a separation based the approach employed, with the results from fully parametric ICM modeling clustering nearer the self-similar value. Given that a primary goal of scaling relation analyses is to test the self-similar model, the application of methods subject to strong, implicit priors should be avoided. Alternative methods and best practices are discussed.
Heavy hitters via cluster-preserving clustering
DEFF Research Database (Denmark)
Larsen, Kasper Green; Nelson, Jelani; Nguyen, Huy L.
2016-01-01
, providing correctness whp. In fact, a simpler version of our algorithm for p = 1 in the strict turnstile model answers queries even faster than the "dyadic trick" by roughly a log n factor, dominating it in all regards. Our main innovation is an efficient reduction from the heavy hitters to a clustering...... problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm......, that partitions the graph into clusters with the promise of not destroying any original cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel combinatorial techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved...
a Multivariate Downscaling Model for Nonparametric Simulation of Daily Flows
Molina, J. M.; Ramirez, J. A.; Raff, D. A.
2011-12-01
A multivariate, stochastic nonparametric framework for stepwise disaggregation of seasonal runoff volumes to daily streamflow is presented. The downscaling process is conditional on volumes of spring runoff and large-scale ocean-atmosphere teleconnections and includes a two-level cascade scheme: seasonal-to-monthly disaggregation first followed by monthly-to-daily disaggregation. The non-parametric and assumption-free character of the framework allows consideration of the random nature and nonlinearities of daily flows, which parametric models are unable to account for adequately. This paper examines statistical links between decadal/interannual climatic variations in the Pacific Ocean and hydrologic variability in US northwest region, and includes a periodicity analysis of climate patterns to detect coherences of their cyclic behavior in the frequency domain. We explore the use of such relationships and selected signals (e.g., north Pacific gyre oscillation, southern oscillation, and Pacific decadal oscillation indices, NPGO, SOI and PDO, respectively) in the proposed data-driven framework by means of a combinatorial approach with the aim of simulating improved streamflow sequences when compared with disaggregated series generated from flows alone. A nearest neighbor time series bootstrapping approach is integrated with principal component analysis to resample from the empirical multivariate distribution. A volume-dependent scaling transformation is implemented to guarantee the summability condition. In addition, we present a new and simple algorithm, based on nonparametric resampling, that overcomes the common limitation of lack of preservation of historical correlation between daily flows across months. The downscaling framework presented here is parsimonious in parameters and model assumptions, does not generate negative values, and produces synthetic series that are statistically indistinguishable from the observations. We present evidence showing that both
Donahue, Megan; Mahdavi, Andisheh; Umetsu, Keiichi; Ettori, Stefano; Merten, Julian; Postman, Marc; Hoffer, Aaron; Baldi, Alessandro; Coe, Dan; Czakon, Nicole; Bartelmann, Mattias; Benitez, Narciso; Bouwens, Rychard; Bradley, Larry; Broadhurst, Tom; Ford, Holland; Gastaldello, Fabio; Grillo, Claudio; Infante, Leopoldo; Jouvel, Stephanie; Koekemoer, Anton; Kelson, Daniel; Lahav, Ofer; Lemze, Doron; Medezinski, Elinor; Melchior, Peter; Meneghetti, Massimo; Molino, Alberto; Moustakas, John; Moustakas, Leonidas A; Nonino, Mario; Rosati, Piero; Sayers, Jack; Seitz, Stella; Van der Wel, Arjen; Zheng, Wei; Zitrin, Adi
2014-01-01
We present profiles of temperature, gas mass, and hydrostatic mass estimated from X-ray observations of CLASH clusters. We compare measurements from XMM and Chandra and compare both sets to CLASH gravitational lensing mass profiles. We find that Chandra and XMM measurements of electron density and enclosed gas mass as functions of radius are nearly identical, indicating that any differences in hydrostatic masses inferred from X-ray observations arise from differences in gas-temperature estimates. Encouragingly, gas temperatures measured in clusters by XMM and Chandra are consistent with one another at ~100 kpc radii but XMM temperatures systematically decline relative to Chandra temperatures as the radius of the temperature measurement increases. One plausible reason for this trend is large-angle scattering of soft X-ray photons in excess of that amount expected from the standard XMM PSF correction. We present the CLASH-X mass-profile comparisons in the form of cosmology-independent and redshift-independent c...
Directory of Open Access Journals (Sweden)
Hasan A. H. Naji
2017-06-01
Full Text Available Taxi trajectories reflect human mobility over the urban roads’ network. Although taxi drivers cruise the same city streets, there is an observed variation in their daily profit. To reveal the reasons behind this issue, this study introduces a novel approach for investigating and understanding the impact of human mobility patterns (taxi drivers’ behavior on daily drivers’ profit. Firstly, a K-means clustering method is adopted to group taxi drivers into three profitability groups according to their driving duration, driving distance and income. Secondly, the cruising trips and stopping spots for each profitability group are extracted. Thirdly, a comparison among the profitability groups in terms of spatial and temporal patterns on cruising trips and stopping spots is carried out. The comparison applied various methods including the mash map matching method and DBSCAN clustering method. Finally, an overall analysis of the results is discussed in detail. The results show that there is a significant relationship between human mobility patterns and taxi drivers’ profitability. High profitability drivers based on their experience earn more compared to other driver groups, as they know which places are more active to cruise and to stop and at what times. This study provides suggestions and insights for taxi companies and taxi drivers in order to increase their daily income and to enhance the efficiency of the taxi industry.
Panel data nonparametric estimation of production risk and risk preferences
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We apply nonparametric panel data kernel regression to investigate production risk, out-put price uncertainty, and risk attitudes of Polish dairy farms based on a firm-level unbalanced panel data set that covers the period 2004–2010. We compare different model specifications and different...... approaches for obtaining firm-specific measures of risk attitudes. We found that Polish dairy farmers are risk averse regarding production risk and price uncertainty. According to our results, Polish dairy farmers perceive the production risk as being more significant than the risk related to output price...
Digital spectral analysis parametric, non-parametric and advanced methods
Castanié, Francis
2013-01-01
Digital Spectral Analysis provides a single source that offers complete coverage of the spectral analysis domain. This self-contained work includes details on advanced topics that are usually presented in scattered sources throughout the literature.The theoretical principles necessary for the understanding of spectral analysis are discussed in the first four chapters: fundamentals, digital signal processing, estimation in spectral analysis, and time-series models.An entire chapter is devoted to the non-parametric methods most widely used in industry.High resolution methods a
Nonparametric statistics a step-by-step approach
Corder, Gregory W
2014-01-01
"…a very useful resource for courses in nonparametric statistics in which the emphasis is on applications rather than on theory. It also deserves a place in libraries of all institutions where introductory statistics courses are taught."" -CHOICE This Second Edition presents a practical and understandable approach that enhances and expands the statistical toolset for readers. This book includes: New coverage of the sign test and the Kolmogorov-Smirnov two-sample test in an effort to offer a logical and natural progression to statistical powerSPSS® (Version 21) software and updated screen ca
Testing for a constant coefficient of variation in nonparametric regression
Dette, Holger; Marchlewski, Mareen; Wagener, Jens
2010-01-01
In the common nonparametric regression model Y_i=m(X_i)+sigma(X_i)epsilon_i we consider the problem of testing the hypothesis that the coefficient of the scale and location function is constant. The test is based on a comparison of the observations Y_i=\\hat{sigma}(X_i) with their mean by a smoothed empirical process, where \\hat{sigma} denotes the local linear estimate of the scale function. We show weak convergence of a centered version of this process to a Gaussian process under the null ...
Generative Temporal Modelling of Neuroimaging - Decomposition and Nonparametric Testing
DEFF Research Database (Denmark)
Hald, Ditte Høvenhoff
The goal of this thesis is to explore two improvements for functional magnetic resonance imaging (fMRI) analysis; namely our proposed decomposition method and an extension to the non-parametric testing framework. Analysis of fMRI allows researchers to investigate the functional processes...... of the brain, and provides insight into neuronal coupling during mental processes or tasks. The decomposition method is a Gaussian process-based independent components analysis (GPICA), which incorporates a temporal dependency in the sources. A hierarchical model specification is used, featuring both...
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
A Modified Nonparametric Message Passing Algorithm for Soft Iterative Channel Estimation
Directory of Open Access Journals (Sweden)
Linlin Duan
2013-08-01
Full Text Available Based on the factor graph framework, we derived a Modified Nonparametric Message Passing Algorithm (MNMPA for soft iterative channel estimation in a Low Density Parity-Check (LDPC coded Bit-Interleaved Coded Modulation (BICM system. The algorithm combines ideas from Particle Filtering (PF with popular factor graph techniques. A Markov Chain Monte Carlo (MCMC move step is added after typical sequential Important Sampling (SIS -resampling to prevent particle impoverishment and to improve channel estimation precision. To reduce complexity, a new max-sum rule for updating particle based messages is reformulated and two proper update schedules are designed. Simulation results illustrate the effectiveness of MNMPA and its comparison with other sum-product algorithms in a Gaussian or non-Gaussian noise environment. We also studied the effect of the particle number, pilot symbol spacing and different schedules on BER performance.
Energy Technology Data Exchange (ETDEWEB)
Donahue, Megan; Voit, G. Mark; Hoffer, Aaron; Baldi, Alessandro [Physics and Astronomy Department, Michigan State University, East Lansing, MI 48824 (United States); Mahdavi, Andisheh [San Francisco State University, San Francisco, CA 94132 (United States); Umetsu, Keiichi; Czakon, Nicole [Institute of Astronomy and Astrophysics, Academia Sinica, Roosevelt Road, Taipei 10617, Taiwan (China); Ettori, Stefano [INFN, Sezione di Bologna, viale Berti Pichat 6/2, I-40127 Bologna (Italy); Merten, Julian [Jet Propulsion Laboratory, 4800 Oak Grove Drive, Pasadena, CA 91109 (United States); Postman, Marc; Coe, Dan; Bradley, Larry [STScI, 3700 San Martin Drive, Baltimore, MD 21218 (United States); Bartelmann, Mattias [Universität Heidelberg, Zentrum für Astronomie, Philosophenweg 12, D-69120 Heidelberg (Germany); Benitez, Narciso [Instituto de Astrofisica de Andalucia (CSIC), C/Camino Bajo de Huétor 24, Granada E-18008 (Spain); Bouwens, Rychard [Leiden Observatories, Niels Bohrweb 2, NL-2333 CA Leiden (Netherlands); Broadhurst, Tom [Department of Theoretical Physics, University of the Basque Country, E-48080 Bilbao (Spain); Ford, Holland [Department of Physics and Astronomy, The Johns Hopkins University, Baltimore, MD 21218 (United States); Gastaldello, Fabio [INAF-IASF, via Bassini 15, I-20133 Milan (Italy); Grillo, Claudio [Dark Cosmology Centre, Niels Bohr Institute, University of Copenhagen, Juliane Maries Vej 30, DK-2100 Copenhagen (Denmark); Infante, Leopoldo, E-mail: donahue@pa.msu.edu [Dept Astronomía-Astrofísica, Pontificia Universidad Católica de Chile, Casilla 306, 22 Santiago (Chile); and others
2014-10-20
We present profiles of temperature, gas mass, and hydrostatic mass estimated from new and archival X-ray observations of CLASH clusters. We compare measurements derived from XMM and Chandra observations with one another and compare both to gravitational lensing mass profiles derived with CLASH Hubble Space Telescope and Subaru Telescope lensing data. Radial profiles of Chandra and XMM measurements of electron density and enclosed gas mass are nearly identical, indicating that differences in hydrostatic masses inferred from X-ray observations arise from differences in gas-temperature measurements. Encouragingly, gas temperatures measured in clusters by XMM and Chandra are consistent with one another at ∼100-200 kpc radii, but XMM temperatures systematically decline relative to Chandra temperatures at larger radii. The angular dependence of the discrepancy suggests that additional investigation on systematics such as the XMM point-spread function correction, vignetting, and off-axis responses is yet required. We present the CLASH-X mass-profile comparisons in the form of cosmology-independent and redshift-independent circular-velocity profiles. We argue that comparisons of circular-velocity profiles are the most robust way to assess mass bias. Ratios of Chandra hydrostatic equilibrium (HSE) mass profiles to CLASH lensing profiles show no obvious radial dependence in the 0.3-0.8 Mpc range. However, the mean mass biases inferred from the weak-lensing (WL) and SaWLens data are different. As an example, the weighted-mean value at 0.5 Mpc is (b) = 0.12 for the WL comparison and (b) = –0.11 for the SaWLens comparison. The ratios of XMM HSE mass profiles to CLASH lensing profiles show a pronounced radial dependence in the 0.3-1.0 Mpc range, with a weighted mean mass bias value rising to (b) ≳ 0.3 at ∼1 Mpc for the WL comparison and (b) ≈ 0.25 for the SaWLens comparison. The enclosed gas mass profiles from both Chandra and XMM rise to a value ≈1/8 times the total
Using Mathematica to build Non-parametric Statistical Tables
Directory of Open Access Journals (Sweden)
Gloria Perez Sainz de Rozas
2003-01-01
Full Text Available In this paper, I present computational procedures to obtian statistical tables. The tables of the asymptotic distribution and the exact distribution of Kolmogorov-Smirnov statistic Dn for one population, the table of the distribution of the runs R, the table of the distribution of Wilcoxon signed-rank statistic W+ and the table of the distribution of Mann-Whitney statistic Ux using Mathematica, Version 3.9 under Window98. I think that it is an interesting cuestion because many statistical packages give the asymptotic significance level in the statistical tests and with these porcedures one can easily calculate the exact significance levels and the left-tail and right-tail probabilities with non-parametric distributions. I have used mathematica to make these calculations because one can use symbolic language to solve recursion relations. It's very easy to generate the format of the tables, and it's possible to obtain any table of the mentioned non-parametric distributions with any precision, not only with the standard parameters more used in Statistics, and without transcription mistakes. Furthermore, using similar procedures, we can generate tables for the following distribution functions: Binomial, Poisson, Hypergeometric, Normal, x2 Chi-Square, T-Student, F-Snedecor, Geometric, Gamma and Beta.
1st Conference of the International Society for Nonparametric Statistics
Lahiri, S; Politis, Dimitris
2014-01-01
This volume is composed of peer-reviewed papers that have developed from the First Conference of the International Society for NonParametric Statistics (ISNPS). This inaugural conference took place in Chalkidiki, Greece, June 15-19, 2012. It was organized with the co-sponsorship of the IMS, the ISI, and other organizations. M.G. Akritas, S.N. Lahiri, and D.N. Politis are the first executive committee members of ISNPS, and the editors of this volume. ISNPS has a distinguished Advisory Committee that includes Professors R.Beran, P.Bickel, R. Carroll, D. Cook, P. Hall, R. Johnson, B. Lindsay, E. Parzen, P. Robinson, M. Rosenblatt, G. Roussas, T. SubbaRao, and G. Wahba. The Charting Committee of ISNPS consists of more than 50 prominent researchers from all over the world. The chapters in this volume bring forth recent advances and trends in several areas of nonparametric statistics. In this way, the volume facilitates the exchange of research ideas, promotes collaboration among researchers from all over the wo...
Non-parametric Morphologies of Mergers in the Illustris Simulation
Bignone, Lucas A; Sillero, Emanuel; Pedrosa, Susana E; Pellizza, Leonardo J; Lambas, Diego G
2016-01-01
We study non-parametric morphologies of mergers events in a cosmological context, using the Illustris project. We produce mock g-band images comparable to observational surveys from the publicly available Illustris simulation idealized mock images at $z=0$. We then measure non parametric indicators: asymmetry, Gini, $M_{20}$, clumpiness and concentration for a set of galaxies with $M_* >10^{10}$ M$_\\odot$. We correlate these automatic statistics with the recent merger history of galaxies and with the presence of close companions. Our main contribution is to assess in a cosmological framework, the empirically derived non-parametric demarcation line and average time-scales used to determine the merger rate observationally. We found that 98 per cent of galaxies above the demarcation line have a close companion or have experienced a recent merger event. On average, merger signatures obtained from the $G-M_{20}$ criteria anticorrelate clearly with the elapsing time to the last merger event. We also find that the a...
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
Lee, Kyungbook; Song, Seok Goo
2016-10-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events (M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
A non-parametric framework for estimating threshold limit values
Directory of Open Access Journals (Sweden)
Ulm Kurt
2005-11-01
Full Text Available Abstract Background To estimate a threshold limit value for a compound known to have harmful health effects, an 'elbow' threshold model is usually applied. We are interested on non-parametric flexible alternatives. Methods We describe how a step function model fitted by isotonic regression can be used to estimate threshold limit values. This method returns a set of candidate locations, and we discuss two algorithms to select the threshold among them: the reduced isotonic regression and an algorithm considering the closed family of hypotheses. We assess the performance of these two alternative approaches under different scenarios in a simulation study. We illustrate the framework by analysing the data from a study conducted by the German Research Foundation aiming to set a threshold limit value in the exposure to total dust at workplace, as a causal agent for developing chronic bronchitis. Results In the paper we demonstrate the use and the properties of the proposed methodology along with the results from an application. The method appears to detect the threshold with satisfactory success. However, its performance can be compromised by the low power to reject the constant risk assumption when the true dose-response relationship is weak. Conclusion The estimation of thresholds based on isotonic framework is conceptually simple and sufficiently powerful. Given that in threshold value estimation context there is not a gold standard method, the proposed model provides a useful non-parametric alternative to the standard approaches and can corroborate or challenge their findings.
Bayesian nonparametric centered random effects models with variable selection.
Yang, Mingan
2013-03-01
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross-country and interlaboratory rodent uterotrophic bioassay.
Wavelet Estimators in Nonparametric Regression: A Comparative Simulation Study
Directory of Open Access Journals (Sweden)
Anestis Antoniadis
2001-06-01
Full Text Available Wavelet analysis has been found to be a powerful tool for the nonparametric estimation of spatially-variable objects. We discuss in detail wavelet methods in nonparametric regression, where the data are modelled as observations of a signal contaminated with additive Gaussian noise, and provide an extensive review of the vast literature of wavelet shrinkage and wavelet thresholding estimators developed to denoise such data. These estimators arise from a wide range of classical and empirical Bayes methods treating either individual or blocks of wavelet coefficients. We compare various estimators in an extensive simulation study on a variety of sample sizes, test functions, signal-to-noise ratios and wavelet filters. Because there is no single criterion that can adequately summarise the behaviour of an estimator, we use various criteria to measure performance in finite sample situations. Insight into the performance of these estimators is obtained from graphical outputs and numerical tables. In order to provide some hints of how these estimators should be used to analyse real data sets, a detailed practical step-by-step illustration of a wavelet denoising analysis on electrical consumption is provided. Matlab codes are provided so that all figures and tables in this paper can be reproduced.
Computing Economies of Scope Using Robust Partial Frontier Nonparametric Methods
Directory of Open Access Journals (Sweden)
Pedro Carvalho
2016-03-01
Full Text Available This paper proposes a methodology to examine economies of scope using the recent order-α nonparametric method. It allows us to investigate economies of scope by comparing the efficient order-α frontiers of firms that produce two or more goods with the efficient order-α frontiers of firms that produce only one good. To accomplish this, and because the order-α frontiers are irregular, we suggest to linearize them by the DEA estimator. The proposed methodology uses partial frontier nonparametric methods that are more robust than the traditional full frontier methods. By using a sample of 67 Portuguese water utilities for the period 2002–2008 and, also, a simulated sample, we prove the usefulness of the approach adopted and show that if only the full frontier methods were used, they would lead to different results. We found evidence of economies of scope in the provision of water supply and wastewater services simultaneously by water utilities in Portugal.
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k -space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
DEFF Research Database (Denmark)
Effraimidis, Georgios; Dahl, Christian Møller
In this paper, we develop a fully nonparametric approach for the estimation of the cumulative incidence function with Missing At Random right-censored competing risks data. We obtain results on the pointwise asymptotic normality as well as the uniform convergence rate of the proposed nonparametric...... estimator. A simulation study that serves two purposes is provided. First, it illustrates in details how to implement our proposed nonparametric estimator. Secondly, it facilitates a comparison of the nonparametric estimator to a parametric counterpart based on the estimator of Lu and Liang (2008...
Portable and Simple Technique Using Multi Core and Multiple GbE Ports for Commodity PC Clusters
Fukunaga, Takafumi
Due to advent of powerful and easily available Multi core PC clusters, the computing power per node has been increasing significantly. On the other hand, installation and maintenance costs of powerful interconnection networks (Myrinet, Infiniband, etc.) are still expensive. Moreover, because they use nonstandard protocols and special device drivers, they tend to increase the specializations and complexities both in programming and in operability, and degrade portability. This paper proposes the portable method for improving the performance of bandwidth-oriented parallel applications by increasing the bandwidth without dedicated hardware, drivers, protocols, libraries and IEEE802.3ad (LACP). Since proposed method is introduced only by loading the proposed driver without any modifications to the TCP/IP protocol stacks and to existing applications, it has advantages in both high portability and stability. Proposed method also performs better than LACP, which is the most similar in comparison to proposal, without LACP supported switches and drivers. In addition, LACP performance is influenced both by the distribution algorithms implemented both in switches and in NIC drivers, and by the network parameters such as MAC addresses, IP addresses, VLAN id, etc. used in distribution algorithms. On the other hand, proposed method shows a stable effect regardless of them.
Directory of Open Access Journals (Sweden)
Kosiorowska Ewa
2015-06-01
Full Text Available We briefly communicate the results of nonparametric and robust evaluation of the effects of the Fourth Millennium Development Goal of the United Nations. The main aim of the goal was reducing by two thirds, from 1990-2015, under five month’s child mortality. Our novel analysis was conducted by means of very powerful and user friendly tools offered by the Data Depth Concept being a collection of multivariate techniques basing on multivariate generalizations of quintiles, ranges and order statistics. The results of our analysis are more convincing than the results obtained using classical statistical tools.
Carroll, Raymond J.
2011-03-01
In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
Document Clustering Based on Semi-Supervised Term Clustering
Directory of Open Access Journals (Sweden)
Hamid Mahmoodi
2012-05-01
Full Text Available The study is conducted to propose a multi-step feature (term selection process and in semi-supervised fashion, provide initial centers for term clusters. Then utilize the fuzzy c-means (FCM clustering algorithm for clustering terms. Finally assign each of documents to closest associated term clusters. While most text clustering algorithms directly use documents for clustering, we propose to first group the terms using FCM algorithm and then cluster documents based on terms clusters. We evaluate effectiveness of our technique on several standard text collections and compare our results with the some classical text clustering algorithms.
DEFF Research Database (Denmark)
Löwe, Roland; Madsen, Henrik; McSharry, Patrick
2016-01-01
This study evaluated methods for automated classification of rain events into groups of "high" and "low" spatial and temporal variability in offline and online situations. The applied classification techniques are fast and based on rainfall data only, and can thus be applied by, e.g., water system...... and quadratic discriminant analysis both provided a fast and reliable identification of rain events of "high" variability, while the k-means provided the smallest number of rain events falsely identified as being of "high" variability (false hits). A simple classification method based on a threshold...
Energy Technology Data Exchange (ETDEWEB)
Abdulbaqi, Hayder Saad [School of Physics, Universiti Sains Malaysia, 11700, Penang (Malaysia); Department of Physics, College of Education, University of Al-Qadisiya, Al-Qadisiya (Iraq); Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin [School of Physics, Universiti Sains Malaysia, 11700, Penang (Malaysia); Abood, Loay Kadom [Department of Computer Science, College of Science, University of Baghdad, Baghdad (Iraq)
2015-04-24
Brain tumors, are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time consuming task. Size and location accurate detection of brain tumor plays a vital role in the successful diagnosis and treatment of tumors. Brain tumor detection is considered a challenging mission in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method has been developed in this research in order to construct hybrid method between (HMRF) and threshold. These methods have been applied on 4 different patient data sets. The result of comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with (FCM) techniques.
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Institute of Scientific and Technical Information of China (English)
Lu LIN
2005-01-01
In the nonpaxametric regression models, the original regression estimators including kernel estimator, Fourier series estimator and wavelet estimator are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, which attains very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P
2016-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal s...
A Non-Parametric Spatial Independence Test Using Symbolic Entropy
Directory of Open Access Journals (Sweden)
López Hernández, Fernando
2008-01-01
Full Text Available In the present paper, we construct a new, simple, consistent and powerful test forspatial independence, called the SG test, by using symbolic dynamics and symbolic entropyas a measure of spatial dependence. We also give a standard asymptotic distribution of anaffine transformation of the symbolic entropy under the null hypothesis of independencein the spatial process. The test statistic and its standard limit distribution, with theproposed symbolization, are invariant to any monotonuous transformation of the data.The test applies to discrete or continuous distributions. Given that the test is based onentropy measures, it avoids smoothed nonparametric estimation. We include a MonteCarlo study of our test, together with the well-known Moran’s I, the SBDS (de Graaffet al, 2001 and (Brett and Pinkse, 1997 non parametric test, in order to illustrate ourapproach.
Analyzing single-molecule time series via nonparametric Bayesian inference.
Hines, Keegan E; Bankston, John R; Aldrich, Richard W
2015-02-03
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Analyzing multiple spike trains with nonparametric Granger causality.
Nedungadi, Aatira G; Rangarajan, Govindan; Jain, Neeraj; Ding, Mingzhou
2009-08-01
Simultaneous recordings of spike trains from multiple single neurons are becoming commonplace. Understanding the interaction patterns among these spike trains remains a key research area. A question of interest is the evaluation of information flow between neurons through the analysis of whether one spike train exerts causal influence on another. For continuous-valued time series data, Granger causality has proven an effective method for this purpose. However, the basis for Granger causality estimation is autoregressive data modeling, which is not directly applicable to spike trains. Various filtering options distort the properties of spike trains as point processes. Here we propose a new nonparametric approach to estimate Granger causality directly from the Fourier transforms of spike train data. We validate the method on synthetic spike trains generated by model networks of neurons with known connectivity patterns and then apply it to neurons simultaneously recorded from the thalamus and the primary somatosensory cortex of a squirrel monkey undergoing tactile stimulation.
Prior processes and their applications nonparametric Bayesian estimation
Phadia, Eswar G
2016-01-01
This book presents a systematic and comprehensive treatment of various prior processes that have been developed over the past four decades for dealing with Bayesian approach to solving selected nonparametric inference problems. This revised edition has been substantially expanded to reflect the current interest in this area. After an overview of different prior processes, it examines the now pre-eminent Dirichlet process and its variants including hierarchical processes, then addresses new processes such as dependent Dirichlet, local Dirichlet, time-varying and spatial processes, all of which exploit the countable mixture representation of the Dirichlet process. It subsequently discusses various neutral to right type processes, including gamma and extended gamma, beta and beta-Stacy processes, and then describes the Chinese Restaurant, Indian Buffet and infinite gamma-Poisson processes, which prove to be very useful in areas such as machine learning, information retrieval and featural modeling. Tailfree and P...
Nonparametric Estimation of Distributions in Random Effects Models
Hart, Jeffrey D.
2011-01-01
We propose using minimum distance to obtain nonparametric estimates of the distributions of components in random effects models. A main setting considered is equivalent to having a large number of small datasets whose locations, and perhaps scales, vary randomly, but which otherwise have a common distribution. Interest focuses on estimating the distribution that is common to all datasets, knowledge of which is crucial in multiple testing problems where a location/scale invariant test is applied to every small dataset. A detailed algorithm for computing minimum distance estimates is proposed, and the usefulness of our methodology is illustrated by a simulation study and an analysis of microarray data. Supplemental materials for the article, including R-code and a dataset, are available online. © 2011 American Statistical Association.
Curve registration by nonparametric goodness-of-fit testing
Dalalyan, Arnak
2011-01-01
The problem of curve registration appears in many different areas of applications ranging from neuroscience to road traffic modeling. In the present work, we propose a nonparametric testing framework in which we develop a generalized likelihood ratio test to perform curve registration. We first prove that, under the null hypothesis, the resulting test statistic is asymptotically distributed as a chi-squared random variable. This result, often referred to as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic significance level and a natural measure of lack-of-fit in terms of the p-value of the chi squared test. We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to 1. Some numerical experiments on synthetic datasets are reported as well.
Nonparametric forecasting of low-dimensional dynamical systems.
Berry, Tyrus; Giannakis, Dimitrios; Harlim, John
2015-03-01
This paper presents a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. The key idea is to represent the discrete shift maps on a smooth basis which can be obtained by the diffusion maps algorithm. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution to the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to quantify uncertainties (in fact, evolve the probability distribution) for nontrivial dynamical systems with equation-free modeling. We verify our approach on various examples, ranging from an inhomogeneous anisotropic stochastic differential equation on a torus, the chaotic Lorenz three-dimensional model, and the Niño-3.4 data set which is used as a proxy of the El Niño Southern Oscillation.
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes
García, Constantino A.; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G.
2017-08-01
The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
Indoor Positioning Using Nonparametric Belief Propagation Based on Spanning Trees
Directory of Open Access Journals (Sweden)
Savic Vladimir
2010-01-01
Full Text Available Nonparametric belief propagation (NBP is one of the best-known methods for cooperative localization in sensor networks. It is capable of providing information about location estimation with appropriate uncertainty and to accommodate non-Gaussian distance measurement errors. However, the accuracy of NBP is questionable in loopy networks. Therefore, in this paper, we propose a novel approach, NBP based on spanning trees (NBP-ST created by breadth first search (BFS method. In addition, we propose a reliable indoor model based on obtained measurements in our lab. According to our simulation results, NBP-ST performs better than NBP in terms of accuracy and communication cost in the networks with high connectivity (i.e., highly loopy networks. Furthermore, the computational and communication costs are nearly constant with respect to the transmission radius. However, the drawbacks of proposed method are a little bit higher computational cost and poor performance in low-connected networks.
Binary Classifier Calibration Using a Bayesian Non-Parametric Approach.
Naeini, Mahdi Pakdaman; Cooper, Gregory F; Hauskrecht, Milos
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in Data mining. This paper presents two new non-parametric methods for calibrating outputs of binary classification models: a method based on the Bayes optimal selection and a method based on the Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model, and they can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, as well as other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the methods either outperform or are comparable in performance to the state-of-the-art calibration methods.
Parametric or nonparametric? A parametricness index for model selection
Liu, Wei; 10.1214/11-AOS899
2012-01-01
In model selection literature, two classes of criteria perform well asymptotically in different situations: Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike's information criterion (AIC) performs well in an asymptotic efficiency when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses if it is possible and how to detect the situation that a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints on AIC or BIC is better suited for the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is si...
Nonparametric reconstruction of the Om diagnostic to test LCDM
Escamilla-Rivera, Celia
2015-01-01
Cosmic acceleration is usually related with the unknown dark energy, which equation of state, w(z), is constrained and numerically confronted with independent astrophysical data. In order to make a diagnostic of w(z), the introduction of a null test of dark energy can be done using a diagnostic function of redshift, Om. In this work we present a nonparametric reconstruction of this diagnostic using the so-called Loess-Simex factory to test the concordance model with the advantage that this approach offers an alternative way to relax the use of priors and find a possible 'w' that reliably describe the data with no previous knowledge of a cosmological model. Our results demonstrate that the method applied to the dynamical Om diagnostic finds a preference for a dark energy model with equation of state w =-2/3, which correspond to a static domain wall network.
Evaluation of Nonparametric Probabilistic Forecasts of Wind Power
DEFF Research Database (Denmark)
Pinson, Pierre; Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg, orlov 31.07.2008;
likely outcome for each look-ahead time, but also with uncertainty estimates given by probabilistic forecasts. In order to avoid assumptions on the shape of predictive distributions, these probabilistic predictions are produced from nonparametric methods, and then take the form of a single or a set...... of quantile forecasts. The required and desirable properties of such probabilistic forecasts are defined and a framework for their evaluation is proposed. This framework is applied for evaluating the quality of two statistical methods producing full predictive distributions from point predictions of wind......Predictions of wind power production for horizons up to 48-72 hour ahead comprise a highly valuable input to the methods for the daily management or trading of wind generation. Today, users of wind power predictions are not only provided with point predictions, which are estimates of the most...
Equity and efficiency in private and public education: a nonparametric comparison
L. Cherchye; K. de Witte; E. Ooghe; I. Nicaise
2007-01-01
We present a nonparametric approach for the equity and efficiency evaluation of (private and public) primary schools in Flanders. First, we use a nonparametric (Data Envelopment Analysis) model that is specially tailored to assess educational efficiency at the pupil level. The model accounts for the
Song, Dong; Wang, Zhuo; Marmarelis, Vasilis Z; Berger, Theodore W
2009-02-01
This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson-Volterra models are obtained herein from broadband experimental input-output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) process. The parametric model is then validated in terms of its input-output transformational properties using the non-parametric model since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally-derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD process in the parametric model makes it replicate better the characteristics of the experimentally-derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Non-parametric tests of productive efficiency with errors-in-variables
Kuosmanen, T.K.; Post, T.; Scholtes, S.
2007-01-01
We develop a non-parametric test of productive efficiency that accounts for errors-in-variables, following the approach of Varian. [1985. Nonparametric analysis of optimizing behavior with measurement error. Journal of Econometrics 30(1/2), 445-458]. The test is based on the general Pareto-Koopmans
Equity and efficiency in private and public education: a nonparametric comparison
Cherchye, L.; de Witte, K.; Ooghe, E.; Nicaise, I.
2007-01-01
We present a nonparametric approach for the equity and efficiency evaluation of (private and public) primary schools in Flanders. First, we use a nonparametric (Data Envelopment Analysis) model that is specially tailored to assess educational efficiency at the pupil level. The model accounts for the
Koestler, Devin C; Christensen, Brock C; Marsit, Carmen J; Kelsey, Karl T; Houseman, E Andres
2013-03-05
DNA methylation is a well-recognized epigenetic mechanism that has been the subject of a growing body of literature typically focused on the identification and study of profiles of DNA methylation and their association with human diseases and exposures. In recent years, a number of unsupervised clustering algorithms, both parametric and non-parametric, have been proposed for clustering large-scale DNA methylation data. However, most of these approaches do not incorporate known biological relationships of measured features, and in some cases, rely on unrealistic assumptions regarding the nature of DNA methylation. Here, we propose a modified version of a recursively partitioned mixture model (RPMM) that integrates information related to the proximity of CpG loci within the genome to inform correlation structures from which subsequent clustering analysis is based. Using simulations and four methylation data sets, we demonstrate that integrating biologically informative correlation structures within RPMM resulted in improved goodness-of-fit, clustering consistency, and the ability to detect biologically meaningful clusters compared to methods which ignore such correlation. Integrating biologically-informed correlation structures to enhance modeling techniques is motivated by the rapid increase in resolution of DNA methylation microarrays and the increasing understanding of the biology of this epigenetic mechanism.
Joint deprojection of Sunyaev-Zeldovich and X-ray images of galaxy clusters
Ameglio, S; Pierpaoli, E; Dolag, K
2007-01-01
We present two non-parametric deprojection methods aimed at recovering the three-dimensional density and temperature profiles of galaxy clusters from spatially resolved thermal Sunyaev-Zeldovich (tSZ) and X-ray surface brightness maps, thus avoiding the use of X-ray spectroscopic data. In both methods, clusters are assumed to be spherically symmetric and modeled with an onion-skin structure. The first method follows a direct geometrical approach. The second method is based on the maximization of a single joint (tSZ and X-ray) likelihood function, which allows one to fit simultaneously the two signals by following a Monte Carlo Markov Chain approach. These techniques are tested against a set of cosmological simulations of clusters, with and without instrumental noise. We project each cluster along the three orthogonal directions defined by the principal axes of the momentum of inertia tensor. This enables us to check any bias in the deprojection associated to the cluster elongation along the line of sight. Aft...
Wesselink, Christiaan; Heeg, Govert P.; Jansonius, Nomdo M.
Objective: To compare prospectively 2 perimetric progression detection algorithms for glaucoma, the Early Manifest Glaucoma Trial algorithm (glaucoma progression analysis [GPA]) and a nonparametric algorithm applied to the mean deviation (MD) (nonparametric progression analysis [NPA]). Methods:
Structuring feature space: a non-parametric method for volumetric transfer function generation.
Maciejewski, Ross; Woo, Insoo; Chen, Wei; Ebert, David S
2009-01-01
The use of multi-dimensional transfer functions for direct volume rendering has been shown to be an effective means of extracting materials and their boundaries for both scalar and multivariate data. The most common multi-dimensional transfer function consists of a two-dimensional (2D) histogram with axes representing a subset of the feature space (e.g., value vs. value gradient magnitude), with each entry in the 2D histogram being the number of voxels at a given feature space pair. Users then assign color and opacity to the voxel distributions within the given feature space through the use of interactive widgets (e.g., box, circular, triangular selection). Unfortunately, such tools lead users through a trial-and-error approach as they assess which data values within the feature space map to a given area of interest within the volumetric space. In this work, we propose the addition of non-parametric clustering within the transfer function feature space in order to extract patterns and guide transfer function generation. We apply a non-parametric kernel density estimation to group voxels of similar features within the 2D histogram. These groups are then binned and colored based on their estimated density, and the user may interactively grow and shrink the binned regions to explore feature boundaries and extract regions of interest. We also extend this scheme to temporal volumetric data in which time steps of 2D histograms are composited into a histogram volume. A three-dimensional (3D) density estimation is then applied, and users can explore regions within the feature space across time without adjusting the transfer function at each time step. Our work enables users to effectively explore the structures found within a feature space of the volume and provide a context in which the user can understand how these structures relate to their volumetric data. We provide tools for enhanced exploration and manipulation of the transfer function, and we show that the initial
Segmentation of Nonstationary Time Series with Geometric Clustering
DEFF Research Database (Denmark)
Bocharov, Alexei; Thiesson, Bo
2013-01-01
We introduce a non-parametric method for segmentation in regimeswitching time-series models. The approach is based on spectral clustering of target-regressor tuples and derives a switching regression tree, where regime switches are modeled by oblique splits. Such models can be learned efficiently...
Directory of Open Access Journals (Sweden)
Ismet DOGAN
2015-10-01
Full Text Available Objective: Choosing the most efficient statistical test is one of the essential problems of statistics. Asymptotic relative efficiency is a notion which enables to implement in large samples the quantitative comparison of two different tests used for testing of the same statistical hypothesis. The notion of the asymptotic efficiency of tests is more complicated than that of asymptotic efficiency of estimates. This paper discusses the effect of sample size on expected values and variances of non-parametric tests for independent two samples and determines the most effective test for different sample sizes using Fraser efficiency value. Material and Methods: Since calculating the power value in comparison of the tests is not practical most of the time, using the asymptotic relative efficiency value is favorable. Asymptotic relative efficiency is an indispensable technique for comparing and ordering statistical test in large samples. It is especially useful in nonparametric statistics where there exist numerous heuristic tests such as the linear rank tests. In this study, the sample size is determined as 2 ≤ n ≤ 50. Results: In both balanced and unbalanced cases, it is found that, as the sample size increases expected values and variances of all the tests discussed in this paper increase as well. Additionally, considering the Fraser efficiency, Mann-Whitney U test is found as the most efficient test among the non-parametric tests that are used in comparison of independent two samples regardless of their sizes. Conclusion: According to Fraser efficiency, Mann-Whitney U test is found as the most efficient test.
Nonparametric predictive inference for combining diagnostic tests with parametric copula
Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.
2017-09-01
Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we interest in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results with considering dependence structure using parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. While copula is a well-known statistical concept for modelling dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method which is maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different family of copulas. Finally, we briefly outline related challenges and opportunities for future research.
Nonparametric identification of structural modifications in Laplace domain
Suwała, G.; Jankowski, Ł.
2017-02-01
This paper proposes and experimentally verifies a Laplace-domain method for identification of structural modifications, which (1) unlike time-domain formulations, allows the identification to be focused on these parts of the frequency spectrum that have a high signal-to-noise ratio, and (2) unlike frequency-domain formulations, decreases the influence of numerical artifacts related to the particular choice of the FFT exponential window decay. In comparison to the time-domain approach proposed earlier, advantages of the proposed method are smaller computational cost and higher accuracy, which leads to reliable performance in more difficult identification cases. Analytical formulas for the first- and second-order sensitivity analysis are derived. The approach is based on a reduced nonparametric model, which has the form of a set of selected structural impulse responses. Such a model can be collected purely experimentally, which obviates the need for design and laborious updating of a parametric model, such as a finite element model. The approach is verified experimentally using a 26-node lab 3D truss structure and 30 identification cases of a single mass modification or two concurrent mass modifications.
A New Non-Parametric Approach to Galaxy Morphological Classification
Lotz, J M; Madau, P; Lotz, Jennifer M.; Primack, Joel; Madau, Piero
2003-01-01
We present two new non-parametric methods for quantifying galaxy morphology: the relative distribution of the galaxy pixel flux values (the Gini coefficient or G) and the second-order moment of the brightest 20% of the galaxy's flux (M20). We test the robustness of G and M20 to decreasing signal-to-noise and spatial resolution, and find that both measures are reliable to within 10% at average signal-to-noise per pixel greater than 3 and resolutions better than 1000 pc and 500 pc, respectively. We have measured G and M20, as well as concentration (C), asymmetry (A), and clumpiness (S) in the rest-frame near-ultraviolet/optical wavelengths for 150 bright local "normal" Hubble type galaxies (E-Sd) galaxies and 104 0.05 < z < 0.25 ultra-luminous infrared galaxies (ULIRGs).We find that most local galaxies follow a tight sequence in G-M20-C, where early-types have high G and C and low M20 and late-type spirals have lower G and C and higher M20. The majority of ULIRGs lie above the normal galaxy G-M20 sequence...
Nonparametric Bayes modeling for case control studies with many predictors.
Zhou, Jing; Herring, Amy H; Bhattacharya, Anirban; Olshan, Andrew F; Dunson, David B
2016-03-01
It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors having a significant association with disease. Usual analyses rely on independent screening, considering each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low rank tensor factorization model for the retrospective likelihood. Our model allows a very flexible structure in characterizing the distribution of multivariate variables as unknown and without any linear assumptions as in logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
Adaptive Neural Network Nonparametric Identifier With Normalized Learning Laws.
Chairez, Isaac
2016-04-05
This paper addresses the design of a normalized convergent learning law for neural networks (NNs) with continuous dynamics. The NN is used here to obtain a nonparametric model for uncertain systems described by a set of ordinary differential equations. The source of uncertainties is the presence of some external perturbations and poor knowledge of the nonlinear function describing the system dynamics. A new adaptive algorithm based on normalized algorithms was used to adjust the weights of the NN. The adaptive algorithm was derived by means of a nonstandard logarithmic Lyapunov function (LLF). Two identifiers were designed using two variations of LLFs leading to a normalized learning law for the first identifier and a variable gain normalized learning law. In the case of the second identifier, the inclusion of normalized learning laws yields to reduce the size of the convergence region obtained as solution of the practical stability analysis. On the other hand, the velocity of convergence for the learning laws depends on the norm of errors in inverse form. This fact avoids the peaking transient behavior in the time evolution of weights that accelerates the convergence of identification error. A numerical example demonstrates the improvements achieved by the algorithm introduced in this paper compared with classical schemes with no-normalized continuous learning methods. A comparison of the identification performance achieved by the no-normalized identifier and the ones developed in this paper shows the benefits of the learning law proposed in this paper.
Nonparametric estimation of quantum states, processes and measurements
Lougovski, Pavel; Bennink, Ryan
Quantum state, process, and measurement estimation methods traditionally use parametric models, in which the number and role of relevant parameters is assumed to be known. When such an assumption cannot be justified, a common approach in many disciplines is to fit the experimental data to multiple models with different sets of parameters and utilize an information criterion to select the best fitting model. However, it is not always possible to assume a model with a finite (countable) number of parameters. This typically happens when there are unobserved variables that stem from hidden correlations that can only be unveiled after collecting experimental data. How does one perform quantum characterization in this situation? We present a novel nonparametric method of experimental quantum system characterization based on the Dirichlet Process (DP) that addresses this problem. Using DP as a prior in conjunction with Bayesian estimation methods allows us to increase model complexity (number of parameters) adaptively as the number of experimental observations grows. We illustrate our approach for the one-qubit case and show how a probability density function for an unknown quantum process can be estimated.
Bayesian nonparametric meta-analysis using Polya tree mixture models.
Branscum, Adam J; Hanson, Timothy E
2008-09-01
Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
Pivotal Estimation of Nonparametric Functions via Square-root Lasso
Belloni, Alexandre; Wang, Lie
2011-01-01
In a nonparametric linear regression model we study a variant of LASSO, called square-root LASSO, which does not require the knowledge of the scaling parameter $\\sigma$ of the noise or bounds for it. This work derives new finite sample upper bounds for prediction norm rate of convergence, $\\ell_1$-rate of converge, $\\ell_\\infty$-rate of convergence, and sparsity of the square-root LASSO estimator. A lower bound for the prediction norm rate of convergence is also established. In many non-Gaussian noise cases, we rely on moderate deviation theory for self-normalized sums and on new data-dependent empirical process inequalities to achieve Gaussian-like results provided log p = o(n^{1/3}) improving upon results derived in the parametric case that required log p = O(log n). In addition, we derive finite sample bounds on the performance of ordinary least square (OLS) applied tom the model selected by square-root LASSO accounting for possible misspecification of the selected model. In particular, we provide mild con...
Carbonel, Domingo; Rodríguez-Tribaldos, Verónica; Gutiérrez, Francisco; Galve, Jorge Pedro; Guerrero, Jesús; Zarroca, Mario; Roqué, Carles; Linares, Rogelio; McCalpin, James P.; Acosta, Enrique
2015-01-01
This contribution analyses a complex sinkhole cluster buried by urban elements in the mantled evaporite karst of Zaragoza city, NE Spain, where active subsidence has caused significant economic losses (~ 0.3 million Euro). The investigation, conducted after the development of the area, has involved the application of multiple surface and subsurface techniques. A detailed map of modern surface deformation indicates two active coalescing sinkholes, whereas the interpretation of old aerial photographs reveals the presence of two additional dormant sinkholes beneath human structures that might reactivate in the near future. DInSAR (Differential Interferometry Synthetic Aperture Radar) displacement data have limited spatial coverage mainly due to high subsidence rates and surface changes (re-pavement), and the Electrical Resistivity Tomography (ERT) and trenching investigations were severely restricted by the presence of urban elements. Nonetheless, the three techniques consistently indicate that the area affected by subsidence is larger than that defined by surface deformation features. The performance of the Ground Penetrating Radar (GPR) technique was adversely affected by the presence of highly conductive and massive anthropogenic deposits, but some profiles reveal that subsidence in the central sector of one of the sinkholes is mainly accommodated by sagging. The stratigraphic and structural relationships observed in a trench dug across the topographic margin of one of the sinkholes may be alternatively interpreted by three collapse events of around 0.6 m that occurred after 290 yr BP, or by progressive fault displacement combined with episodic anthropogenic excavation and fill. Average subsidence rates of > 6.6 mm/yr and 40 mm/yr have been calculated using stratigraphic markers dated by the radiocarbon method and historical information, respectively. This case study illustrates the need of conducting thorough investigations in sinkhole areas during the pre
Directory of Open Access Journals (Sweden)
Takashi Fukaya
2012-01-01
Full Text Available Objective. There are no reports comparing the protocols provided by rigid marker set (RMS and point cluster technique (PCT, which are similar in terms of estimating anatomical landmarks based on markers attached to a segment. The purpose of this study was to clarify the correlation of the two different protocols, which are protocols for knee motion in gait, and identify whether measurement errors arose at particular periods during the stance phase. Methods. The study subjects were 10 healthy adults. All estimated anatomical landmarks were which their positions, calculated by each protocol of the PCT and RMS, were compared using Pearson’s product correlation coefficients. To examine the reliability of the angle changes of the knee joint measured by RMS and the PCT, the coefficient of multiple correlations (CMCs was used. Results. Although the estimates of the anatomical landmarks showed high correlations of >0.90 (<0.01 for the Y- and Z-coordinates, the correlations were low for the X-coordinates at all anatomical landmarks. The CMC was 0.94 for flexion/extension, 0.74 for abduction/adduction, and 0.71 for external/internal rotation. Conclusion. Flexion/extension and abduction/adduction of the knee by two different protocols had comparatively little error and good reliability after 30% of the stance phase.
Institute of Scientific and Technical Information of China (English)
LINGNeng-xiang; DUXue-qiao
2005-01-01
In this paper, we study the strong consistency for partitioning estimation of regression function under samples that axe φ-mixing sequences with identically distribution.Key words: nonparametric regression function; partitioning estimation; strong convergence;φ-mixing sequences.
Kernel bandwidth estimation for non-parametric density estimation: a comparative study
CSIR Research Space (South Africa)
Van der Walt, CM
2013-12-01
Full Text Available We investigate the performance of conventional bandwidth estimators for non-parametric kernel density estimation on a number of representative pattern-recognition tasks, to gain a better understanding of the behaviour of these estimators in high...
Bayesian nonparametric estimation and consistency of mixed multinomial logit choice models
De Blasi, Pierpaolo; Lau, John W; 10.3150/09-BEJ233
2011-01-01
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide an important theoretical support for the use of the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different te...
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models
Fan, Jianqing; Song, Rui
2011-01-01
A variable screening procedure via correlation learning was proposed Fan and Lv (2008) to reduce dimensionality in sparse ultra-high dimensional models. Even when the true model is linear, the marginal regression can be highly nonlinear. To address this issue, we further extend the correlation learning to marginal nonparametric learning. Our nonparametric independence screening is called NIS, a specific member of the sure independence screening. Several closely related variable screening procedures are proposed. Under the nonparametric additive models, it is shown that under some mild technical conditions, the proposed independence screening methods enjoy a sure screening property. The extent to which the dimensionality can be reduced by independence screening is also explicitly quantified. As a methodological extension, an iterative nonparametric independence screening (INIS) is also proposed to enhance the finite sample performance for fitting sparse additive models. The simulation results and a real data a...
Nonparametric TOA estimators for low-resolution IR-UWB digital receiver
Institute of Scientific and Technical Information of China (English)
Yanlong Zhang; Weidong Chen
2015-01-01
Nonparametric time-of-arrival (TOA) estimators for im-pulse radio ultra-wideband (IR-UWB) signals are proposed. Non-parametric detection is obviously useful in situations where de-tailed information about the statistics of the noise is unavailable or not accurate. Such TOA estimators are obtained based on condi-tional statistical tests with only a symmetry distribution assumption on the noise probability density function. The nonparametric es-timators are attractive choices for low-resolution IR-UWB digital receivers which can be implemented by fast comparators or high sampling rate low resolution analog-to-digital converters (ADCs), in place of high sampling rate high resolution ADCs which may not be available in practice. Simulation results demonstrate that nonparametric TOA estimators provide more effective and robust performance than typical energy detection (ED) based estimators.
Nonparametric statistical tests for the continuous data: the basic concept and the practical use.
Nahm, Francis Sahngun
2016-02-01
Conventional statistical tests are usually called parametric tests. Parametric tests are used more frequently than nonparametric tests in many medical articles, because most of the medical researchers are familiar with and the statistical software packages strongly support parametric tests. Parametric tests require important assumption; assumption of normality which means that distribution of sample means is normally distributed. However, parametric test can be misleading when this assumption is not satisfied. In this circumstance, nonparametric tests are the alternative methods available, because they do not required the normality assumption. Nonparametric tests are the statistical methods based on signs and ranks. In this article, we will discuss about the basic concepts and practical use of nonparametric tests for the guide to the proper use.
Directory of Open Access Journals (Sweden)
SIAVASH KALBI
2014-05-01
Full Text Available Kalbi S, Fallah A, Hojjati SM. 2014. Using and comparing two nonparametric methods (CART and RF and SPOT-HRG satellite data to predictive tree diversity distribution. Nusantara Bioscience 6: 57-62. The prediction of spatial distributions of tree species by means of survey data has recently been used for conservation planning. Numerous methods have been developed for building species habitat suitability models. The present study was carried out to find the possible proper relationships between tree species diversity indices and SPOT-HRG reflectance values in Hyrcanian forests, North of Iran. Two different modeling techniques, Classification and Regression Trees (CART and Random Forest (RF, were fitted to the data in order to find the most successfully model. Simpson, Shannon diversity and the reciprocal of Simpson indices were used for estimating tree diversity. After collecting terrestrial information on trees in the 100 samples, the tree diversity indices were calculated in each plot. RF with determinate coefficient and RMSE from 56.3 to 63.9 and RMSE from 0.15 to 0.84 has better results than CART algorithms with determinate coefficient 42.3 to 63.3 and RMSE from 0.188 to 0.88. Overall the results showed that the SPOT-HRG satellite data and nonparametric regression could be useful for estimating tree diversity in Hyrcanian forests, North of Iran.
McCallum, James L.; Engdahl, Nicholas B.; Ginn, Timothy R.; Cook, Peter. G.
2014-03-01
Residence time distributions (RTDs) have been used extensively for quantifying flow and transport in subsurface hydrology. In geochemical approaches, environmental tracer concentrations are used in conjunction with simple lumped parameter models (LPMs). Conversely, numerical simulation techniques require large amounts of parameterization and estimated RTDs are certainly limited by associated uncertainties. In this study, we apply a nonparametric deconvolution approach to estimate RTDs using environmental tracer concentrations. The model is based only on the assumption that flow is steady enough that the observed concentrations are well approximated by linear superposition of the input concentrations with the RTD; that is, the convolution integral holds. Even with large amounts of environmental tracer concentration data, the entire shape of an RTD remains highly nonunique. However, accurate estimates of mean ages and in some cases prediction of young portions of the RTD may be possible. The most useful type of data was found to be the use of a time series of tritium. This was due to the sharp variations in atmospheric concentrations and a short half-life. Conversely, the use of CFC compounds with smoothly varying atmospheric concentrations was more prone to nonuniqueness. This work highlights the benefits and limitations of using environmental tracer data to estimate whole RTDs with either LPMs or through numerical simulation. However, the ability of the nonparametric approach developed here to correct for mixing biases in mean ages appears promising.
On The Robustness of z=0-1 Galaxy Size Measurements Through Model and Non-Parametric Fits
Mosleh, Moein; Franx, Marijn
2013-01-01
We present the size-stellar mass relations of nearby (z=0.01-0.02) SDSS galaxies, for samples selected by color, morphology, Sersic index n, and specific star formation rate. Several commonly-employed size measurement techniques are used, including single Sersic fits, two-component Sersic models and a non-parametric method. Through simple simulations we show that the non-parametric and two-component Sersic methods provide the most robust effective radius measurements, while those based on single Sersic profiles are often overestimates, especially for massive red/early-type galaxies. Using our robust sizes, we show that for all sub-samples, the mass-size relations are shallow at low stellar masses and steepen above ~3-4 x 10^{10}\\Msun. The mass-size relations for galaxies classified as late-type, low-n, and star-forming are consistent with each other, while blue galaxies follow a somewhat steeper relation. The mass-size relations of early-type, high-n, red, and quiescent galaxies all agree with each other but ...
Directory of Open Access Journals (Sweden)
SANGCHAN KANTABUTRA
2009-04-01
Full Text Available This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA. Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have access to and practice different production technologies than rural schools, and rural institutions appear to operate less efficiently than their urban counterparts. In addition, a sensitivity analysis, conducted to ascertain the robustness of the analytical framework, revealed the stability of urban-rural effects on school efficiency. Policy to improve school eff iciency should thus take varying geographical area differences into account, viewing rural and urban schools as different from one another. Moreover, policymakers might consider shifting existing resources from urban schools to rural schools, provided that the increase in overall rural efficiency would be greater than the decrease, if any, in the city. Future research directions are discussed.
Examples of the Application of Nonparametric Information Geometry to Statistical Physics
Directory of Open Access Journals (Sweden)
Giovanni Pistone
2013-09-01
Full Text Available We review a nonparametric version of Amari’s information geometry in which the set of positive probability densities on a given sample space is endowed with an atlas of charts to form a differentiable manifold modeled on Orlicz Banach spaces. This nonparametric setting is used to discuss the setting of typical problems in machine learning and statistical physics, such as black-box optimization, Kullback-Leibler divergence, Boltzmann-Gibbs entropy and the Boltzmann equation.
Economic decision making and the application of nonparametric prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2008-01-01
Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright ?? 2008 Society of Petroleum Engineers.
A robust nonparametric method for quantifying undetected extinctions.
Chisholm, Ryan A; Giam, Xingli; Sadanandan, Keren R; Fung, Tak; Rheindt, Frank E
2016-06-01
How many species have gone extinct in modern times before being described by science? To answer this question, and thereby get a full assessment of humanity's impact on biodiversity, statistical methods that quantify undetected extinctions are required. Such methods have been developed recently, but they are limited by their reliance on parametric assumptions; specifically, they assume the pools of extant and undetected species decay exponentially, whereas real detection rates vary temporally with survey effort and real extinction rates vary with the waxing and waning of threatening processes. We devised a new, nonparametric method for estimating undetected extinctions. As inputs, the method requires only the first and last date at which each species in an ensemble was recorded. As outputs, the method provides estimates of the proportion of species that have gone extinct, detected, or undetected and, in the special case where the number of undetected extant species in the present day is assumed close to zero, of the absolute number of undetected extinct species. The main assumption of the method is that the per-species extinction rate is independent of whether a species has been detected or not. We applied the method to the resident native bird fauna of Singapore. Of 195 recorded species, 58 (29.7%) have gone extinct in the last 200 years. Our method projected that an additional 9.6 species (95% CI 3.4, 19.8) have gone extinct without first being recorded, implying a true extinction rate of 33.0% (95% CI 31.0%, 36.2%). We provide R code for implementing our method. Because our method does not depend on strong assumptions, we expect it to be broadly useful for quantifying undetected extinctions. © 2016 Society for Conservation Biology.
Nonparametric Bayesian inference of the microcanonical stochastic block model
Peixoto, Tiago P.
2017-01-01
A principled approach to characterize the hidden modular structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
Akhtar, Naveed; Mian, Ajmal
2017-10-03
We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.
Non-parametric combination and related permutation tests for neuroimaging.
Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E
2016-04-01
In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction.
Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using
Yang, Hai; Wei, Qiang; Zhong, Xue; Yang, Hushan; Li, Bingshan
2017-02-15
Comprehensive catalogue of genes that drive tumor initiation and progression in cancer is key to advancing diagnostics, therapeutics and treatment. Given the complexity of cancer, the catalogue is far from complete yet. Increasing evidence shows that driver genes exhibit consistent aberration patterns across multiple-omics in tumors. In this study, we aim to leverage complementary information encoded in each of the omics data to identify novel driver genes through an integrative framework. Specifically, we integrated mutations, gene expression, DNA copy numbers, DNA methylation and protein abundance, all available in The Cancer Genome Atlas (TCGA) and developed iDriver, a non-parametric Bayesian framework based on multivariate statistical modeling to identify driver genes in an unsupervised fashion. iDriver captures the inherent clusters of gene aberrations and constructs the background distribution that is used to assess and calibrate the confidence of driver genes identified through multi-dimensional genomic data. We applied the method to 4 cancer types in TCGA and identified candidate driver genes that are highly enriched with known drivers. (e.g.: P < 3.40 × 10 -36 for breast cancer). We are particularly interested in novel genes and observed multiple lines of supporting evidence. Using systematic evaluation from multiple independent aspects, we identified 45 candidate driver genes that were not previously known across these 4 cancer types. The finding has important implications that integrating additional genomic data with multivariate statistics can help identify cancer drivers and guide the next stage of cancer genomics research. The C ++ source code is freely available at https://medschool.vanderbilt.edu/cgg/ . hai.yang@vanderbilt.edu or bingshan.li@Vanderbilt.Edu. Supplementary data are available at Bioinformatics online.
Traffic Sign Detection Based on Clustering and Chain Code Technique%基于聚类与链码技术的交通标志检测
Institute of Scientific and Technical Information of China (English)
林川; 潘盛辉; 谭光兴; 李梦和
2011-01-01
交通标志的有效检测是交通标志识别系统中的关键步骤;提出一种基于颜色和形状的交通标志检测新方法,首先由最小欧式距离的聚类分析方法以及特征量与聚类中心的向量积提取夹角正弦方法,构造两级颜色特征分类器并通过训练实现优化分割,然后对滤波后图像边界跟踪,利用新的链码方法实现区域的拐角点提取,最后由几何特征判定区域的形状进行定位,实现交通标志的检测;实验结果表明,该方法在不同气候条件下的平均检测率达92.96%,优于同类方法且具有较高的鲁棒性.%Detecting traffic sign effectively is a key technique in traffic sign recognition system. A method of traffic sign detection based on color and shape was presented. First, the method of clustering analysis of minimum Euclidian distance was used to construct the first color classifier. The sine value was derived from the cross product of the characteristic parameters and the cluster center, and it was presented for constructing the second color classifier. The segmentation of traffic sign is achieved by training the classifier. And then, the algorithm of edge tracing was used to the filtered image. A new method of chain code was presented to extract the corner points of the region. Finally, the shape of the region was identified by the geometric feature, and the detection results were improved by traffic sign position. Experimental result shows that the detection rate is 92. 96% in different climatic conditions; the method outperforms the previously existing method and has better robustness.
Nonparametric Comparison of Two Dynamic Parameter Setting Methods in a Meta-Heuristic Approach
Directory of Open Access Journals (Sweden)
Seyhun HEPDOGAN
2007-10-01
Full Text Available Meta-heuristics are commonly used to solve combinatorial problems in practice. Many approaches provide very good quality solutions in a short amount of computational time; however most meta-heuristics use parameters to tune the performance of the meta-heuristic for particular problems and the selection of these parameters before solving the problem can require much time. This paper investigates the problem of setting parameters using a typical meta-heuristic called Meta-RaPS (Metaheuristic for Randomized Priority Search.. Meta-RaPS is a promising meta-heuristic optimization method that has been applied to different types of combinatorial optimization problems and achieved very good performance compared to other meta-heuristic techniques. To solve a combinatorial problem, Meta-RaPS uses two well-defined stages at each iteration: construction and local search. After a number of iterations, the best solution is reported. Meta-RaPS performance depends on the fine tuning of two main parameters, priority percentage and restriction percentage, which are used during the construction stage. This paper presents two different dynamic parameter setting methods for Meta-RaPS. These dynamic parameter setting approaches tune the parameters while a solution is being found. To compare these two approaches, nonparametric statistic approaches are utilized since the solutions are not normally distributed. Results from both these dynamic parameter setting methods are reported.
Chang, Ju Yong
2016-08-01
We present a new gesture recognition method that is based on the conditional random field (CRF) model using multiple feature matching. Our approach solves the labeling problem, determining gesture categories and their temporal ranges at the same time. A generative probabilistic model is formalized and probability densities are nonparametrically estimated by matching input features with a training dataset. In addition to the conventional skeletal joint-based features, the appearance information near the active hand in an RGB image is exploited to capture the detailed motion of fingers. The estimated likelihood function is then used as the unary term for our CRF model. The smoothness term is also incorporated to enforce the temporal coherence of our solution. Frame-wise recognition results can then be obtained by applying an efficient dynamic programming technique. To estimate the parameters of the proposed CRF model, we incorporate the structured support vector machine (SSVM) framework that can perform efficient structured learning by using large-scale datasets. Experimental results demonstrate that our method provides effective gesture recognition results for challenging real gesture datasets. By scoring 0.8563 in the mean Jaccard index, our method has obtained the state-of-the-art results for the gesture recognition track of the 2014 ChaLearn Looking at People (LAP) Challenge.
Semi-nonparametric VaR forecasts for hedge funds during the recent crisis
Del Brio, Esther B.; Mora-Valencia, Andrés; Perote, Javier
2014-05-01
The need to provide accurate value-at-risk (VaR) forecasting measures has triggered an important literature in econophysics. Although these accurate VaR models and methodologies are particularly demanded for hedge fund managers, there exist few articles specifically devoted to implement new techniques in hedge fund returns VaR forecasting. This article advances in these issues by comparing the performance of risk measures based on parametric distributions (the normal, Student’s t and skewed-t), semi-nonparametric (SNP) methodologies based on Gram-Charlier (GC) series and the extreme value theory (EVT) approach. Our results show that normal-, Student’s t- and Skewed t- based methodologies fail to forecast hedge fund VaR, whilst SNP and EVT approaches accurately success on it. We extend these results to the multivariate framework by providing an explicit formula for the GC copula and its density that encompasses the Gaussian copula and accounts for non-linear dependences. We show that the VaR obtained by the meta GC accurately captures portfolio risk and outperforms regulatory VaR estimates obtained through the meta Gaussian and Student’s t distributions.
A Non-parametric Approach to the Overall Estimate of Cognitive Load Using NIRS Time Series.
Keshmiri, Soheil; Sumioka, Hidenobu; Yamazaki, Ryuji; Ishiguro, Hiroshi
2017-01-01
We present a non-parametric approach to prediction of the n-back n ∈ {1, 2} task as a proxy measure of mental workload using Near Infrared Spectroscopy (NIRS) data. In particular, we focus on measuring the mental workload through hemodynamic responses in the brain induced by these tasks, thereby realizing the potential that they can offer for their detection in real world scenarios (e.g., difficulty of a conversation). Our approach takes advantage of intrinsic linearity that is inherent in the components of the NIRS time series to adopt a one-step regression strategy. We demonstrate the correctness of our approach through its mathematical analysis. Furthermore, we study the performance of our model in an inter-subject setting in contrast with state-of-the-art techniques in the literature to show a significant improvement on prediction of these tasks (82.50 and 86.40% for female and male participants, respectively). Moreover, our empirical analysis suggest a gender difference effect on the performance of the classifiers (with male data exhibiting a higher non-linearity) along with the left-lateralized activation in both genders with higher specificity in females.
Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines
Kuter, S.; Akyürek, Z.; Weber, G.-W.
2016-10-01
Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imageries. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to latest machine learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th
Maximum-likelihood cluster recontruction
Bartelmann, M; Seitz, S; Schneider, P J; Bartelmann, Matthias; Narayan, Ramesh; Seitz, Stella; Schneider, Peter
1996-01-01
We present a novel method to recontruct the mass distribution of galaxy clusters from their gravitational lens effect on background galaxies. The method is based on a least-chisquare fit of the two-dimensional gravitational cluster potential. The method combines information from shear and magnification by the cluster lens and is designed to easily incorporate possible additional information. We describe the technique and demonstrate its feasibility with simulated data. Both the cluster morphology and the total cluster mass are well reproduced.
Institute of Scientific and Technical Information of China (English)
LIU Yong-jian; DUAN Chuan; TIAN Meng-liang; HU Er-liang; HUANG Yu-bi
2010-01-01
Analysis of multi-environment trials (METs) of crops for the evaluation and recommendation of varieties is an important issue in plant breeding research. Evaluating on the both stability of performance and high yield is essential in MET analyses. The objective of the present investigation was to compare 11 nonparametric stability statistics and apply nonparametric tests for genotype-by-environment interaction (GEI) to 14 maize (Zea mays L.) genotypes grown at 25 locations in southwestern China during 2005. Results of nonparametric tests of GEI and a combined ANOVA across locations showed that both crossover and noncrossover GEI, and genotypes varied highly significantly for yield. The results of principal component analysis, correlation analysis of nonparametric statistics, and yield indicated the nonparametric statistics grouped as four distinct classes that corresponded to different agronomic and biological concepts of stability.Furthermore, high values of TOP and low values of rank-sum were associated with high mean yield, but the other nonparametric statistics were not positively correlated with mean yield. Therefore, only rank-sum and TOP methods would be useful for simultaneously selection for high yield and stability. These two statistics recommended JY686 and HX 168 as desirable and ND 108, CM 12, CN36, and NK6661 as undesirable genotypes.
A novel nonparametric confidence interval for differences of proportions for correlated binary data.
Duan, Chongyang; Cao, Yingshu; Zhou, Lizhi; Tan, Ming T; Chen, Pingyan
2016-11-16
Various confidence interval estimators have been developed for differences in proportions resulted from correlated binary data. However, the width of the mostly recommended Tango's score confidence interval tends to be wide, and the computing burden of exact methods recommended for small-sample data is intensive. The recently proposed rank-based nonparametric method by treating proportion as special areas under receiver operating characteristic provided a new way to construct the confidence interval for proportion difference on paired data, while the complex computation limits its application in practice. In this article, we develop a new nonparametric method utilizing the U-statistics approach for comparing two or more correlated areas under receiver operating characteristics. The new confidence interval has a simple analytic form with a new estimate of the degrees of freedom of n - 1. It demonstrates good coverage properties and has shorter confidence interval widths than that of Tango. This new confidence interval with the new estimate of degrees of freedom also leads to coverage probabilities that are an improvement on the rank-based nonparametric confidence interval. Comparing with the approximate exact unconditional method, the nonparametric confidence interval demonstrates good coverage properties even in small samples, and yet they are very easy to implement computationally. This nonparametric procedure is evaluated using simulation studies and illustrated with three real examples. The simplified nonparametric confidence interval is an appealing choice in practice for its ease of use and good performance. © The Author(s) 2016.
An Evaluation of Parametric and Nonparametric Models of Fish Population Response.
Energy Technology Data Exchange (ETDEWEB)
Haas, Timothy C.; Peterson, James T.; Lee, Danny C.
1999-11-01
Predicting the distribution or status of animal populations at large scales often requires the use of broad-scale information describing landforms, climate, vegetation, etc. These data, however, often consist of mixtures of continuous and categorical covariates and nonmultiplicative interactions among covariates, complicating statistical analyses. Using data from the interior Columbia River Basin, USA, we compared four methods for predicting the distribution of seven salmonid taxa using landscape information. Subwatersheds (mean size, 7800 ha) were characterized using a set of 12 covariates describing physiography, vegetation, and current land-use. The techniques included generalized logit modeling, classification trees, a nearest neighbor technique, and a modular neural network. We evaluated model performance using out-of-sample prediction accuracy via leave-one-out cross-validation and introduce a computer-intensive Monte Carlo hypothesis testing approach for examining the statistical significance of landscape covariates with the non-parametric methods. We found the modular neural network and the nearest-neighbor techniques to be the most accurate, but were difficult to summarize in ways that provided ecological insight. The modular neural network also required the most extensive computer resources for model fitting and hypothesis testing. The generalized logit models were readily interpretable, but were the least accurate, possibly due to nonlinear relationships and nonmultiplicative interactions among covariates. Substantial overlap among the statistically significant (P<0.05) covariates for each method suggested that each is capable of detecting similar relationships between responses and covariates. Consequently, we believe that employing one or more methods may provide greater biological insight without sacrificing prediction accuracy.
Unconventional methods for clustering
Kotyrba, Martin
2016-06-01
Cluster analysis or clustering is a task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering namely SOM (Self Organising Map). The paper describes the theory needed to understand the principle of clustering and descriptions of algorithm used with clustering in our experiments.
Contreras-Reyes, Javier E
2012-01-01
We study the spatial distribution of clusters associated to the aftershocks of the megathrust Maule earthquake MW 8.8 of 27 February 2010. We used a recent clustering method which hinges on a nonparametric estimation of the underlying probability density function to detect subsets of points forming clusters associated to high density areas. In addition, we estimate the probability density function using a nonparametric kernel method for each of these clusters. This allow us to identify a set of regions where there is an association between frequency of events and pre-seismic locking. Specifically, our results suggest that high coseismic slip spatially correlates with high aftershock frequency.
Energy Technology Data Exchange (ETDEWEB)
Peterson, James T.
1999-12-01
Natural resource professionals are increasingly required to develop rigorous statistical models that relate environmental data to categorical responses data. Recent advances in the statistical and computing sciences have led to the development of sophisticated methods for parametric and nonparametric analysis of data with categorical responses. The statistical software package CATDAT was designed to make some of these relatively new and powerful techniques available to scientists. The CATDAT statistical package includes 4 analytical techniques: generalized logit modeling; binary classification tree; extended K-nearest neighbor classification; and modular neural network.
Lu, Tao; Liang, Hua; Li, Hongzhe; Wu, Hulin
2011-01-01
Gene regulation is a complicated process. The interaction of many genes and their products forms an intricate biological network. Identification of this dynamic network will help us understand the biological process in a systematic way. However, the construction of such a dynamic network is very challenging for a high-dimensional system. In this article we propose to use a set of ordinary differential equations (ODE), coupled with dimensional reduction by clustering and mixed-effects modeling techniques, to model the dynamic gene regulatory network (GRN). The ODE models allow us to quantify both positive and negative gene regulations as well as feedback effects of one set of genes in a functional module on the dynamic expression changes of the genes in another functional module, which results in a directed graph network. A five-step procedure, Clustering, Smoothing, regulation Identification, parameter Estimates refining and Function enrichment analysis (CSIEF) is developed to identify the ODE-based dynamic GRN. In the proposed CSIEF procedure, a series of cutting-edge statistical methods and techniques are employed, that include non-parametric mixed-effects models with a mixture distribution for clustering, nonparametric mixed-effects smoothing-based methods for ODE models, the smoothly clipped absolute deviation (SCAD)-based variable selection, and stochastic approximation EM (SAEM) approach for mixed-effects ODE model parameter estimation. The key step, the SCAD-based variable selection of the proposed procedure is justified by investigating its asymptotic properties and validated by Monte Carlo simulations. We apply the proposed method to identify the dynamic GRN for yeast cell cycle progression data. We are able to annotate the identified modules through function enrichment analyses. Some interesting biological findings are discussed. The proposed procedure is a promising tool for constructing a general dynamic GRN and more complicated dynamic networks.
Measuring the influence of networks on transaction costs using a non-parametric regression technique
DEFF Research Database (Denmark)
Henningsen, Géraldine; Henningsen, Arne; Henning, Christian H.C.A.
All business transactions as well as achieving innovations take up resources, subsumed under the concept of transaction costs. One of the major factors in transaction costs theory is information. Firm networks can catalyse the interpersonal information exchange and hence, increase the access to n...
A Multilevel Decomposition of School Performance Using Robust Nonparametric Frontier Techniques
Thieme, Claudio; Prior, Diego; Tortosa-Ausina, Emili
2013-01-01
We propose a methodology for evaluating educational performance, from a multilevel perspective. We use partial frontier approaches to mitigate the influence of outliers and the curse of dimensionality. Our estimation considers idiosyncratic variables at the school, class, and student levels. Our model is applied to a sample of students in fourth…
A Multilevel Decomposition of School Performance Using Robust Nonparametric Frontier Techniques
Thieme, Claudio; Prior, Diego; Tortosa-Ausina, Emili
2013-01-01
We propose a methodology for evaluating educational performance, from a multilevel perspective. We use partial frontier approaches to mitigate the influence of outliers and the curse of dimensionality. Our estimation considers idiosyncratic variables at the school, class, and student levels. Our model is applied to a sample of students in fourth…
Gianola, Daniel; Wu, Xiao-Lin; Manfredi, Eduardo; Simianer, Henner
2010-10-01
A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some "baseline" family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but it produces an understatement in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. For a
Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions
Poczos, Barnabas; Schneider, Jeff
2012-01-01
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. The existing methods usually consider the case when each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting. We assume that each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given some i.i.d. samples from each distribution. Our goal is to estimate the distances between these distributions and use these distances to perform low-dimensional embedding, clustering/classification, or anomaly detection for the distributions. We present estimation algorithms, describe how to apply them for machine learning tasks on distributions, and show empirical results on synthetic data, real word images, and astronomical data sets.
Online Nonparametric Bayesian Activity Mining and Analysis From Surveillance Video.
Bastani, Vahid; Marcenaro, Lucio; Regazzoni, Carlo S
2016-05-01
A method for online incremental mining of activity patterns from the surveillance video stream is presented in this paper. The framework consists of a learning block in which Dirichlet process mixture model is employed for the incremental clustering of trajectories. Stochastic trajectory pattern models are formed using the Gaussian process regression of the corresponding flow functions. Moreover, a sequential Monte Carlo method based on Rao-Blackwellized particle filter is proposed for tracking and online classification as well as the detection of abnormality during the observation of an object. Experimental results on real surveillance video data are provided to show the performance of the proposed algorithm in different tasks of trajectory clustering, classification, and abnormality detection.
Xiao, Qijun
This thesis discusses the full scope of a project exploring the physics of hierarchical clusters of interacting nanomagnets. These clusters may be relevant for novel applications such as multilevel data storage devices. The work can be grouped into three main activities: micromagnetic simulation, fabrication and characterization of proof-of-concept prototype devices, and efforts to scale down the structures by creating the hierarchical structures with the aid of diblock copolymer self assembly. Theoretical micromagnetic studies and simulations based on Landau-Lifshitz-Gilbert (LLG) equation were conducted on nanoscale single domain magnetic entities. For the simulated nanomagnet clusters with perpendicular uniaxial anisotropy, the simulation showed the switching field distributions, the stability of the magnetostatic states with distinctive total cluster perpendicular moments, and the stepwise magnetic switching curves. For simulated nanomagnet clusters with in-plane shape anisotropy, the simulation showed the stepwise switching behaviors governed by thermal agitation and cluster configurations. Proof-of-concept cluster devices with three interacting Co nanomagnets were fabricated by e-beam lithography (EBL) and pulse-reverse electrochemical deposition (PRECD). EBL patterning on a suspended 100 nm SiN membrane showed improved lateral lithography resolution to 30 nm. The Co nanomagnets deposited using the PRECD method showed perpendicular anisotropy. The switching experiments with external applied fields were able to switch the Co nanomagnets through the four magnetostatic states with distinctive total perpendicular cluster magnetization, and proved the feasibility of multilevel data storage devices based on the cluster concept. Shrinking the structures size was experimented by the aid of diblock copolymer. Thick poly(styrene)-b-poly(methyl methacrylate) (PS-b-PMMA) diblock copolymer templates aligned with external electrical field were used to fabricate long Ni
Institute of Scientific and Technical Information of China (English)
张斌; 庄池杰; 胡军; 陈水明; 张明明; 王科; 曾嵘
2015-01-01
large datasets clustering. Various techniques for reducing the dimension of the input datasets were studied and the results were compared from perspectives of computing time and information losses. The results indicate that the combination of principal component analysis and ensemble clustering algorithm performs better both in efficiency and accuracy for clustering large-scale load profiles.
Energy Technology Data Exchange (ETDEWEB)
Constantinescu, C C; Yoder, K K; Normandin, M D; Morris, E D [Department of Radiology, Indiana University School of Medicine, Indianapolis, IN (United States); Kareken, D A [Department of Neurology, Indiana University School of Medicine, Indianapolis, IN (United States); Bouman, C A [Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN (United States); O' Connor, S J [Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN (United States)], E-mail: emorris@iupui.edu
2008-03-07
We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest and activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (F{sup DA}(t)) and the change in binding potential ({delta}BP). The veracity of the F{sup DA}(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) {delta}BP should decline with increasing DA peak time, (2) {delta}BP should increase as the strength of the temporal correlation between F{sup DA}(t) and the free raclopride (F{sup RAC}(t)) curve increases, (3) {delta}BP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [{sup 11}C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover F{sup DA}(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the F{sup DA}(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of F{sup DA}(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.
Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache; Vascular headache - cluster ... A cluster headache begins as a severe, sudden headache. The headache commonly strikes 2 to 3 hours after you fall ...
Yan, Donghui; Jordan, Michael I
2011-01-01
Inspired by Random Forests (RF) in the context of classification, we propose a new clustering ensemble method---Cluster Forests (CF). Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure $\\kappa$. CF progressively improves each local clustering in a fashion that resembles the tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis shows that the $\\kappa$ criterion is shown to grow each local clustering in a desirable way---it is "noise-resistant." A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.
Gieles, M.
1993-01-01
Star clusters are observed in almost every galaxy. In this thesis we address several fundamental problems concerning the formation, evolution and disruption of star clusters. From observations of (young) star clusters in the interacting galaxy M51, we found that clusters are formed in complexes of stars and star clusters. These complexes share similar properties with giant molecular clouds, from which they are formed. Many (70%) of the young clusters will not survive the fist 10 Myr, due to t...
Ford, Eric B; Steffen, Jason H; Carter, Joshua A; Fressin, Francois; Holman, Matthew J; Lissauer, Jack J; Moorhead, Althea V; Morehead, Robert C; Ragozzine, Darin; Rowe, Jason F; Welsh, William F; Allen, Christopher; Batalha, Natalie M; Borucki, William J; Bryson, Stephen T; Buchhave, Lars A; Burke, Christopher J; Caldwell, Douglas A; Charbonneau, David; Clarke, Bruce D; Cochran, William D; Désert, Jean-Michel; Endl, Michael; Everett, Mark E; Fischer, Debra A; Gautier, Thomas N; Gilliland, Ron L; Jenkins, Jon M; Haas, Michael R; Horch, Elliott; Howell, Steve B; Ibrahim, Khadeejah A; Isaacson, Howard; Koch, David G; Latham, David W; Li, Jie; Lucas, Philip; MacQueen, Phillip J; Marcy, Geoffrey W; McCauliff, Sean; Mullally, Fergal R; Quinn, Samuel N; Quintana, Elisa; Shporer, Avi; Still, Martin; Tenenbaum, Peter; Thompson, Susan E; Torres, Guillermo; Twicken, Joseph D; Wohler, Bill
2012-01-01
We present a new method for confirming transiting planets based on the combination of transit timingn variations (TTVs) and dynamical stability. Correlated TTVs provide evidence that the pair of bodies are in the same physical system. Orbital stability provides upper limits for the masses of the transiting companions that are in the planetary regime. This paper describes a non-parametric technique for quantifying the statistical significance of TTVs based on the correlation of two TTV data sets. We apply this method to an analysis of the transit timing variations of two stars with multiple transiting planet candidates identified by Kepler. We confirm four transiting planets in two multiple planet systems based on their TTVs and the constraints imposed by dynamical stability. An additional three candidates in these same systems are not confirmed as planets, but are likely to be validated as real planets once further observations and analyses are possible. If all were confirmed, these systems would be near 4:6:...
Li, Xiao-Dong; Lv, Mang-Mang; Ho, John K. L.
2016-07-01
In this article, two adaptive iterative learning control (ILC) algorithms are presented for nonlinear continuous systems with non-parametric uncertainties. Unlike general ILC techniques, the proposed adaptive ILC algorithms allow that both the initial error at each iteration and the reference trajectory are iteration-varying in the ILC process, and can achieve non-repetitive trajectory tracking beyond a small initial time interval. Compared to the neural network or fuzzy system-based adaptive ILC schemes and the classical ILC methods, in which the number of iterative variables is generally larger than or equal to the number of control inputs, the first adaptive ILC algorithm proposed in this paper uses just two iterative variables, while the second even uses a single iterative variable provided that some bound information on system dynamics is known. As a result, the memory space in real-time ILC implementations is greatly reduced.
Danielski, C; Tinetti, G
2013-01-01
The NASA Kepler mission is delivering groundbreaking results, with an increasing number of Earth-sized and moon-sized objects been discovered. A high photometric precision can be reached only through a thorough removal of the stellar activity and the instrumental systematics. We have explored here the possibility of using non-parametric methods to analyse the Simple Aperture Photometry data observed by the Kepler mission. We focused on a sample of stellar light curves with different effective temperatures and flux modulations, and we found that Gaussian Processes-based techniques can very effectively correct the instrumental systematics along with the long-term stellar activity. Our method can disentangle astrophysical features (events), such as planetary transits, flares or general sudden variations in the intensity, from the star signal and it is very efficient as it requires only a few training iterations of the Gaussian Process model. The results obtained show the potential of our method to isolate the ma...
Liu, Yi-Rong; Wen, Hui; Huang, Teng; Lin, Xiao-Xiao; Gai, Yan-Bo; Hu, Chang-Jin; Zhang, Wei-Jun; Huang, Wei
2014-01-16
Exploration of the low-lying structures of atomic or molecular clusters remains a fundamental problem in nanocluster science. Basin hopping is typically employed in conjunction with random motion, which is a perturbation of a local minimum structure. We have combined two different sampling technologies, "random sampling" and "compressed sampling", to explore the potential energy surface of molecular clusters. We used the method to study water, nitrate/water, and oxalate/water cluster systems at the MP2/aug-cc-pVDZ level of theory. An isomer of the NO3(-)(H2O)3 cluster molecule with a 3D structure was lower in energy than the planar structure, which had previously been reported by experimental study as the lowest-energy structure. The lowest-energy structures of the NO3(-)(H2O)5 and NO3(-)(H2O)7 clusters were found to have structures similar to pure (H2O)8 and (H2O)10 clusters, which contradicts previous experimental result by Wang et al.(J. Chem. Phys. 2002, 116, 561-570). The new minimum energy structures for C2O4(2-)(H2O)5 and C2O4(2-)(H2O)6 are found by our calculations.
Multilevel Latent Class Analysis: Parametric and Nonparametric Models
Finch, W. Holmes; French, Brian F.
2014-01-01
Latent class analysis is an analytic technique often used in educational and psychological research to identify meaningful groups of individuals within a larger heterogeneous population based on a set of variables. This technique is flexible, encompassing not only a static set of variables but also longitudinal data in the form of growth mixture…
A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator
Energy Technology Data Exchange (ETDEWEB)
Huang, Shengzhi; Huang, Qiang; Leng, Guoyong; Chang, Jianxia
2016-06-01
This study develops a nonparametric multivariate drought index, namely, the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing Nonparametric Multivariate Drought Index, we use the nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights in drought index development. The proposed NMSDI are applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate: (1) generally, NMSDI captures the drought onset similar to Standardized Precipitation Index (SPI) and drought termination and persistence similar to Standardized Streamflow Index (SSFI). The drought events identified by NMSDI match well with historical drought records in the WRB. The performances are also consistent with that by an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI in drought detections (2) An increasing risk of drought has been detected for the past decades, and will be persistent to a certain extent in future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil reservation practices. This study highlights the nonparametric multivariable drought index, which can be used for drought detections and predictions efficiently and comprehensively.
Geographic Projection of Cluster Composites
Nerbonne, J.; Bosveld-de Smet, L.M.; Kleiweg, P.; Blackwell, A.; Marriott, K.; Shimojima, A.
2004-01-01
A composite cluster map displays a fuzzy categorisation of geographic areas. It combines information from several sources to provide a visualisation of the significance of cluster borders. The basic technique renders the chance that two neighbouring locations are members of different clusters as the
DEFF Research Database (Denmark)
Ramirez, José Rangel; Sørensen, John Dalsgaard
2011-01-01
This work illustrates the updating and incorporation of information in the assessment of fatigue reliability for offshore wind turbine. The new information, coming from external and condition monitoring can be used to direct updating of the stochastic variables through a non-parametric Bayesian...... updating approach and be integrated in the reliability analysis by a third-order polynomial chaos expansion approximation. Although Classical Bayesian updating approaches are often used because of its parametric formulation, non-parametric approaches are better alternatives for multi-parametric updating...... with a non-conjugating formulation. The results in this paper show the influence on the time dependent updated reliability when non-parametric and classical Bayesian approaches are used. Further, the influence on the reliability of the number of updated parameters is illustrated....
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
The linear discriminant analysis (LDA) is one of popular means for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently-used approaches of feature extraction usually require linear, independence, or large sample condition. However, in real world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept in identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA have both parametric and nonparametric algorithm advantages and higher classification accuracy. Quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. At last, the application of LKNDA in the complex feature extraction of financial market activities is proposed.
Non-parametric seismic hazard analysis in the presence of incomplete data
Yazdani, Azad; Mirzaei, Sajjad; Dadkhah, Koroush
2017-01-01
The distribution of earthquake magnitudes plays a crucial role in the estimation of seismic hazard parameters. Due to the complexity of earthquake magnitude distribution, non-parametric approaches are recommended over classical parametric methods. The main deficiency of the non-parametric approach is the lack of complete magnitude data in almost all cases. This study aims to introduce an imputation procedure for completing earthquake catalog data that will allow the catalog to be used for non-parametric density estimation. Using a Monte Carlo simulation, the efficiency of introduced approach is investigated. This study indicates that when a magnitude catalog is incomplete, the imputation procedure can provide an appropriate tool for seismic hazard assessment. As an illustration, the imputation procedure was applied to estimate earthquake magnitude distribution in Tehran, the capital city of Iran.
Revisiting the Distance Duality Relation using a non-parametric regression method
Rana, Akshay; Jain, Deepak; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of luminosity distance, DL and angular diameter distance, DA given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of Cosmic Microwave Background (CMB) radiation. Any deviation from η(z)≡ DL/DA (1+z)2 =1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method namely, LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η (z) data based on a phenomenological model η(z)= (1+z)epsilon. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with CMB temperature-redshift relation, therefore we also use the CMB temperature data to reconstruct η (z). It is important to note that with CMB data, we are able to study the evolution of DDR upto a very high redshift z = 2.418. In this analysis, we find no evidence of deviation from η=1 within a 1σ region in the entire redshift range used in this analysis (0 < z <= 2.418).
Directory of Open Access Journals (Sweden)
Jinn-Min Yang
2016-11-01
Full Text Available Feature extraction (FE or dimensionality reduction (DR plays quite an important role in the field of pattern recognition. Feature extraction aims to reduce the dimensionality of the high-dimensional dataset to enhance the classification accuracy and foster the classification speed, particularly when the training sample size is small, namely the small sample size (SSS problem. Remotely sensed hyperspectral images (HSIs are often with hundreds of measured features (bands which potentially provides more accurate and detailed information for classification, but it generally needs more samples to estimate parameters to achieve a satisfactory result. The cost of collecting ground-truth of remotely sensed hyperspectral scene can be considerably difficult and expensive. Therefore, FE techniques have been an important part for hyperspectral image classification. Unlike lots of feature extraction methods are based only on the spectral (band information of the training samples, some feature extraction methods integrating both spatial and spectral information of training samples show more effective results in recent years. Spatial contexture information has been proven to be useful to improve the HSI data representation and to increase classification accuracy. In this paper, we propose a spatial and spectral nonparametric linear feature extraction method for hyperspectral image classification. The spatial and spectral information is extracted for each training sample and used to design the within-class and between-class scatter matrices for constructing the feature extraction model. The experimental results on one benchmark hyperspectral image demonstrate that the proposed method obtains stable and satisfactory results than some existing spectral-based feature extraction.
Data clustering algorithms and applications
Aggarwal, Charu C
2013-01-01
Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea
Directory of Open Access Journals (Sweden)
Dmitri A. Viattchenin
2009-06-01
Full Text Available This paper describes a modification of a possibilistic clustering method based on the concept of allotment among fuzzy clusters. Basic ideas of the method are considered and the concept of a principal allotment among fuzzy clusters is introduced. The paper provides the description of the plan of the algorithm for detection principal allotment. An analysis of experimental results of the proposed algorithm’s application to the Tamura’s portrait data in comparison with the basic version of the algorithm and with the NERFCM-algorithm is carried out. A methodology of the algorithm’s application to the dimensionality reduction problem is outlined and the application of the methodology is illustrated on the example of Anderson’s Iris data in comparison with the result of principal component analysis. Preliminary conclusions are formulated also.
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter
DEFF Research Database (Denmark)
Ackerman, Margareta; Ben-David, Shai; Branzei, Simina
2012-01-01
We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights.We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...
Modern nonparametric, robust and multivariate methods festschrift in honour of Hannu Oja
Taskinen, Sara
2015-01-01
Written by leading experts in the field, this edited volume brings together the latest findings in the area of nonparametric, robust and multivariate statistical methods. The individual contributions cover a wide variety of topics ranging from univariate nonparametric methods to robust methods for complex data structures. Some examples from statistical signal processing are also given. The volume is dedicated to Hannu Oja on the occasion of his 65th birthday and is intended for researchers as well as PhD students with a good knowledge of statistics.
Directory of Open Access Journals (Sweden)
Rabia Ece OMAY
2013-06-01
Full Text Available In this study, relationship between gross domestic product (GDP per capita and sulfur dioxide (SO2 and particulate matter (PM10 per capita is modeled for Turkey. Nonparametric fixed effect panel data analysis is used for the modeling. The panel data covers 12 territories, in first level of Nomenclature of Territorial Units for Statistics (NUTS, for period of 1990-2001. Modeling of the relationship between GDP and SO2 and PM10 for Turkey, the non-parametric models have given good results.
Zhao, Zhibiao
2011-06-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.
Parametrically guided estimation in nonparametric varying coefficient models with quasi-likelihood.
Davenport, Clemontina A; Maity, Arnab; Wu, Yichao
2015-04-01
Varying coefficient models allow us to generalize standard linear regression models to incorporate complex covariate effects by modeling the regression coefficients as functions of another covariate. For nonparametric varying coefficients, we can borrow the idea of parametrically guided estimation to improve asymptotic bias. In this paper, we develop a guided estimation procedure for the nonparametric varying coefficient models. Asymptotic properties are established for the guided estimators and a method of bandwidth selection via bias-variance tradeoff is proposed. We compare the performance of the guided estimator with that of the unguided estimator via both simulation and real data examples.
A Non-Parametric and Entropy Based Analysis of the Relationship between the VIX and S&P 500
Directory of Open Access Journals (Sweden)
Abhay K. Singh
2013-10-01
Full Text Available This paper features an analysis of the relationship between the S&P 500 Index and the VIX using daily data obtained from the CBOE website and SIRCA (The Securities Industry Research Centre of the Asia Pacific. We explore the relationship between the S&P 500 daily return series and a similar series for the VIX in terms of a long sample drawn from the CBOE from 1990 to mid 2011 and a set of returns from SIRCA’s TRTH datasets from March 2005 to-date. This shorter sample, which captures the behavior of the new VIX, introduced in 2003, is divided into four sub-samples which permit the exploration of the impact of the Global Financial Crisis. We apply a series of non-parametric based tests utilizing entropy based metrics. These suggest that the PDFs and CDFs of these two return distributions change shape in various subsample periods. The entropy and MI statistics suggest that the degree of uncertainty attached to these distributions changes through time and using the S&P 500 return as the dependent variable, that the amount of information obtained from the VIX changes with time and reaches a relative maximum in the most recent period from 2011 to 2012. The entropy based non-parametric tests of the equivalence of the two distributions and their symmetry all strongly reject their respective nulls. The results suggest that parametric techniques do not adequately capture the complexities displayed in the behavior of these series. This has practical implications for hedging utilizing derivatives written on the VIX.
Sánchez-Portal, Miguel; Pérez-Martínez, Ricardo; Cepa, Jordi; García, Ana M Pérez; Domínguez-Sánchez, Helena; Bongiovanni, Ángel; Serra, Ana L; Alfaro, Emilio; Altieri, Bruno; Aragón-Salamanca, Alfonso; Balkowski, Chantal; Biviano, Andrea; Bremer, Malcom; Castander, Francisco; Castañeda, Héctor; Castro-Rodríguez, Nieves; Chies-Santos, Ana L; Coia, Daniela; Diaferio, Antonaldo; Duc, Pierre-Alain; Ederoclite, Alessandro; Geach, James; González-Serrano, Ignacio; Haines, Chris P; McBreen, Brian; Metcalfe, Leo; Oteo, Iván; Pérez-Fournón, Ismael; Poggianti, Bianca; Polednikova, Jana; Ramón-Pérez, Marina; Rodríguez-Espinosa, José M; Santos, Joana S; Smail, Ian; Smith, Graham P; Temporin, Sonia; Valtchanov, Ivan
2015-01-01
The cores of clusters at 0 $\\lesssim$ z $\\lesssim$ 1 are dominated by quiescent early-type galaxies, whereas the field is dominated by star-forming late-type ones. Galaxy properties, notably the star formation (SF) ability, are altered as they fall into overdense regions. The critical issues to understand this evolution are how the truncation of SF is connected to the morphological transformation and the responsible physical mechanism. The GaLAxy Cluster Evolution Survey (GLACE) is conducting a study on the variation of galaxy properties (SF, AGN, morphology) as a function of environment in a representative sample of clusters. A deep survey of emission line galaxies (ELG) is being performed, mapping a set of optical lines ([OII], [OIII], H$\\beta$ and H$\\alpha$/[NII]) in several clusters at z $\\sim$ 0.40, 0.63 and 0.86. Using the Tunable Filters (TF) of OSIRIS/GTC, GLACE applies the technique of TF tomography: for each line, a set of images at different wavelengths are taken through the TF, to cover a rest fra...
Rights, Jason D; Sterba, Sonya K
2016-11-01
Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.
Jiang, GJ; Knight, JL
1997-01-01
In this paper, we propose a nonparametric identification and estimation procedure for an Ito diffusion process based on discrete sampling observations. The nonparametric kernel estimator for the diffusion function developed in this paper deals with general Ito diffusion processes and avoids any
Jiang, GJ; Knight, JL
1997-01-01
In this paper, we propose a nonparametric identification and estimation procedure for an Ito diffusion process based on discrete sampling observations. The nonparametric kernel estimator for the diffusion function developed in this paper deals with general Ito diffusion processes and avoids any func
Neutrosophic Hierarchical Clustering Algoritms
Directory of Open Access Journals (Sweden)
Rıdvan Şahin
2014-03-01
Full Text Available Interval neutrosophic set (INS is a generalization of interval valued intuitionistic fuzzy set (IVIFS, whose the membership and non-membership values of elements consist of fuzzy range, while single valued neutrosophic set (SVNS is regarded as extension of intuitionistic fuzzy set (IFS. In this paper, we extend the hierarchical clustering techniques proposed for IFSs and IVIFSs to SVNSs and INSs respectively. Based on the traditional hierarchical clustering procedure, the single valued neutrosophic aggregation operator, and the basic distance measures between SVNSs, we define a single valued neutrosophic hierarchical clustering algorithm for clustering SVNSs. Then we extend the algorithm to classify an interval neutrosophic data. Finally, we present some numerical examples in order to show the effectiveness and availability of the developed clustering algorithms.
Nonparametric estimation of population density for line transect sampling using FOURIER series
Crain, B.R.; Burnham, K.P.; Anderson, D.R.; Lake, J.L.
1979-01-01
A nonparametric, robust density estimation method is explored for the analysis of right-angle distances from a transect line to the objects sighted. The method is based on the FOURIER series expansion of a probability density function over an interval. With only mild assumptions, a general population density estimator of wide applicability is obtained.
A non-parametric peak finder algorithm and its application in searches for new physics
Chekanov, S
2011-01-01
We have developed an algorithm for non-parametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm for analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using pp Monte Carlo simulations.
Nonparametric estimation of the stationary M/G/1 workload distribution function
DEFF Research Database (Denmark)
Hansen, Martin Bøgsted
2005-01-01
In this paper it is demonstrated how a nonparametric estimator of the stationary workload distribution function of the M/G/1-queue can be obtained by systematic sampling the workload process. Weak convergence results and bootstrap methods for empirical distribution functions for stationary associ...