WorldWideScience

Sample records for statistical sampling algorithm

  1. Adaptive sampling algorithm for detection of superpoints

    Institute of Scientific and Technical Information of China (English)

    CHENG Guang; GONG Jian; DING Wei; WU Hua; QIANG ShiQiang

    2008-01-01

    The superpoints are the sources (or destinations) that connect with a large number of destinations (or sources) during a measurement time interval, so detecting superpoints in real time is very important for network security and management. Previous algorithms cannot both control memory usage and deliver the desired accuracy, so it is hard to detect superpoints on a high-speed link in real time. In this paper, we propose an adaptive sampling algorithm to detect superpoints in real time, which uses a flow sample-and-hold module to reduce the detection of non-superpoints and to improve the measurement accuracy for superpoints. We also design a data stream structure to maintain the flow records, which compensates statistically for flow hash collisions. An adaptive process based on different sampling probabilities is used to maintain the recorded IP addresses within the limited memory. The algorithm is compared with other algorithms by analyzing real network trace data. Experimental results and mathematical analysis show that the algorithm combines a limited memory requirement with high measurement accuracy.
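
    As an illustration of the flow sample-and-hold idea underlying this kind of superpoint detector, the following Python sketch counts per-source fan-out from held flows only. All names (detect_superpoints, sample_prob, fanout_threshold) are illustrative assumptions; the paper's adaptive data-stream structure, which also adapts the sampling probability and compensates for hash collisions, is not reproduced here.

```python
import random
from collections import defaultdict

def detect_superpoints(packets, sample_prob=0.01, fanout_threshold=100):
    """Minimal sample-and-hold sketch (illustrative, not the paper's algorithm):
    a flow (src, dst) is sampled with probability `sample_prob` on first sight
    and then always held, so per-source fan-out is counted over held flows."""
    held_flows = set()              # flows (src, dst) currently held
    fanout = defaultdict(set)       # src -> distinct destinations seen via held flows
    for src, dst in packets:        # packets: iterable of (src_ip, dst_ip) pairs
        flow = (src, dst)
        if flow not in held_flows:
            if random.random() >= sample_prob:
                continue            # packet not sampled; flow stays untracked
            held_flows.add(flow)    # sample ... and hold from now on
        fanout[src].add(dst)
    # report sources whose held fan-out already exceeds the threshold
    return {src for src, dsts in fanout.items() if len(dsts) >= fanout_threshold}
```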

  2. Performance comparison between total variation (TV)-based compressed sensing and statistical iterative reconstruction algorithms

    International Nuclear Information System (INIS)

    Tang Jie; Nett, Brian E; Chen Guanghong

    2009-01-01

    Of all available reconstruction methods, statistical iterative reconstruction algorithms appear particularly promising since they enable accurate physical noise modeling. The newly developed compressive sampling/compressed sensing (CS) algorithm has shown the potential to accurately reconstruct images from highly undersampled data. The CS algorithm can be implemented in the statistical reconstruction framework as well. In this study, we compared the performance of two standard statistical reconstruction algorithms (penalized weighted least squares and q-GGMRF) to the CS algorithm. In assessing the image quality using these iterative reconstructions, it is critical to utilize realistic background anatomy as the reconstruction results are object dependent. A cadaver head was scanned on a Varian Trilogy system at different dose levels. Several figures of merit including the relative root mean square error and a quality factor which accounts for the noise performance and the spatial resolution were introduced to objectively evaluate reconstruction performance. A comparison is presented between the three algorithms for a constant undersampling factor comparing different algorithms at several dose levels. To facilitate this comparison, the original CS method was formulated in the framework of the statistical image reconstruction algorithms. Important conclusions of the measurements from our studies are that (1) for realistic neuro-anatomy, over 100 projections are required to avoid streak artifacts in the reconstructed images even with CS reconstruction, (2) regardless of the algorithm employed, it is beneficial to distribute the total dose to more views as long as each view remains quantum noise limited and (3) the total variation-based CS method is not appropriate for very low dose levels because while it can mitigate streaking artifacts, the images exhibit patchy behavior, which is potentially harmful for medical diagnosis.

  3. A sampling algorithm for segregation analysis

    Directory of Open Access Journals (Sweden)

    Henshall John

    2001-11-01

    Full Text Available Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Markov chain Monte Carlo (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or not found, its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated.

  4. Monte Carlo molecular simulations: improving the statistical efficiency of samples with the help of artificial evolution algorithms; Simulations moleculaires de Monte Carlo: amelioration de l'efficacite statistique de l'echantillonnage grace aux algorithmes d'evolution artificielle

    Energy Technology Data Exchange (ETDEWEB)

    Leblanc, B.

    2002-03-01

    Molecular simulation aims at simulating particles in interaction, describing a physico-chemical system. When considering Markov chain Monte Carlo sampling in this context, we often meet the same problem of statistical efficiency as with Molecular Dynamics for the simulation of complex molecules (polymers for example). The search for a correct sampling of the space of possible configurations with respect to the Boltzmann-Gibbs distribution is directly related to the statistical efficiency of such algorithms (i.e. their ability to rapidly provide uncorrelated states covering the whole configuration space). We investigated how to improve this efficiency with the help of Artificial Evolution (AE). AE algorithms form a class of stochastic optimization algorithms inspired by Darwinian evolution. Efficiency measures that can be turned into efficiency criteria were first identified, before selecting the parameters that could be optimized. The relative frequencies of the different types of Monte Carlo moves, usually chosen empirically within reasonable ranges, were considered first. We combined parallel simulations with a 'genetic server' in order to dynamically improve the quality of the sampling as the simulations progress. Our results show that, in comparison with some reference settings, it is possible to improve the quality of samples with respect to the chosen criterion. The same algorithm has been applied to improve the Parallel Tempering technique, in order to simultaneously optimize the relative frequencies of Monte Carlo moves and the relative frequencies of swaps between sub-systems simulated at different temperatures. Finally, hints for further research aimed at optimizing the choice of additional temperatures are given. (author)

  5. Classical boson sampling algorithms with superior performance to near-term experiments

    Science.gov (United States)

    Neville, Alex; Sparrow, Chris; Clifford, Raphaël; Johnston, Eric; Birchall, Patrick M.; Montanaro, Ashley; Laing, Anthony

    2017-12-01

    It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of linear optics, which has sparked interest as a rapid way to demonstrate such quantum supremacy. Photon statistics are governed by intractable matrix functions, which suggests that sampling from the distribution obtained by injecting photons into a linear optical network could be solved more quickly by a photonic experiment than by a classical computer. The apparently low resource requirements for large boson sampling experiments have raised expectations of a near-term demonstration of quantum supremacy by boson sampling. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. Our classical algorithm, based on Metropolised independence sampling, allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. Compared to current experiments, a demonstration of quantum supremacy over a successful implementation of these classical methods on a supercomputer would require the number of photons and experimental components to increase by orders of magnitude, while tackling exponentially scaling photon loss.
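
    The classical algorithm above is based on Metropolised independence sampling. A generic sketch of that MCMC scheme is shown below: proposals are drawn from a fixed distribution q, independently of the current state, and accepted with probability min(1, [π(y)q(x)]/[π(x)q(y)]). The interface (log_target, log_proposal, draw_proposal) is an illustrative assumption, and the boson-sampling-specific proposal distribution used in the paper is not reproduced.

```python
import math
import random

def metropolised_independence_sampler(log_target, log_proposal, draw_proposal,
                                      n_samples, x0):
    """Generic Metropolised independence sampling sketch: accept a proposal y
    drawn from q with probability min(1, [pi(y) q(x)] / [pi(x) q(y)])."""
    x = x0
    log_w_x = log_target(x) - log_proposal(x)   # log importance weight of current state
    samples = []
    for _ in range(n_samples):
        y = draw_proposal()
        log_w_y = log_target(y) - log_proposal(y)
        if random.random() < math.exp(min(0.0, log_w_y - log_w_x)):
            x, log_w_x = y, log_w_y             # accept
        samples.append(x)
    return samples
```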

  6. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

    Science.gov (United States)

    Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

    2017-06-06

    An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using an applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
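
    For reference, a minimal RANSAC loop for a toy line-fitting problem is sketched below; the hyperparameter names are illustrative assumptions, and the descriptor-selection and applicability-domain machinery described in the abstract is not included.

```python
import random
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=0.5, min_inliers=10, seed=0):
    """Minimal RANSAC sketch for y = a*x + b: fit candidate models to random
    minimal samples (2 points), keep the largest consensus set, then refit."""
    rng = random.Random(seed)
    pts = np.asarray(points, dtype=float)
    best_inliers = None
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = pts[rng.sample(range(len(pts)), 2)]
        if x1 == x2:
            continue                                  # degenerate minimal sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = residuals < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None or best_inliers.sum() < min_inliers:
        return None                                   # no acceptable consensus set
    a, b = np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)
    return a, b, best_inliers
```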

  7. Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis

    Energy Technology Data Exchange (ETDEWEB)

    Vrbik, Jan [Department of Mathematics, Brock University, St. Catharines, Ontario L2S 3A1 (Canada); Ospadov, Egor; Rothstein, Stuart M., E-mail: srothstein@brocku.ca [Department of Physics, Brock University, St. Catharines, Ontario L2S 3A1 (Canada)

    2016-07-14

    Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x′; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.

  8. Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis

    International Nuclear Information System (INIS)

    Vrbik, Jan; Ospadov, Egor; Rothstein, Stuart M.

    2016-01-01

    Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x′; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.

  9. Comparison of single distance phase retrieval algorithms by considering different object composition and the effect of statistical and structural noise.

    Science.gov (United States)

    Chen, R C; Rigon, L; Longo, R

    2013-03-25

    Phase retrieval is a technique for extracting quantitative phase information from X-ray propagation-based phase-contrast tomography (PPCT). In this paper, the performance of different single-distance phase retrieval algorithms is investigated. The algorithms are herein called the phase-attenuation duality Born algorithm (PAD-BA), phase-attenuation duality Rytov algorithm (PAD-RA), phase-attenuation duality modified Bronnikov algorithm (PAD-MBA), phase-attenuation duality Paganin algorithm (PAD-PA) and phase-attenuation duality Wu algorithm (PAD-WA), respectively. They are all based on the phase-attenuation duality property and on weak absorption of the sample, and they employ only single-distance PPCT data. They are investigated via simulated noise-free PPCT data considering the fulfillment of the PAD property and weakly absorbing conditions, with experimental PPCT data of a mixture sample containing absorbing and weakly absorbing materials, and with data of a polymer sample considering different degrees of statistical and structural noise. The simulation shows that all algorithms can quantitatively reconstruct the 3D refractive index of a quasi-homogeneous weakly absorbing object from noise-free PPCT data. When the weakly absorbing condition is violated, PAD-RA and PAD-PA/WA obtain better results than PAD-BA and PAD-MBA, as shown in both the simulation and the mixture sample results. When statistical noise is considered, the contrast-to-noise ratio decreases as the photon number is reduced. The structural noise study shows that the result is progressively corrupted by ring-like artifacts as the structural noise (i.e. phantom thickness) increases. PAD-RA and PAD-PA/WA achieve better density resolution than PAD-BA and PAD-MBA in both the statistical and structural noise studies.

  10. Enhanced sampling algorithms.

    Science.gov (United States)

    Mitsutake, Ayori; Mori, Yoshiharu; Okamoto, Yuko

    2013-01-01

    In biomolecular systems (especially all-atom models) with many degrees of freedom such as proteins and nucleic acids, there exist an astronomically large number of local-minimum-energy states. Conventional simulations in the canonical ensemble are of little use, because they tend to get trapped in these local-minimum-energy states. Enhanced conformational sampling techniques are thus in great demand. A simulation in a generalized ensemble performs a random walk in potential energy space and can overcome this difficulty. From only one simulation run, one can obtain canonical-ensemble averages of physical quantities as functions of temperature by the single-histogram and/or multiple-histogram reweighting techniques. In this article we review uses of the generalized-ensemble algorithms in biomolecular systems. Three well-known methods, namely, the multicanonical algorithm, simulated tempering, and the replica-exchange method, are described first. Both Monte Carlo and molecular dynamics versions of the algorithms are given. We then present various extensions of these three generalized-ensemble algorithms. The effectiveness of the methods is tested with short peptide and protein systems.
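
    As a small illustration of one of the reviewed methods, the sketch below performs the neighbour-swap step of the replica-exchange method (parallel tempering); the interface is a hypothetical assumption, and the multicanonical and simulated-tempering variants are not shown. A full simulation would interleave such swap sweeps with ordinary constant-temperature Monte Carlo or molecular dynamics updates of each replica.

```python
import math
import random

def attempt_replica_swaps(energies, betas, order, rng=random.random):
    """One sweep of neighbour swaps in replica exchange: replicas at inverse
    temperatures beta_k and beta_{k+1} exchange temperatures with probability
    min(1, exp[(beta_k - beta_{k+1}) * (E_k - E_{k+1})])."""
    for k in range(len(betas) - 1):
        i, j = order[k], order[k + 1]         # replicas currently at beta_k, beta_{k+1}
        delta = (betas[k] - betas[k + 1]) * (energies[i] - energies[j])
        if delta >= 0 or rng() < math.exp(delta):
            order[k], order[k + 1] = j, i     # swap which replica sits at which temperature
    return order
```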

  11. Statistical trajectory of an approximate EM algorithm for probabilistic image processing

    International Nuclear Information System (INIS)

    Tanaka, Kazuyuki; Titterington, D M

    2007-01-01

    We calculate analytically a statistical average of trajectories of an approximate expectation-maximization (EM) algorithm with generalized belief propagation (GBP) and a Gaussian graphical model for the estimation of hyperparameters from observable data in probabilistic image processing. A statistical average with respect to observed data corresponds to a configuration average for the random-field Ising model in spin glass theory. In the present paper, hyperparameters which correspond to interactions and external fields of spin systems are estimated by an approximate EM algorithm. A practical algorithm is described for gray-level image restoration based on a Gaussian graphical model and GBP. The GBP approach corresponds to the cluster variation method in statistical mechanics. Our main result in the present paper is to obtain the statistical average of the trajectory in the approximate EM algorithm by using loopy belief propagation and GBP with respect to degraded images generated from a probability density function with true values of hyperparameters. The statistical average of the trajectory can be expressed in terms of recursion formulas derived from some analytical calculations

  12. Sampling, Probability Models and Statistical Reasoning: Statistical Inference

    Indian Academy of Sciences (India)

    Sampling, Probability Models and Statistical Reasoning: Statistical Inference. Mohan Delampady, V R Padmawar. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp 49-58 ...

  13. A Monte Carlo Metropolis-Hastings Algorithm for Sampling from Distributions with Intractable Normalizing Constants

    KAUST Repository

    Liang, Faming; Jin, Ick-Hoon

    2013-01-01

    Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for random effect models and missing data problems that involve simulations from a distribution with intractable integrals. © 2013 Massachusetts Institute of Technology.

  14. A Monte Carlo Metropolis-Hastings Algorithm for Sampling from Distributions with Intractable Normalizing Constants

    KAUST Repository

    Liang, Faming

    2013-08-01

    Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for random effect models and missing data problems that involve simulations from a distribution with intractable integrals. © 2013 Massachusetts Institute of Technology.

  15. The Wang-Landau Sampling Algorithm

    Science.gov (United States)

    Landau, David P.

    2003-03-01

    Over the past several decades Monte Carlo simulations[1] have evolved into a powerful tool for the study of wide-ranging problems in statistical/condensed matter physics. Standard methods sample the probability distribution for the states of the system, usually in the canonical ensemble, and enormous improvements have been made in performance through the implementation of novel algorithms. Nonetheless, difficulties arise near phase transitions, either due to critical slowing down near 2nd order transitions or to metastability near 1st order transitions, thus limiting the applicability of the method. We shall describe a new and different Monte Carlo approach [2] that uses a random walk in energy space to determine the density of states directly. Once the density of states is estimated, all thermodynamic properties can be calculated at all temperatures. This approach can be extended to multi-dimensional parameter spaces and has already found use in classical models of interacting particles including systems with complex energy landscapes, e.g., spin glasses, protein folding models, etc., as well as for quantum models. 1. A Guide to Monte Carlo Simulations in Statistical Physics, D. P. Landau and K. Binder (Cambridge U. Press, Cambridge, 2000). 2. Fugao Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 (2001); Phys. Rev. E64, 056101-1 (2001).
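
    A minimal sketch of the Wang-Landau random walk for a 2D Ising model is given below; the parameters are toy settings and the function name is an assumption, so it follows the generic flat-histogram scheme of Ref. [2] in outline only, not any production implementation.

```python
import math
import random
import numpy as np

def wang_landau_ising(L=8, flatness=0.8, ln_f_final=1e-3, seed=1):
    """Wang-Landau sketch for the 2D Ising model on an L x L periodic lattice:
    the running estimate of ln g(E) is raised at every visited energy, and the
    modification factor ln f is halved whenever the energy histogram is flat."""
    rng = random.Random(seed)
    spins = np.ones((L, L), dtype=int)

    def site_energy(i, j):          # bond energy of spin (i, j) with its 4 neighbours
        return -spins[i, j] * (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                               + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])

    E = int(sum(site_energy(i, j) for i in range(L) for j in range(L)) // 2)
    log_g, hist, ln_f = {}, {}, 1.0
    while ln_f > ln_f_final:
        for _ in range(1000 * L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            E_new = E - 2 * site_energy(i, j)          # energy after flipping (i, j)
            # accept with probability min(1, g(E) / g(E_new))
            if rng.random() < math.exp(min(0.0, log_g.get(E, 0.0) - log_g.get(E_new, 0.0))):
                spins[i, j] *= -1
                E = E_new
            log_g[E] = log_g.get(E, 0.0) + ln_f
            hist[E] = hist.get(E, 0) + 1
        counts = list(hist.values())
        if min(counts) > flatness * sum(counts) / len(counts):
            hist, ln_f = {}, ln_f / 2.0    # histogram flat enough: reset it, refine f
    return log_g                           # relative ln g(E); normalize as needed
```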

  16. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Gregory D. [University of California, Physics and Astronomy Department, Los Angeles, CA (United States); McKay, James; Scott, Pat [Imperial College London, Department of Physics, Blackett Laboratory, London (United Kingdom); Farmer, Ben; Conrad, Jan [AlbaNova University Centre, Oskar Klein Centre for Cosmoparticle Physics, Stockholm (Sweden); Stockholm University, Department of Physics, Stockholm (Sweden); Roebber, Elinore [McGill University, Department of Physics, Montreal, QC (Canada); Putze, Antje [LAPTh, Universite de Savoie, CNRS, Annecy-le-Vieux (France); Collaboration: The GAMBIT Scanner Workgroup

    2017-11-15

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics. (orig.)

  17. Statistical distribution sampling

    Science.gov (United States)

    Johnson, E. S.

    1975-01-01

    Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.

  18. A fast direct sampling algorithm for equilateral closed polygons

    International Nuclear Information System (INIS)

    Cantarella, Jason; Duplantier, Bertrand; Shonkwiler, Clayton; Uehara, Erica

    2016-01-01

    Sampling equilateral closed polygons is of interest in the statistical study of ring polymers. Over the past 30 years, previous authors have proposed a variety of simple Markov chain algorithms (but have not been able to show that they converge to the correct probability distribution) and complicated direct samplers (which require extended-precision arithmetic to evaluate numerically unstable polynomials). We present a simple direct sampler which is fast and numerically stable, and analyze its runtime using a new formula for the volume of equilateral polygon space as a Dirichlet-type integral. (paper)

  19. Two General Extension Algorithms of Latin Hypercube Sampling

    Directory of Open Access Journals (Sweden)

    Zhi-zhao Liu

    2015-01-01

    Full Text Available To preserve original sampling points and thereby reduce the number of simulation runs, two general extension algorithms for Latin Hypercube Sampling (LHS) are proposed. The extension algorithms start from an original LHS of size m and construct a new LHS of size m+n that contains as many of the original points as possible. In order to obtain a strict LHS of larger size, some original points might have to be deleted. The relationship of the original sampling points within the new LHS structure is represented by a simple undirected acyclic graph. The basic general extension algorithm is proposed to retain the largest number of original points, but it is computationally expensive. Therefore, a general extension algorithm based on a greedy algorithm is proposed to reduce the extension time, although it cannot guarantee that the largest number of original points is retained. These algorithms are illustrated by an example and applied to evaluating sample means to demonstrate their effectiveness.
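
    For context, a basic (non-extended) Latin Hypercube Sample of size m can be generated as in the sketch below; this is the standard construction on the unit hypercube, not the extension algorithms proposed in the paper.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Standard Latin Hypercube Sampling on [0, 1)^d: each axis is divided into
    n_samples equal strata and every stratum is hit exactly once per axis."""
    rng = np.random.default_rng(seed)
    # a random permutation of the strata for each dimension (argsort of random keys)
    perm = np.argsort(rng.random((n_samples, n_dims)), axis=0)
    # place one uniformly jittered point inside each selected stratum
    return (perm + rng.random((n_samples, n_dims))) / n_samples
```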

  20. An Intrinsic Algorithm for Parallel Poisson Disk Sampling on Arbitrary Surfaces.

    Science.gov (United States)

    Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying

    2013-03-08

    Poisson disk sampling plays an important role in a variety of visual computing applications, due to its useful statistical properties in distribution and the absence of aliasing artifacts. While many effective techniques have been proposed to generate Poisson disk distributions in Euclidean space, relatively little work has addressed the surface counterpart. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. We propose a new technique for parallelizing dart throwing. Rather than the conventional approaches that explicitly partition the spatial domain to generate the samples in parallel, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. It is worth noting that our algorithm is accurate, as the generated Poisson disks are uniformly and randomly distributed without bias. Our method is intrinsic in that all the computations are based on the intrinsic metric and are independent of the embedding space. This intrinsic feature allows us to generate Poisson disk distributions on arbitrary surfaces. Furthermore, by manipulating a spatially varying density function, we can easily obtain adaptive sampling.
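
    For comparison with the parallel intrinsic method above, conventional serial dart throwing in the Euclidean unit square looks like the sketch below (illustrative names and parameters, no spatial acceleration structure).

```python
import random

def dart_throwing_poisson_disk(radius, n_attempts=20000, seed=0):
    """Conventional serial dart throwing in the unit square: a candidate is
    accepted only if it is at least `radius` away from every accepted sample."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_attempts):
        x, y = rng.random(), rng.random()
        if all((x - px) ** 2 + (y - py) ** 2 >= radius ** 2 for px, py in samples):
            samples.append((x, y))
    return samples
```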

  1. Efficient sampling algorithms for Monte Carlo based treatment planning

    International Nuclear Information System (INIS)

    DeMarco, J.J.; Solberg, T.D.; Chetty, I.; Smathers, J.B.

    1998-01-01

    Efficient sampling algorithms are necessary for producing a fast Monte Carlo based treatment planning code. This study evaluates several aspects of a photon-based tracking scheme and the effect of optimal sampling algorithms on the efficiency of the code. Four areas were tested: pseudo-random number generation, generalized sampling of a discrete distribution, sampling from the exponential distribution, and delta scattering as applied to photon transport through a heterogeneous simulation geometry. Generalized sampling of a discrete distribution using the cutpoint method can produce speedup gains of one order of magnitude versus conventional sequential sampling. Photon transport modifications based upon the delta scattering method were implemented and compared with a conventional boundary and collision checking algorithm. The delta scattering algorithm is faster by a factor of six versus the conventional algorithm for a boundary size of 5 mm within a heterogeneous geometry. A comparison of portable pseudo-random number algorithms and exponential sampling techniques is also discussed
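
    Two of the ingredients mentioned above can be sketched compactly: inverse-CDF sampling of the exponential free-path distribution and delta-scattering (Woodcock) tracking. The 1D interface below is an illustrative assumption, not the treatment-planning code evaluated in the study.

```python
import math
import random

def sample_free_path(mu):
    """Inverse-CDF sampling of an exponential free path with attenuation
    coefficient mu: s = -ln(xi) / mu for xi uniform in (0, 1]."""
    return -math.log(1.0 - random.random()) / mu

def woodcock_track(x0, direction, mu_of_x, mu_max, x_end):
    """Delta-scattering (Woodcock) tracking sketch in 1D: free paths are sampled
    with the majorant coefficient mu_max, and each tentative collision is real
    with probability mu(x)/mu_max, otherwise it is virtual and tracking continues
    without explicit boundary checks inside the heterogeneous region."""
    x = x0
    while True:
        x += direction * sample_free_path(mu_max)
        if (direction > 0 and x >= x_end) or (direction < 0 and x <= x_end):
            return None                      # particle escaped without a real collision
        if random.random() < mu_of_x(x) / mu_max:
            return x                         # real collision site
```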

  2. Algorithm for image retrieval based on edge gradient orientation statistical code.

    Science.gov (United States)

    Zeng, Jiexian; Zhao, Yonggang; Li, Weiye; Fu, Xiang

    2014-01-01

    The image edge gradient direction not only contains important shape information but is also simple to compute, with low complexity. Considering that edge gradient direction histograms and the edge direction autocorrelogram are not rotation invariant, we put forward an image retrieval algorithm based on an edge gradient orientation statistical code (hereinafter referred to as EGOSC), obtained by transferring the statistical treatment used for eight-neighborhood chain-code edge directions to the statistics of the edge gradient direction. Firstly, we construct the n-direction vector and impose a maximal-summation constraint on the EGOSC to ensure that the algorithm is effectively rotation invariant. Then, we use the Euclidean distance between edge gradient direction entropies to measure shape similarity, so that the method is not sensitive to scaling, color, and illumination changes. The experimental results and the algorithm analysis demonstrate that the algorithm can be used for content-based image retrieval and achieves good retrieval results.

  3. A Simplified Algorithm for Statistical Investigation of Damage Spreading

    International Nuclear Information System (INIS)

    Gecow, Andrzej

    2009-01-01

    On the way to simulating the adaptive evolution of a complex system describing a living object or a human-designed project, a fitness must be defined on node states or on the network's external outputs. Feedbacks lead to circular attractors of these states or outputs, which makes it difficult to define a fitness. The main statistical effects of the adaptive condition result from the tendency towards small change, and for them to appear only a statistically correct size of the damage initiated by an evolutionary change of the system is needed. This observation allows one to cut the feedback loops and, in effect, to obtain a particular, statistically correct state instead of the long circular attractor which the quenched model predicts for a chaotic network with feedback. Defining fitness on such states is simple: we calculate only the damaged nodes, and only once. Such an algorithm is optimal for the investigation of damage spreading, i.e. of the statistical connections between the structural parameters of the initial change and the size of the resulting damage. It is a reversed-annealed method: functions and states (signals) may be randomly substituted, but the connections are important and are preserved. The small damages important for adaptive evolution are depicted correctly, in contrast to the Derrida annealed approximation, which predicts equilibrium levels for large networks; the algorithm indicates these levels correctly. The relevant program in Pascal, which executes the algorithm for a wide range of parameters, can be obtained from the author.

  4. Improved Sampling Algorithms in the Risk-Informed Safety Margin Characterization Toolkit

    International Nuclear Information System (INIS)

    Mandelli, Diego; Smith, Curtis Lee; Alfonsi, Andrea; Rabiti, Cristian; Cogliati, Joshua Joseph

    2015-01-01

    The RISMC approach is developing an advanced set of methodologies and algorithms in order to perform Probabilistic Risk Analyses (PRAs). In contrast to classical PRA methods, which are based on Event-Tree and Fault-Tree methods, the RISMC approach largely employs system simulator codes coupled to stochastic analysis tools. The basic idea is to randomly perturb (by employing sampling algorithms) the timing and sequencing of events and the internal parameters of the system codes (i.e., uncertain parameters) in order to estimate stochastic quantities such as the core damage probability. Applied to complex systems such as nuclear power plants, this approach requires a series of computationally expensive simulation runs over a large set of uncertain parameters. These types of analysis are affected by two issues. Firstly, the space of possible solutions (a.k.a. the issue space or the response surface) can be sampled only very sparsely, and this precludes the ability to fully analyze the impact of uncertainties on the system dynamics. Secondly, large amounts of data are generated, and tools to generate knowledge from such data sets are not yet available. This report focuses on the first issue and, in particular, employs novel methods that optimize the information generated by the sampling process by sampling unexplored and risk-significant regions of the issue space: adaptive (smart) sampling algorithms. They infer the system response from surrogate models constructed from existing samples and predict the most relevant location of the next sample. It is therefore possible to understand features of the issue space with a small number of carefully selected samples. In this report, we present how adaptive sampling can be performed using the RISMC toolkit and highlight the advantages compared to more classical sampling approaches such as Monte Carlo. We employ RAVEN to perform such statistical analyses using both analytical cases and another RISMC code, RELAP-7.

  5. Improved Sampling Algorithms in the Risk-Informed Safety Margin Characterization Toolkit

    Energy Technology Data Exchange (ETDEWEB)

    Mandelli, Diego [Idaho National Lab. (INL), Idaho Falls, ID (United States); Smith, Curtis Lee [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States); Cogliati, Joshua Joseph [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The RISMC approach is developing an advanced set of methodologies and algorithms in order to perform Probabilistic Risk Analyses (PRAs). In contrast to classical PRA methods, which are based on Event-Tree and Fault-Tree methods, the RISMC approach largely employs system simulator codes coupled to stochastic analysis tools. The basic idea is to randomly perturb (by employing sampling algorithms) the timing and sequencing of events and the internal parameters of the system codes (i.e., uncertain parameters) in order to estimate stochastic quantities such as the core damage probability. Applied to complex systems such as nuclear power plants, this approach requires a series of computationally expensive simulation runs over a large set of uncertain parameters. These types of analysis are affected by two issues. Firstly, the space of possible solutions (a.k.a. the issue space or the response surface) can be sampled only very sparsely, and this precludes the ability to fully analyze the impact of uncertainties on the system dynamics. Secondly, large amounts of data are generated, and tools to generate knowledge from such data sets are not yet available. This report focuses on the first issue and, in particular, employs novel methods that optimize the information generated by the sampling process by sampling unexplored and risk-significant regions of the issue space: adaptive (smart) sampling algorithms. They infer the system response from surrogate models constructed from existing samples and predict the most relevant location of the next sample. It is therefore possible to understand features of the issue space with a small number of carefully selected samples. In this report, we present how adaptive sampling can be performed using the RISMC toolkit and highlight the advantages compared to more classical sampling approaches such as Monte Carlo. We employ RAVEN to perform such statistical analyses using both analytical cases and another RISMC code, RELAP-7.
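
    A minimal sketch of surrogate-guided adaptive sampling in the spirit described above is shown below. It uses a simple nearest-neighbour surrogate and illustrative names (simulator, bounds, threshold); it is not the RISMC/RAVEN implementation.

```python
import numpy as np

def adaptive_sample(simulator, bounds, threshold, n_init=20, n_adapt=80, seed=0):
    """Surrogate-guided adaptive sampling sketch: after a small random initial
    design, each new run is placed at the candidate whose nearest-neighbour
    predicted response lies closest to the failure threshold (limit surface)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T      # bounds: list of (low, high) per input
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    y = np.array([simulator(x) for x in X])
    for _ in range(n_adapt):
        cand = rng.uniform(lo, hi, size=(256, len(lo)))
        d = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2)
        y_pred = y[np.argmin(d, axis=1)]            # 1-nearest-neighbour surrogate
        x_new = cand[np.argmin(np.abs(y_pred - threshold))]
        X = np.vstack([X, x_new])
        y = np.append(y, simulator(x_new))
    return X, y
```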

  6. Characteristic statistic algorithm (CSA) for in-core loading pattern optimization

    International Nuclear Information System (INIS)

    Liu Zhihong; Hu Yongming; Shi Gong

    2007-01-01

    To solve the problem of PWR in-core loading pattern optimization, a global optimization algorithm well suited to this problem, the characteristic statistic algorithm (CSA), is used. The search process of this algorithm and how to apply it to this problem are presented. The loading pattern optimization code SCYCLE is developed. Two different problems on real PWR models are calculated and the results are compared with those of other algorithms. It is shown that SCYCLE has high efficiency and good global performance on this problem. (authors)

  7. Economic Statistical Design of Variable Sampling Interval $\overline{X}$ Control Chart Based on Surrogate Variable Using Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Lee Tae-Hoon

    2016-12-01

    Full Text Available In many cases, an $\overline{X}$ control chart based on a performance variable is used in industrial fields. Typically, the control chart monitors the measurements of the performance variable itself. However, if the performance variable is too costly or impossible to measure, and a less expensive surrogate variable is available, the process may be controlled more efficiently using the surrogate variable. In this paper, we present a model for the economic statistical design of a VSI (Variable Sampling Interval) $\overline{X}$ control chart using a surrogate variable that is linearly correlated with the performance variable. We derive the total average profit model from an economic viewpoint, apply the model to a Very High Temperature Reactor (VHTR) nuclear fuel measurement system, and derive the optimal result using genetic algorithms. Compared with the control chart based on the performance variable, the proposed model gives a larger expected net income per unit of time in the long run if the correlation between the performance variable and the surrogate variable is relatively high. The proposed model is confined to the sample mean control chart under the assumption that a single assignable cause occurs according to a Poisson process. However, the model may also be extended to other types of control charts using single or multiple assignable cause assumptions, such as the VSS (Variable Sample Size) $\overline{X}$ control chart, EWMA and CUSUM charts, and so on.

  8. A Markov chain Monte Carlo Expectation Maximization Algorithm for Statistical Analysis of DNA Sequence Evolution with Neighbor-Dependent Substitution Rates

    DEFF Research Database (Denmark)

    Hobolth, Asger

    2008-01-01

    The evolution of DNA sequences can be described by discrete state continuous time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models where the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor-dependent substitution models are analytically intractable and must be analyzed using either approximate or simulation-based methods. We describe statistical inference of neighbor-dependent models using a Markov chain Monte Carlo expectation maximization (MCMC-EM) algorithm. In the MCMC-EM algorithm, the high-dimensional integrals required in the EM algorithm are estimated using MCMC sampling. The MCMC sampler requires simulation of sample paths from a continuous time Markov process, conditional on the beginning and ending states and the paths of the neighboring sites. An exact path sampling algorithm is developed for this purpose.

  9. A rank-based algorithm of differential expression analysis for small cell line data with statistical control.

    Science.gov (United States)

    Li, Xiangyu; Cai, Hao; Wang, Xianlong; Ao, Lu; Guo, You; He, Jun; Gu, Yunyan; Qi, Lishuang; Guan, Qingzhou; Lin, Xu; Guo, Zheng

    2017-10-13

    To detect differentially expressed genes (DEGs) in small-scale cell line experiments, usually with only two or three technical replicates for each state, the commonly used statistical methods such as significance analysis of microarrays (SAM), limma and RankProd (RP) lack statistical power, while the fold change method lacks any statistical control. In this study, we demonstrated that the within-sample relative expression orderings (REOs) of gene pairs were highly stable among technical replicates of a cell line but often widely disrupted after certain treatments such as gene knockdown, gene transfection and drug treatment. Based on this finding, we customized the RankComp algorithm, previously designed for individualized differential expression analysis through REO comparison, to identify DEGs with a certain statistical control for small-scale cell line data. In both simulated and real data, the new algorithm, named CellComp, exhibited high precision with much higher sensitivity than the original RankComp, SAM, limma and RP methods. Therefore, CellComp provides an efficient tool for analyzing small-scale cell line data. © The Author 2017. Published by Oxford University Press.

  10. A Formal Approach for RT-DVS Algorithms Evaluation Based on Statistical Model Checking

    Directory of Open Access Journals (Sweden)

    Shengxin Dai

    2015-01-01

    Full Text Available Energy saving is a crucial concern in embedded real time systems. Many RT-DVS algorithms have been proposed to save energy while preserving deadline guarantees. This paper presents a novel approach to evaluate RT-DVS algorithms using statistical model checking. A scalable framework is proposed for RT-DVS algorithms evaluation, in which the relevant components are modeled as stochastic timed automata, and the evaluation metrics including utilization bound, energy efficiency, battery awareness, and temperature awareness are expressed as statistical queries. Evaluation of these metrics is performed by verifying the corresponding queries using UPPAAL-SMC and analyzing the statistical information provided by the tool. We demonstrate the applicability of our framework via a case study of five classical RT-DVS algorithms.

  11. Statistical algorithm for automated signature analysis of power spectral density data

    International Nuclear Information System (INIS)

    Piety, K.R.

    1977-01-01

    A statistical algorithm has been developed and implemented on a minicomputer system for on-line surveillance applications. Power spectral density (PSD) measurements on process signals are the performance signatures that characterize the "health" of the monitored equipment. Statistical methods provide a quantitative basis for automating the detection of anomalous conditions. The surveillance algorithm has been tested on signals from neutron sensors, proximeter probes, and accelerometers to determine its potential for monitoring nuclear reactors and rotating machinery

  12. Statistical Symbolic Execution with Informed Sampling

    Science.gov (United States)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis, and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.

  13. Statistical classification techniques in high energy physics (SDDT algorithm)

    International Nuclear Information System (INIS)

    Bouř, Petr; Kůs, Václav; Franc, Jiří

    2016-01-01

    We present our proposal of the supervised binary divergence decision tree with nested separation method based on the generalized linear models. A key insight we provide is the clustering driven only by a few selected physical variables. The proper selection consists of the variables achieving the maximal divergence measure between two different classes. Further, we apply our method to Monte Carlo simulations of physics processes corresponding to a data sample of top quark-antiquark pair candidate events in the lepton+jets decay channel. The data sample is produced in pp̅ collisions at √s = 1.96 TeV. It corresponds to an integrated luminosity of 9.7 fb⁻¹ recorded with the D0 detector during Run II of the Fermilab Tevatron Collider. The efficiency of our algorithm achieves 90% AUC in separating signal from background. We also briefly deal with the modification of statistical tests applicable to weighted data sets in order to test homogeneity of the Monte Carlo simulations and measured data. The justification of these modified tests is proposed through the divergence tests. (paper)

  14. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration

    KAUST Repository

    Elsheikh, A. H.

    2013-12-01

    Calibration of subsurface flow models is an essential step for managing ground water aquifers, designing contaminant remediation plans, and maximizing recovery from hydrocarbon reservoirs. We investigate an efficient sampling algorithm known as nested sampling (NS), which can simultaneously sample the posterior distribution for uncertainty quantification and estimate the Bayesian evidence for model selection. Model selection statistics, such as the Bayesian evidence, are needed to choose or assign different weights to different models of different levels of complexity. In this work, we report the first successful application of nested sampling for calibration of several nonlinear subsurface flow problems. The Bayesian evidence estimated by the NS algorithm is used to weight different parameterizations of the subsurface flow models (prior model selection). The results of the numerical evaluation implicitly enforced Occam's razor, where simpler models with fewer parameters are favored over complex models. The proper level of model complexity was automatically determined based on the information content of the calibration data and the data mismatch of the calibrated model.
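
    A toy nested sampling loop for evidence estimation, in the spirit of the abstract above, is sketched below; the naive rejection step used to draw from the likelihood-constrained prior and the interface names are illustrative assumptions, and the subsurface flow likelihoods of the paper are not reproduced.

```python
import numpy as np

def nested_sampling(log_likelihood, prior_draw, n_live=100, n_iter=1000, seed=0):
    """Toy nested sampling sketch: at each iteration the worst live point is
    removed, its prior-volume shell contributes L_worst * dX to the evidence Z,
    and it is replaced by a fresh prior draw constrained to L > L_worst."""
    rng = np.random.default_rng(seed)
    live = [prior_draw(rng) for _ in range(n_live)]
    log_L = np.array([log_likelihood(x) for x in live])
    log_Z, log_X = -np.inf, 0.0               # log evidence, log enclosed prior volume
    for _ in range(n_iter):
        worst = int(np.argmin(log_L))
        log_X_new = log_X - 1.0 / n_live      # expected log shrinkage per iteration
        log_dX = np.log(np.exp(log_X) - np.exp(log_X_new))
        log_Z = np.logaddexp(log_Z, log_L[worst] + log_dX)
        while True:                           # naive rejection from the constrained prior
            x_new = prior_draw(rng)
            if log_likelihood(x_new) > log_L[worst]:
                break
        live[worst], log_L[worst] = x_new, log_likelihood(x_new)
        log_X = log_X_new
    # remaining live points fill the final prior volume
    log_Z = np.logaddexp(log_Z, log_X + log_L.max()
                         + np.log(np.mean(np.exp(log_L - log_L.max()))))
    return log_Z
```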

  15. 42 CFR 402.109 - Statistical sampling.

    Science.gov (United States)

    2010-10-01

    ... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study, if based upon an appropriate sampling and computed by valid statistical methods, constitute prima... § 402.1. (c) Burden of proof. Once CMS or OIG has made a prima facie case, the burden is on the...

  16. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick [Oakland, CA]; Kamath, Chandrika [Tracy, CA]

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  17. The product composition control system at Savannah River: Statistical process control algorithm

    International Nuclear Information System (INIS)

    Brown, K.G.

    1994-01-01

    The Defense Waste Processing Facility (DWPF) at the Savannah River Site (SRS) will be used to immobilize the approximately 130 million liters of high-level nuclear waste currently stored at the site in 51 carbon steel tanks. Waste handling operations separate this waste into highly radioactive insoluble sludge and precipitate and less radioactive water soluble salts. In DWPF, precipitate (PHA) is blended with insoluble sludge and ground glass frit to produce melter feed slurry which is continuously fed to the DWPF melter. The melter produces a molten borosilicate glass which is poured into stainless steel canisters for cooling and, ultimately, shipment to and storage in a geologic repository. Described here is the Product Composition Control System (PCCS) process control algorithm. The PCCS is the amalgam of computer hardware and software intended to ensure that the melt will be processable and that the glass wasteform produced will be acceptable. Within PCCS, the Statistical Process Control (SPC) Algorithm is the means by which control of the DWPF process is guided. The SPC Algorithm is necessary to control the multivariate DWPF process in the face of uncertainties arising from the process, its feeds, sampling, modeling, and measurement systems. This article describes the functions performed by the SPC Algorithm, characterization of DWPF prior to making product, accounting for prediction uncertainty, accounting for measurement uncertainty, monitoring a SME batch, incorporating process information, and advantages of the algorithm. 9 refs., 6 figs

  18. Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

    OpenAIRE

    Gray, Alexander G.; Fischer, Bernd; Schumann, Johann; Buntine, Wray

    2003-01-01

    Machine learning has reached a point where many probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models. Here, we describe the AUTOBAYES system which takes a high-level statistical model specification, uses powerful symbolic techniques based on schema-based program synthesis and computer algebra...

  19. Quantitative Imaging Biomarkers: A Review of Statistical Methods for Computer Algorithm Comparisons

    Science.gov (United States)

    2014-01-01

    Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research. PMID:24919829

  20. Quantitative imaging biomarkers: a review of statistical methods for computer algorithm comparisons.

    Science.gov (United States)

    Obuchowski, Nancy A; Reeves, Anthony P; Huang, Erich P; Wang, Xiao-Feng; Buckler, Andrew J; Kim, Hyun J Grace; Barnhart, Huiman X; Jackson, Edward F; Giger, Maryellen L; Pennello, Gene; Toledano, Alicia Y; Kalpathy-Cramer, Jayashree; Apanasovich, Tatiyana V; Kinahan, Paul E; Myers, Kyle J; Goldgof, Dmitry B; Barboriak, Daniel P; Gillies, Robert J; Schwartz, Lawrence H; Sullivan, Daniel C

    2015-02-01

    Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research. © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  1. Sampling microcanonical measures of the 2D Euler equations through Creutz’s algorithm: a phase transition from disorder to order when energy is increased

    International Nuclear Information System (INIS)

    Potters, Max; Vaillant, Timothee; Bouchet, Freddy

    2013-01-01

    The 2D Euler equations are basic examples of fluid models for which a microcanonical measure can be constructed from first principles. This measure is defined through finite-dimensional approximations and a limiting procedure. Creutz’s algorithm is a microcanonical generalization of the Metropolis–Hastings algorithm (to sample Gibbs measures, in the canonical ensemble). We prove that Creutz’s algorithm can sample finite-dimensional approximations of the 2D Euler microcanonical measures (incorporating fixed energy and other invariants). This is essential as microcanonical and canonical measures are known to be inequivalent at some values of energy and vorticity distribution. Creutz’s algorithm is used to check predictions from the mean-field statistical mechanics theory of the 2D Euler equations (the Robert–Sommeria–Miller theory). We find full agreement with theory. Three different ways to compute the temperature give consistent results. Using Creutz’s algorithm, a first-order phase transition never observed previously and a situation of statistical ensemble inequivalence are found and studied. Strikingly, and in contrast to the usual statistical mechanics interpretations, this phase transition appears from a disordered phase to an ordered phase (with fewer symmetries) when the energy is increased. We explain this paradox. (paper)
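
    To illustrate the basic demon update behind Creutz's algorithm, the sketch below applies it to a 1D Ising chain; this is a generic microcanonical example with assumed parameters, not the finite-dimensional approximation of the 2D Euler microcanonical measures studied in the paper.

```python
import random

def creutz_ising_1d(n_spins=1000, n_sweeps=200, demon_max=20, seed=0):
    """Creutz's microcanonical demon algorithm on a 1D Ising chain: a spin flip
    is accepted only if the demon can absorb or supply the energy change while
    keeping its energy in [0, demon_max], so total energy is exactly conserved."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    demon, demon_trace = 0, []
    for _ in range(n_sweeps):
        for _ in range(n_spins):
            i = rng.randrange(n_spins)
            dE = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
            if 0 <= demon - dE <= demon_max:
                spins[i] *= -1
                demon -= dE
            demon_trace.append(demon)
    # the histogram of demon_trace is ~ exp(-E_demon / T), which yields the temperature
    return spins, demon_trace
```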

  2. Phase Transitions in Combinatorial Optimization Problems: Basics, Algorithms and Statistical Mechanics

    Science.gov (United States)

    Hartmann, Alexander K.; Weigt, Martin

    2005-10-01

    A concise, comprehensive introduction to the topic of statistical physics of combinatorial optimization, bringing together theoretical concepts and algorithms from computer science with analytical methods from physics. The result bridges the gap between statistical physics and combinatorial optimization, investigating problems taken from theoretical computing, such as the vertex-cover problem, with the concepts and methods of theoretical physics. The authors cover rapid developments and analytical methods that are both extremely complex and spread by word-of-mouth, providing all the necessary basics in required detail. Throughout, the algorithms are shown with examples and calculations, while the proofs are given in a way suitable for graduate students, post-docs, and researchers. Ideal for newcomers to this young, multidisciplinary field.

  3. Iterative algorithm of discrete Fourier transform for processing randomly sampled NMR data sets

    International Nuclear Information System (INIS)

    Stanek, Jan; Kozminski, Wiktor

    2010-01-01

    Spectra obtained by application of multidimensional Fourier Transformation (MFT) to sparsely sampled nD NMR signals are usually corrupted due to missing data. In the present paper this phenomenon is investigated on simulations and experiments. An effective iterative algorithm for artifact suppression for sparse on-grid NMR data sets is discussed in detail. It includes automated peak recognition based on statistical methods. The results enable one to study NMR spectra of high dynamic range of peak intensities preserving benefits of random sampling, namely the superior resolution in indirectly measured dimensions. Experimental examples include 3D 15N- and 13C-edited NOESY-HSQC spectra of human ubiquitin.
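
    The sketch below is a heavily simplified 1D analogue of such iterative artifact suppression (it is not the authors' exact algorithm or their statistical peak-recognition step): it alternates hard-thresholding of the spectrum with re-imposing the measured on-grid points, in the spirit of CLEAN-type cleaning of sparse-sampling artifacts. All signal parameters are invented for the toy example.

```python
import numpy as np

def iterative_ft_cleanup(measured, mask, n_iter=100, thresh=0.9, decay=0.95):
    """Alternate hard-thresholding of the spectrum (keep dominant peaks) with
    re-imposing the measured on-grid time-domain points (data consistency)."""
    estimate = np.zeros(mask.size, dtype=complex)
    estimate[mask] = measured                       # zero-filled starting point
    for _ in range(n_iter):
        spectrum = np.fft.fft(estimate)
        spectrum[np.abs(spectrum) < thresh * np.abs(spectrum).max()] = 0.0
        estimate = np.fft.ifft(spectrum)
        estimate[mask] = measured                   # keep the actual measurements
        thresh *= decay                             # gradually admit weaker peaks
    return np.fft.fft(estimate)

# toy 1D example: two "resonances", 25% random on-grid sampling
rng = np.random.default_rng(1)
n = 256
t = np.arange(n)
fid = np.exp(2j * np.pi * 0.11 * t) + 0.6 * np.exp(2j * np.pi * 0.31 * t)
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=n // 4, replace=False)] = True
spectrum = iterative_ft_cleanup(fid[mask], mask)
print("two strongest bins:", sorted(np.argsort(np.abs(spectrum))[-2:].tolist()))
```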

  4. Contributions to sampling statistics

    CERN Document Server

    Conti, Pier; Ranalli, Maria

    2014-01-01

    This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the bi-annual meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international  forum of scientific discussion on the developments of theory and application of survey sampling methodologies and applications in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The whole book appears to be a relevant contribution to various key aspects of sampling methodology and techniques; it deals with some hot topics in sampling theory, such as calibration, quantile-regression and multiple frame surveys, and with innovative methodologies in important topics of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputati...

  5. Statistical Processing Algorithms for Human Population Databases

    Directory of Open Access Journals (Sweden)

    Camelia COLESCU

    2012-01-01

    Full Text Available The article describes some algorithms for statistical functions applied to a human population database. The samples are specific to the most interesting periods, when the evolution of the statistical data shows spectacular changes. The article also describes the most useful forms of graphical presentation of the results.

  6. Statistical literacy and sample survey results

    Science.gov (United States)

    McAlevey, Lynn; Sullivan, Charles

    2010-10-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.

  7. Algorithm for the generation of nuclear spin species and nuclear spin statistical weights

    International Nuclear Information System (INIS)

    Balasubramanian, K.

    1982-01-01

    A set of algorithms for the computer generation of nuclear spin species and nuclear spin statistical weights potentially useful in molecular spectroscopy is developed. These algorithms generate the nuclear spin species from group structures known as generalized character cycle indices (GCCIs). Thus the required input for these algorithms is just the set of all GCCIs for the symmetry group of the molecule which can be computed easily from the character table. The algorithms are executed and illustrated with examples

  8. A software sampling frequency adaptive algorithm for reducing spectral leakage

    Institute of Scientific and Technical Information of China (English)

    PAN Li-dong; WANG Fei

    2006-01-01

    Spectral leakage caused by synchronization error in a nonsynchronous sampling system is an important cause of reduced accuracy in spectral analysis and harmonic measurement. This paper presents a software sampling-frequency adaptive algorithm that obtains the actual signal frequency more accurately, then adjusts the sampling interval based on the frequency calculated by the software algorithm and modifies the sampling frequency adaptively. It can reduce the synchronization error and the impact of spectral leakage, thereby improving the accuracy of spectral analysis and harmonic measurement for power system signals whose frequency changes slowly. Simulations show that the algorithm has high precision, and it can be a practical method for power system harmonic analysis since it is easy to implement.
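
    A minimal Python sketch of the general idea, under the assumption (not stated in the abstract) that the frequency estimate comes from an FFT peak with parabolic interpolation: estimate the actual signal frequency from one record, then choose a sampling rate that fits an integer number of periods into the record so that sampling becomes nearly synchronous.

```python
import numpy as np

def estimate_frequency(samples, fs):
    """Estimate the dominant signal frequency from one record using the FFT
    peak plus parabolic interpolation on the log-magnitude spectrum."""
    windowed = samples * np.hanning(samples.size)
    spectrum = np.abs(np.fft.rfft(windowed))
    k = int(np.argmax(spectrum[1:])) + 1                  # skip the DC bin
    a, b, c = np.log(spectrum[k - 1:k + 2] + 1e-12)       # parabolic interpolation
    delta = 0.5 * (a - c) / (a - 2 * b + c)
    return (k + delta) * fs / samples.size

def synchronous_sampling_rate(f_est, n_samples=1024, cycles=16):
    """Choose a sampling rate so that exactly `cycles` signal periods fit into
    one record of `n_samples` points, which suppresses spectral leakage."""
    return f_est * n_samples / cycles

# example: nominal 50 Hz power signal actually running at 50.3 Hz
fs0 = 3200.0
t = np.arange(1024) / fs0
x = np.sin(2 * np.pi * 50.3 * t)
f_hat = estimate_frequency(x, fs0)
print("estimated frequency:", round(f_hat, 3), "Hz")
print("adjusted sampling rate:", round(synchronous_sampling_rate(f_hat), 1), "Hz")
```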

  9. Statistical sampling approaches for soil monitoring

    NARCIS (Netherlands)

    Brus, D.J.

    2014-01-01

    This paper describes three statistical sampling approaches for regional soil monitoring, a design-based, a model-based and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest such as the space-time mean. In the hybrid

  10. Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.

    Science.gov (United States)

    Kamiura, Moto; Sano, Kohei

    2017-10-01

    The principle of optimism in the face of uncertainty is known as a heuristic in sequential decision-making problems. The Overtaking method based on this principle is an effective algorithm for multi-armed bandit problems, but in previous work it was defined only by a set of heuristic patterns of formulation. The objective of the present paper is to redefine the value functions of the Overtaking method and to unify their formulation. The unified Overtaking method is associated with upper bounds of confidence intervals of the expected rewards. Unifying the formulation enhances the universality of the Overtaking method. Consequently, we obtain a new Overtaking method for exponentially distributed rewards, analyze it numerically, and show that it outperforms the UCB algorithm on average. The present study suggests that, in the context of multi-armed bandit problems, the principle of optimism in the face of uncertainty should be regarded not as a heuristic but as a statistics-based consequence of the law of large numbers for the sample mean of rewards and of the estimation of upper bounds of expected rewards. Copyright © 2017 Elsevier B.V. All rights reserved.
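
    For orientation, the sketch below implements the classical UCB1 index policy, which the paper uses only as a baseline comparison; it is not the Overtaking method itself. The exponential reward distributions mirror the setting mentioned in the abstract, and all numerical values are illustrative.

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """UCB1 baseline for the multi-armed bandit problem: each arm's index is
    its sample mean plus an upper-confidence term, so unexplored or promising
    arms are played optimistically. `pull(arm)` returns a stochastic reward."""
    random.seed(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:                      # play each arm once first
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return total_reward, counts

# example: three exponentially distributed reward arms with different means
means = [0.5, 1.0, 1.5]
reward, pulls = ucb1(lambda a: random.expovariate(1.0 / means[a]), 3, 5000)
print("arm pull counts:", pulls)   # the highest-mean arm should dominate
```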

  11. Algorithm for computing significance levels using the Kolmogorov-Smirnov statistic and valid for both large and small samples

    Energy Technology Data Exchange (ETDEWEB)

    Kurtz, S.E.; Fields, D.E.

    1983-10-01

    The KSTEST code presented here is designed to perform the Kolmogorov-Smirnov one-sample test. The code may be used as a stand-alone program or the principal subroutines may be excerpted and used to service other programs. The Kolmogorov-Smirnov one-sample test is a nonparametric goodness-of-fit test. A number of codes to perform this test are in existence, but they suffer from the inability to provide meaningful results in the case of small sample sizes (number of values less than or equal to 80). The KSTEST code overcomes this inadequacy by using two distinct algorithms. If the sample size is greater than 80, an asymptotic series developed by Smirnov is evaluated. If the sample size is 80 or less, a table of values generated by Birnbaum is referenced. Valid results can be obtained from KSTEST when the sample contains from 3 to 300 data points. The program was developed on a Digital Equipment Corporation PDP-10 computer using the FORTRAN-10 language. The code size is approximately 450 card images and the typical CPU execution time is 0.19 s.
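
    A minimal Python re-implementation of the large-sample branch (the small-sample table lookup is omitted): compute the one-sample Kolmogorov-Smirnov statistic and evaluate the asymptotic series for its tail probability. This follows the standard formulas rather than the KSTEST source, which is not reproduced here.

```python
import math
import random

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n against a hypothesised CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, f - (i - 1) / n, i / n - f)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic (large-n) tail probability P(D_n > d) from the Kolmogorov series."""
    lam = math.sqrt(n) * d
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, terms + 1))

# example: test 200 pseudo-random numbers against the uniform(0, 1) distribution
random.seed(3)
data = [random.random() for _ in range(200)]
d = ks_statistic(data, cdf=lambda x: min(max(x, 0.0), 1.0))
print("D =", round(d, 4), " asymptotic p =", round(ks_pvalue_asymptotic(d, len(data)), 4))
```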

  12. Compressively sampled MR image reconstruction using generalized thresholding iterative algorithm

    Science.gov (United States)

    Elahi, Sana; kaleem, Muhammad; Omer, Hammad

    2018-01-01

    Compressed sensing (CS) is an emerging area of interest in Magnetic Resonance Imaging (MRI). CS is used for the reconstruction of the images from a very limited number of samples in k-space. This significantly reduces the MRI data acquisition time. One important requirement for signal recovery in CS is the use of an appropriate non-linear reconstruction algorithm. It is a challenging task to choose a reconstruction algorithm that would accurately reconstruct the MR images from the under-sampled k-space data. Various algorithms have been used to solve the system of non-linear equations for better image quality and reconstruction speed in CS. In the recent past, iterative soft thresholding algorithm (ISTA) has been introduced in CS-MRI. This algorithm directly cancels the incoherent artifacts produced because of the undersampling in k-space. This paper introduces an improved iterative algorithm based on p-thresholding technique for CS-MRI image reconstruction. The use of p-thresholding function promotes sparsity in the image which is a key factor for CS based image reconstruction. The p-thresholding based iterative algorithm is a modification of ISTA, and minimizes non-convex functions. It has been shown that the proposed p-thresholding iterative algorithm can be used effectively to recover fully sampled image from the under-sampled data in MRI. The performance of the proposed method is verified using simulated and actual MRI data taken at St. Mary's Hospital, London. The quality of the reconstructed images is measured in terms of peak signal-to-noise ratio (PSNR), artifact power (AP), and structural similarity index measure (SSIM). The proposed approach shows improved performance when compared to other iterative algorithms based on log thresholding, soft thresholding and hard thresholding techniques at different reduction factors.
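
    The following toy sketch shows a plain soft-thresholding ISTA loop for a 1D undersampled-Fourier problem, i.e. the baseline algorithm that the paper's p-thresholding rule modifies; the p-thresholding operator itself is not reproduced, and the sparse test signal and sampling mask are synthetic.

```python
import numpy as np

def ista_fourier_cs(y, mask, lam=0.05, n_iter=300):
    """Plain ISTA for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1, with A an
    undersampled, unitary-scaled Fourier operator (1D toy analogue of CS-MRI).
    The paper's p-thresholding rule would replace the `soft` step below."""
    n = mask.size
    A = lambda x: mask * np.fft.fft(x) / np.sqrt(n)        # forward operator
    At = lambda r: np.fft.ifft(mask * r) * np.sqrt(n)      # adjoint operator
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x = np.zeros(n)
    for _ in range(n_iter):
        grad = At(A(x) - y)                                # gradient of the data term
        x = soft((x - grad).real, lam)                     # proximal (thresholding) step
    return x

# toy example: an 8-sparse real signal, 30% random Fourier samples
rng = np.random.default_rng(7)
n = 256
x_true = np.zeros(n)
x_true[rng.choice(n, 8, replace=False)] = rng.normal(0.0, 3.0, 8)
mask = np.zeros(n)
mask[rng.choice(n, int(0.3 * n), replace=False)] = 1.0
y = mask * np.fft.fft(x_true) / np.sqrt(n)
x_rec = ista_fourier_cs(y, mask)
err = np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error:", round(float(err), 3))
```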

  13. Real-time recursive hyperspectral sample and band processing algorithm architecture and implementation

    CERN Document Server

    Chang, Chein-I

    2017-01-01

    This book explores recursive architectures in designing progressive hyperspectral imaging algorithms. In particular, it makes progressive imaging algorithms recursive by introducing the concept of Kalman filtering in algorithm design so that hyperspectral imagery can be processed not only progressively sample by sample or band by band but also recursively via recursive equations. This book can be considered a companion book of author’s books, Real-Time Progressive Hyperspectral Image Processing, published by Springer in 2016. Explores recursive structures in algorithm architecture Implements algorithmic recursive architecture in conjunction with progressive sample and band processing Derives Recursive Hyperspectral Sample Processing (RHSP) techniques according to Band-Interleaved Sample/Pixel (BIS/BIP) acquisition format Develops Recursive Hyperspectral Band Processing (RHBP) techniques according to Band SeQuential (BSQ) acquisition format for hyperspectral data.

  14. Computationally efficient algorithms for statistical image processing : implementation in R

    NARCIS (Netherlands)

    Langovoy, M.; Wittich, O.

    2010-01-01

    In the series of our earlier papers on the subject, we proposed a novel statistical hypothesis testing method for detection of objects in noisy images. The method uses results from percolation theory and random graph theory. We developed algorithms that allowed us to detect objects of unknown shapes in

  15. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    Science.gov (United States)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics due 1) to evolution of the official algorithms used to process the data, and 2) differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.

  16. Statistical aspects of food safety sampling

    NARCIS (Netherlands)

    Jongenburger, I.; Besten, den H.M.W.; Zwietering, M.H.

    2015-01-01

    In food safety management, sampling is an important tool for verifying control. Sampling by nature is a stochastic process. However, uncertainty regarding results is made even greater by the uneven distribution of microorganisms in a batch of food. This article reviews statistical aspects of

  17. Understand your Algorithm: Drill Down to Sample Visualizations in Jupyter Notebooks

    Science.gov (United States)

    Mapes, B. E.; Ho, Y.; Cheedela, S. K.; McWhirter, J.

    2017-12-01

    Statistics are the currency of climate dynamics, but the space of all possible algorithms is fathomless - especially for 4-dimensional weather-resolving data that many "impact" variables depend on. Algorithms are designed on data samples, but how do you know if they measure what you expect when turned loose on Big Data? We will introduce the year-1 prototype of a 3-year scientist-led, NSF-supported, Unidata-quality software stack called DRILSDOWN (https://brianmapes.github.io/EarthCube-DRILSDOWN/) for automatically extracting, integrating, and visualizing multivariate 4D data samples. Based on a customizable "IDV bundle" of data sources, fields and displays supplied by the user, the system will teleport its space-time coordinates to fetch Cases of Interest (edge cases, typical cases, etc.) from large aggregated repositories. These standard displays can serve as backdrops to overlay with your value-added fields (such as derived quantities stored on a user's local disk). Fields can be readily pulled out of the visualization object for further processing in Python. The hope is that algorithms successfully tested in this visualization space will then be lifted out and added to automatic processing toolchains, lending confidence in the next round of processing, to seek the next Cases of Interest, in light of a user's statistical measures of "Interest". To log the scientific work done in this vein, the visualizations are wrapped in iPython-based Jupyter notebooks for rich, human-readable documentation (indeed, quasi-publication with formatted text, LaTex math, etc.). Such notebooks are readable and executable, with digital replicability and provenance built in. The entire digital object of a case study can be stored in a repository, where libraries of these Case Study Notebooks can be examined in a browser. Model data (the session topic) are of course especially convenient for this system, but observations of all sorts can also be brought in, overlain, and differenced or

  18. Audit sampling: A qualitative study on the role of statistical and non-statistical sampling approaches on audit practices in Sweden

    OpenAIRE

    Ayam, Rufus Tekoh

    2011-01-01

    PURPOSE: The two approaches to audit sampling, statistical and non-statistical, are examined in this study. The overall purpose of the study is to explore the extent to which statistical and non-statistical sampling approaches are currently utilized by independent auditors in their auditing practices. Moreover, the study also seeks to achieve two additional purposes; the first is to find out whether auditors utilize different sampling techniques when auditing SMEs (Small and Medium-Sized Ente...

  19. An Improved Nested Sampling Algorithm for Model Selection and Assessment

    Science.gov (United States)

    Zeng, X.; Ye, M.; Wu, J.; WANG, D.

    2017-12-01

    Multimodel strategy is a general approach for treating model structure uncertainty in recent research. The unknown groundwater system is represented by several plausible conceptual models, and each alternative conceptual model is assigned a weight representing its plausibility. In the Bayesian framework, the posterior model weight is computed as the product of the model prior weight and the marginal likelihood (also termed the model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. The nested sampling estimator (NSE) is a newly proposed algorithm for marginal likelihood estimation. NSE searches the parameter space gradually from low-likelihood to high-likelihood regions, and this evolution is carried out iteratively via a local sampling procedure, so the efficiency of NSE is dominated by the strength of that local sampling step. Currently, the Metropolis-Hastings (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampler for high-dimensional or complex likelihood functions. To improve the performance of NSE, it is feasible to integrate a more efficient and elaborate sampling algorithm, DREAMzs, into the local sampling step. In addition, to overcome the computational burden of the large number of repeated model executions required for marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build surrogates for the original groundwater model.
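
    A bare-bones nested sampling estimator of the log-evidence is sketched below for a uniform box prior and a toy Gaussian likelihood. The constrained local step is a short random-walk Metropolis run; in the improved estimator described above, that step is where a sampler such as DREAMzs would be substituted. The leftover live-point contribution is neglected for brevity.

```python
import numpy as np

def nested_sampling_log_evidence(log_like, lo, hi, ndim,
                                 n_live=100, n_iter=1500, n_mcmc=30, seed=0):
    """Basic nested sampling estimate of the log marginal likelihood for a
    uniform box prior on [lo, hi]^ndim. The constrained replacement step is a
    short random-walk Metropolis run restricted to L > L*."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(lo, hi, size=(n_live, ndim))
    live_logl = np.array([log_like(p) for p in live])
    log_z, log_x_prev = -np.inf, 0.0
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))
        logl_star = live_logl[worst]
        log_x = -i / n_live                                   # expected prior-volume shrinkage
        log_z = np.logaddexp(log_z, logl_star + np.log(np.exp(log_x_prev) - np.exp(log_x)))
        log_x_prev = log_x
        # constrained step: random walk from another live point, scaled to the live set
        start = live[(worst + 1 + rng.integers(n_live - 1)) % n_live]
        cur, cur_logl = start.copy(), log_like(start)
        scale = np.std(live, axis=0) + 1e-12
        for _ in range(n_mcmc):
            prop = cur + rng.normal(size=ndim) * scale
            if np.all(prop >= lo) and np.all(prop <= hi) and log_like(prop) > logl_star:
                cur, cur_logl = prop, log_like(prop)
        live[worst], live_logl[worst] = cur, cur_logl
    return log_z

# toy check: standard 2D Gaussian likelihood, uniform prior on [-5, 5]^2
log_like = lambda th: -0.5 * float(th @ th) - np.log(2.0 * np.pi)
print("estimated log-evidence:", round(nested_sampling_log_evidence(log_like, -5.0, 5.0, 2), 2),
      " analytic:", round(np.log(1.0 / 100.0), 2))
```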

  20. Multiscale Monte Carlo algorithms in statistical mechanics and quantum field theory

    Energy Technology Data Exchange (ETDEWEB)

    Lauwers, P G

    1990-12-01

    Conventional Monte Carlo simulation algorithms for models in statistical mechanics and quantum field theory are afflicted by problems caused by their locality. They become highly inefficient if investigations of critical or nearly-critical systems, i.e., systems with important large scale phenomena, are undertaken. We present two types of multiscale approaches that alleviate problems of this kind: stochastic cluster algorithms and multigrid Monte Carlo simulation algorithms. Another formidable computational problem in simulations of phenomenologically relevant field theories with fermions is the need to invert the Dirac operator frequently. This inversion can be accelerated considerably by means of deterministic multigrid methods, very similar to the ones used for the numerical solution of differential equations. (orig.).

  1. Final Report: Sampling-Based Algorithms for Estimating Structure in Big Data.

    Energy Technology Data Exchange (ETDEWEB)

    Matulef, Kevin Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-02-01

    The purpose of this project was to develop sampling-based algorithms to discover hidden structure in massive data sets. Inferring structure in large data sets is an increasingly common task in many critical national security applications. These data sets come from myriad sources, such as network traffic, sensor data, and data generated by large-scale simulations. They are often so large that traditional data mining techniques are time consuming or even infeasible. To address this problem, we focus on a class of algorithms that do not compute an exact answer, but instead use sampling to compute an approximate answer using fewer resources. The particular class of algorithms that we focus on are streaming algorithms, so called because they are designed to handle high-throughput streams of data. Streaming algorithms have only a small amount of working storage - much less than the size of the full data stream - so they must necessarily use sampling to approximate the correct answer. We present two results: * A streaming algorithm called HyperHeadTail, which estimates the degree distribution of a graph (i.e., the distribution of the number of connections for each node in a network). The degree distribution is a fundamental graph property, but prior work on estimating the degree distribution in a streaming setting was impractical for many real-world applications. We improve upon prior work by developing an algorithm that can handle streams with repeated edges, and graph structures that evolve over time. * An algorithm for the task of maintaining a weighted subsample of items in a stream, when the items must be sampled according to their weight, and the weights are dynamically changing. To our knowledge, this is the first such algorithm designed for dynamically evolving weights. We expect it may be useful as a building block for other streaming algorithms on dynamic data sets.
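
    As a runnable illustration of the second result's starting point, the sketch below implements the classical Efraimidis-Spirakis (A-Res) weighted reservoir sampler for static weights; the report's contribution concerns dynamically changing weights, which this building-block sketch does not handle.

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, seed=0):
    """Efraimidis-Spirakis (A-Res) weighted reservoir sampling over a stream
    of (item, weight) pairs with static weights: keep the k items with the
    largest keys u**(1/w), where u is uniform on (0, 1)."""
    rng = random.Random(seed)
    heap = []                                   # min-heap of (key, item)
    for item, weight in stream:
        key = rng.random() ** (1.0 / weight)    # larger weights -> larger keys
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

# example: items 0..9999 with weight proportional to (i + 1)
stream = ((i, float(i + 1)) for i in range(10000))
sample = weighted_reservoir_sample(stream, k=10)
print("weighted sample:", sorted(sample))       # skewed toward large indices
```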

  2. Measuring radioactive half-lives via statistical sampling in practice

    Science.gov (United States)

    Lorusso, G.; Collins, S. M.; Jagan, K.; Hitt, G. W.; Sadek, A. M.; Aitken-Smith, P. M.; Bridi, D.; Keightley, J. D.

    2017-10-01

    The statistical sampling method for the measurement of radioactive decay half-lives exhibits intriguing features such as that the half-life is approximately the median of a distribution closely resembling a Cauchy distribution. Whilst initial theoretical considerations suggested that in certain cases the method could have significant advantages, accurate measurements by statistical sampling have proven difficult, for they require an exercise in non-standard statistical analysis. As a consequence, no half-life measurement using this method has yet been reported and no comparison with traditional methods has ever been made. We used a Monte Carlo approach to address these analysis difficulties, and present the first experimental measurement of a radioisotope half-life (211Pb) by statistical sampling in good agreement with the literature recommended value. Our work also focused on the comparison between statistical sampling and exponential regression analysis, and concluded that exponential regression achieves generally the highest accuracy.

  3. The variance quadtree algorithm: use for spatial sampling design

    NARCIS (Netherlands)

    Minasny, B.; McBratney, A.B.; Walvoort, D.J.J.

    2007-01-01

    Spatial sampling schemes are mainly developed to determine sampling locations that can cover the variation of environmental properties in the area of interest. Here we proposed the variance quadtree algorithm for sampling in an area with prior information represented as ancillary or secondary

  4. Application of image recognition algorithms for statistical description of nano- and microstructured surfaces

    Energy Technology Data Exchange (ETDEWEB)

    Mărăscu, V.; Dinescu, G. [National Institute for Lasers, Plasma and Radiation Physics, 409 Atomistilor Street, Bucharest– Magurele (Romania); Faculty of Physics, University of Bucharest, 405 Atomistilor Street, Bucharest-Magurele (Romania); Chiţescu, I. [Faculty of Mathematics and Computer Science, University of Bucharest, 14 Academiei Street, Bucharest (Romania); Barna, V. [Faculty of Physics, University of Bucharest, 405 Atomistilor Street, Bucharest-Magurele (Romania); Ioniţă, M. D.; Lazea-Stoyanova, A.; Mitu, B., E-mail: mitub@infim.ro [National Institute for Lasers, Plasma and Radiation Physics, 409 Atomistilor Street, Bucharest– Magurele (Romania)

    2016-03-25

    In this paper we propose a statistical approach for describing the self-assembling of sub-micronic polystyrene beads on silicon surfaces, as well as the evolution of surface topography due to plasma treatments. Algorithms for image recognition are used in conjunction with Scanning Electron Microscopy (SEM) imaging of surfaces. In a first step, greyscale images of the surface covered by the polystyrene beads are obtained. Further, an adaptive thresholding method was applied for obtaining binary images. The next step consisted in automatic identification of polystyrene beads dimensions, by using Hough transform algorithm, according to beads radius. In order to analyze the uniformity of the self-assembled polystyrene beads, the squared modulus of 2-dimensional Fast Fourier Transform (2-D FFT) was applied. By combining these algorithms we obtain a powerful and fast statistical tool for analysis of micro and nanomaterials with aspect features regularly distributed on surface upon SEM examination.

  5. Application of image recognition algorithms for statistical description of nano- and microstructured surfaces

    International Nuclear Information System (INIS)

    Mărăscu, V.; Dinescu, G.; Chiţescu, I.; Barna, V.; Ioniţă, M. D.; Lazea-Stoyanova, A.; Mitu, B.

    2016-01-01

    In this paper we propose a statistical approach for describing the self-assembling of sub-micronic polystyrene beads on silicon surfaces, as well as the evolution of surface topography due to plasma treatments. Algorithms for image recognition are used in conjunction with Scanning Electron Microscopy (SEM) imaging of surfaces. In a first step, greyscale images of the surface covered by the polystyrene beads are obtained. Further, an adaptive thresholding method was applied for obtaining binary images. The next step consisted in automatic identification of polystyrene beads dimensions, by using Hough transform algorithm, according to beads radius. In order to analyze the uniformity of the self-assembled polystyrene beads, the squared modulus of 2-dimensional Fast Fourier Transform (2-D FFT) was applied. By combining these algorithms we obtain a powerful and fast statistical tool for analysis of micro and nanomaterials with aspect features regularly distributed on surface upon SEM examination.
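
    A compact synthetic re-creation of the workflow described in the two records above (adaptive thresholding, circular Hough transform, 2D FFT power spectrum), using OpenCV and NumPy, is sketched below; the SEM images, bead sizes and all parameter values are stand-ins, not those of the study.

```python
import numpy as np
import cv2

# synthetic stand-in for an SEM micrograph: bright discs ("beads") on a dark background
rng = np.random.default_rng(5)
img = np.zeros((512, 512), dtype=np.uint8)
for cx, cy in rng.integers(30, 482, size=(40, 2)):
    cv2.circle(img, (int(cx), int(cy)), 12, 255, -1)
img = cv2.GaussianBlur(img, (5, 5), 0)

# 1) adaptive (local mean) thresholding -> binary image
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 31, -5)

# 2) circular Hough transform -> bead positions and radii
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 1, 15,
                           param1=80, param2=15, minRadius=8, maxRadius=20)
print("detected beads:", 0 if circles is None else circles.shape[1])

# 3) squared modulus of the 2D FFT of the binary image -> uniformity indicator
power = np.abs(np.fft.fftshift(np.fft.fft2(binary.astype(float)))) ** 2
print("fraction of spectral power near DC:",
      round(float(power[246:266, 246:266].sum() / power.sum()), 3))
```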

  6. Stochastic geometry, spatial statistics and random fields models and algorithms

    CERN Document Server

    2015-01-01

    Providing a graduate level introduction to various aspects of stochastic geometry, spatial statistics and random fields, this volume places a special emphasis on fundamental classes of models and algorithms as well as on their applications, for example in materials science, biology and genetics. This book has a strong focus on simulations and includes extensive codes in Matlab and R, which are widely used in the mathematical community. It can be regarded as a continuation of the recent volume 2068 of Lecture Notes in Mathematics, where other issues of stochastic geometry, spatial statistics and random fields were considered, with a focus on asymptotic methods.

  7. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Jianning Wu

    2015-01-01

    Full Text Available The accurate identification of gait asymmetry is very beneficial for assessing at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, under the assumption that the degree of intrinsic change in the dynamical system of gait is associated with different statistical distributions of the gait variables from the left and right lower limbs; that is, discriminating a small difference in similarity between the lower limbs is treated as recognizing the reorganization of their probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed around an advanced statistical learning algorithm, the support vector machine for binary classification, and is adopted to evaluate gait symmetry quantitatively. The experimental results showed that the proposed method could capture more of the intrinsic dynamic information hidden in the gait variables and recognize right-left gait patterns with superior generalization performance. Moreover, compared with the traditional symmetry index method for gait, the proposed technique could identify small but significant differences between the lower limbs. The proposed algorithm could become an effective tool for the early identification of gait asymmetry in the elderly in clinical diagnosis.

  8. The novel quantitative technique for assessment of gait symmetry using advanced statistical learning algorithm.

    Science.gov (United States)

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial for assessing at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, under the assumption that the degree of intrinsic change in the dynamical system of gait is associated with different statistical distributions of the gait variables from the left and right lower limbs; that is, discriminating a small difference in similarity between the lower limbs is treated as recognizing the reorganization of their probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed around an advanced statistical learning algorithm, the support vector machine for binary classification, and is adopted to evaluate gait symmetry quantitatively. The experimental results showed that the proposed method could capture more of the intrinsic dynamic information hidden in the gait variables and recognize right-left gait patterns with superior generalization performance. Moreover, compared with the traditional symmetry index method for gait, the proposed technique could identify small but significant differences between the lower limbs. The proposed algorithm could become an effective tool for the early identification of gait asymmetry in the elderly in clinical diagnosis.
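
    The sketch below illustrates the core idea with scikit-learn: train a binary SVM to distinguish left-limb from right-limb feature vectors and read the cross-validated accuracy as an indicator of asymmetry. The gait features here are synthetic stand-ins; the paper's force-platform data and its specific evaluation procedure are not reproduced.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# synthetic stand-in for kinetic gait features (e.g. peak forces, stance times)
# of the left and right limbs; a small systematic offset mimics an asymmetric gait
rng = np.random.default_rng(0)
n_steps, n_features = 200, 6
left = rng.normal(0.0, 1.0, size=(n_steps, n_features))
right = rng.normal(0.3, 1.0, size=(n_steps, n_features))
X = np.vstack([left, right])
y = np.array([0] * n_steps + [1] * n_steps)          # 0 = left limb, 1 = right limb

# an RBF-kernel SVM as the binary left/right classifier: cross-validated accuracy
# near 0.5 suggests symmetric gait, accuracy well above 0.5 suggests asymmetry
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
print("left/right classification accuracy:",
      round(cross_val_score(clf, X, y, cv=5).mean(), 3))
```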

  9. Predicting Smoking Status Using Machine Learning Algorithms and Statistical Analysis

    Directory of Open Access Journals (Sweden)

    Charles Frank

    2018-03-01

    Full Text Available Smoking has been proven to negatively affect health in a multitude of ways. As of 2009, smoking has been considered the leading cause of preventable morbidity and mortality in the United States, continuing to plague the country’s overall health. This study aims to investigate the viability and effectiveness of some machine learning algorithms for predicting the smoking status of patients based on their blood tests and vital readings results. The analysis of this study is divided into two parts: In part 1, we use One-way ANOVA analysis with SAS tool to show the statistically significant difference in blood test readings between smokers and non-smokers. The results show that the difference in INR, which measures the effectiveness of anticoagulants, was significant in favor of non-smokers which further confirms the health risks associated with smoking. In part 2, we use five machine learning algorithms: Naïve Bayes, MLP, Logistic regression classifier, J48 and Decision Table to predict the smoking status of patients. To compare the effectiveness of these algorithms we use: Precision, Recall, F-measure and Accuracy measures. The results show that the Logistic algorithm outperformed the four other algorithms with Precision, Recall, F-Measure, and Accuracy of 83%, 83.4%, 83.2%, 83.44%, respectively.
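
    A small scikit-learn sketch of the part-2 workflow on synthetic data (the study's clinical blood-test data are not public): fit a logistic regression classifier and report precision, recall, F-measure and accuracy, the same figures of merit used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

# synthetic stand-in for blood-test / vital-sign features; label 1 = smoker
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 8))
logit = 0.9 * X[:, 0] - 0.7 * X[:, 1] + 0.4 * X[:, 2]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("precision:", round(precision_score(y_te, pred), 3),
      "recall:", round(recall_score(y_te, pred), 3),
      "F1:", round(f1_score(y_te, pred), 3),
      "accuracy:", round(accuracy_score(y_te, pred), 3))
```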

  10. Phase Transitions in Combinatorial Optimization Problems Basics, Algorithms and Statistical Mechanics

    CERN Document Server

    Hartmann, Alexander K

    2005-01-01

    A concise, comprehensive introduction to the topic of statistical physics of combinatorial optimization, bringing together theoretical concepts and algorithms from computer science with analytical methods from physics. The result bridges the gap between statistical physics and combinatorial optimization, investigating problems taken from theoretical computing, such as the vertex-cover problem, with the concepts and methods of theoretical physics. The authors cover rapid developments and analytical methods that are both extremely complex and spread by word-of-mouth, providing all the necessary

  11. An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks

    KAUST Repository

    Bayer, Christian

    2016-01-06

    In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce an efficient two-phase algorithm in which the first phase is deterministic and it is intended to provide a starting point for the second phase which is the Monte Carlo EM Algorithm.

  12. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.; Wheeler, Mary Fanett; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using

  13. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    International Nuclear Information System (INIS)

    Elsheikh, Ahmed H.; Wheeler, Mary F.; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems

  14. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    Energy Technology Data Exchange (ETDEWEB)

    Elsheikh, Ahmed H., E-mail: aelsheikh@ices.utexas.edu [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom); Wheeler, Mary F. [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Hoteit, Ibrahim [Department of Earth Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal (Saudi Arabia)

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.

  15. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems. © 2013 Elsevier Inc.

  16. Statistical benchmark for BosonSampling

    International Nuclear Information System (INIS)

    Walschaers, Mattia; Mayer, Klaus; Buchleitner, Andreas; Kuipers, Jack; Urbina, Juan-Diego; Richter, Klaus; Tichy, Malte Christopher

    2016-01-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church–Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects. (fast track communication)

  17. Scalable Algorithms for Adaptive Statistical Designs

    Directory of Open Access Journals (Sweden)

    Robert Oehmke

    2000-01-01

    Full Text Available We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of optimally assigning patients to treatments in clinical trials. While adaptive designs have significant ethical and cost advantages, they are rarely utilized because of the complexity of optimizing and analyzing them. Computational challenges include massive memory requirements, few calculations per memory access, and multiply-nested loops with dynamic indices. We analyze the effects of various parallelization options, and while standard approaches do not work well, with effort an efficient, highly scalable program can be developed. This allows us to solve problems thousands of times more complex than those solved previously, which helps make adaptive designs practical. Further, our work applies to many other problems involving neighbor recurrences, such as generalized string matching.

  18. Adaptive sampling rate control for networked systems based on statistical characteristics of packet disordering.

    Science.gov (United States)

    Li, Jin-Na; Er, Meng-Joo; Tan, Yen-Kheng; Yu, Hai-Bin; Zeng, Peng

    2015-09-01

    This paper investigates an adaptive sampling rate control scheme for networked control systems (NCSs) subject to packet disordering. The main objectives of the proposed scheme are (a) to avoid heavy packet disordering existing in communication networks and (b) to stabilize NCSs with packet disordering, transmission delay and packet loss. First, a novel sampling rate control algorithm based on statistical characteristics of disordering entropy is proposed; secondly, an augmented closed-loop NCS that consists of a plant, a sampler and a state-feedback controller is transformed into an uncertain and stochastic system, which facilitates the controller design. Then, a sufficient condition for stochastic stability in terms of Linear Matrix Inequalities (LMIs) is given. Moreover, an adaptive tracking controller is designed such that the sampling period tracks a desired sampling period, which represents a significant contribution. Finally, experimental results are given to illustrate the effectiveness and advantages of the proposed scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  19. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Abstract Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  20. An intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces.

    Science.gov (United States)

    Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying

    2013-09-01

    Poisson disk sampling has excellent spatial and spectral properties, and plays an important role in a variety of visual computing applications. Although many promising algorithms have been proposed for multidimensional sampling in Euclidean space, very few studies have been reported with regard to the problem of generating Poisson disks on surfaces due to the complicated nature of the surface. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. In sharp contrast to the conventional parallel approaches, our method neither partitions the given surface into small patches nor uses any spatial data structure to maintain the voids in the sampling domain. Instead, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. Our algorithm guarantees that the generated Poisson disks are uniformly and randomly distributed without bias. It is worth noting that our method is intrinsic and independent of the embedding space. This intrinsic feature allows us to generate Poisson disk patterns on arbitrary surfaces in R^n. To our knowledge, this is the first intrinsic, parallel, and accurate algorithm for surface Poisson disk sampling. Furthermore, by manipulating the spatially varying density function, we can obtain adaptive sampling easily.
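
    For reference, the sketch below implements Bridson's classical Poisson disk sampling in a flat 2D rectangle; it is not the paper's intrinsic, parallel surface algorithm, but it shows the minimum-distance guarantee that all such samplers enforce.

```python
import math
import random

def poisson_disk_2d(width, height, r, k=30, seed=0):
    """Bridson's Poisson disk sampling in a 2D Euclidean rectangle.
    Guarantees every pair of samples is at least r apart."""
    rng = random.Random(seed)
    cell = r / math.sqrt(2.0)
    cols, rows = int(math.ceil(width / cell)), int(math.ceil(height / cell))
    grid = [[None] * cols for _ in range(rows)]     # at most one sample per cell

    def grid_idx(p):
        return int(p[1] / cell), int(p[0] / cell)

    def far_enough(p):
        gy, gx = grid_idx(p)
        for yy in range(max(gy - 2, 0), min(gy + 3, rows)):
            for xx in range(max(gx - 2, 0), min(gx + 3, cols)):
                q = grid[yy][xx]
                if q is not None and (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 < r * r:
                    return False
        return True

    first = (rng.uniform(0, width), rng.uniform(0, height))
    samples, active = [first], [first]
    gy, gx = grid_idx(first)
    grid[gy][gx] = first
    while active:
        idx = rng.randrange(len(active))
        base = active[idx]
        for _ in range(k):                          # try k candidates in the annulus [r, 2r]
            ang = rng.uniform(0, 2 * math.pi)
            rad = rng.uniform(r, 2 * r)
            p = (base[0] + rad * math.cos(ang), base[1] + rad * math.sin(ang))
            if 0 <= p[0] < width and 0 <= p[1] < height and far_enough(p):
                samples.append(p)
                active.append(p)
                gy, gx = grid_idx(p)
                grid[gy][gx] = p
                break
        else:                                       # no valid candidate: retire this point
            active.pop(idx)
    return samples

print("generated", len(poisson_disk_2d(100.0, 100.0, r=4.0)), "Poisson disk samples")
```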

  1. Statistical conditional sampling for variable-resolution video compression.

    Directory of Open Access Journals (Sweden)

    Alexander Wong

    Full Text Available In this study, we investigate a variable-resolution approach to video compression based on Conditional Random Fields (CRFs) and statistical conditional sampling in order to further improve compression rate while maintaining high-quality video. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key-frames and used to restore the reduced-resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability defined by the CRF model over the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.

  2. Statistical sampling techniques as applied to OSE inspections

    International Nuclear Information System (INIS)

    Davis, J.J.; Cote, R.W.

    1987-01-01

    The need has been recognized for statistically valid methods for gathering information during OSE inspections; and for interpretation of results, both from performance testing and from records reviews, interviews, etc. Battelle Columbus Division, under contract to DOE OSE has performed and is continuing to perform work in the area of statistical methodology for OSE inspections. This paper represents some of the sampling methodology currently being developed for use during OSE inspections. Topics include population definition, sample size requirements, level of confidence and practical logistical constraints associated with the conduct of an inspection based on random sampling. Sequential sampling schemes and sampling from finite populations are also discussed. The methods described are applicable to various data gathering activities, ranging from the sampling and examination of classified documents to the sampling of Protective Force security inspectors for skill testing
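
    One concrete example of the kind of sample-size argument involved (illustrative, not taken from the paper): the smallest attribute sample that detects at least one non-compliant item with a given confidence when the non-compliance rate is at or above a stated threshold.

```python
import math

def attribute_sample_size(confidence=0.95, defect_rate=0.05):
    """Smallest simple-random-sample size n such that, if at least a fraction
    `defect_rate` of the population is non-compliant, the sample contains at
    least one non-compliant item with probability `confidence`
    (infinite-population approximation)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - defect_rate))

print("items to inspect:", attribute_sample_size(0.95, 0.05))   # -> 59
```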

  3. Statistical image reconstruction for transmission tomography using relaxed ordered subset algorithms

    International Nuclear Information System (INIS)

    Kole, J S

    2005-01-01

    Statistical reconstruction methods offer possibilities for improving image quality as compared to analytical methods, but current reconstruction times prohibit routine clinical applications in x-ray computed tomography (CT). To reduce reconstruction times, we have applied (under-)relaxation to ordered subset algorithms. This enables us to use subsets consisting of only a single projection angle, effectively increasing the number of image updates within an entire iteration. A second advantage of applying relaxation is that it can help improve convergence by removing the limit cycle behaviour of ordered subset algorithms, which normally do not converge to an optimal solution but rather to a suboptimal limit cycle consisting of as many points as there are subsets. Relaxation suppresses the limit cycle behaviour by decreasing the step size for approaching the solution. A simulation study for a 2D mathematical phantom and three different ordered subset algorithms shows that all three algorithms benefit from relaxation: an equal noise-to-resolution trade-off can be achieved using fewer iterations than with the conventional algorithms, while a lower minimal normalized mean square error (NMSE) clearly indicates better convergence. Two different schemes for setting the relaxation parameter are studied, and both schemes yield approximately the same minimal NMSE
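
    The sketch below applies decreasing relaxation to an ordered-subsets update for a plain linear least-squares stand-in (not the penalized statistical objectives or CT geometry of the study): each subset of rows ("projections") triggers one image update, and the relaxation factor shrinks over updates to damp the limit cycle.

```python
import numpy as np

def relaxed_os_lsq(A, b, n_subsets, n_iters=50, lam0=1.0, decay=0.02, seed=0):
    """Toy relaxed ordered-subsets reconstruction for a linear model A x = b.
    Each subset of rows triggers one update; the relaxation factor
    lam_k = lam0 / (1 + decay * k) decreases over updates."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    subsets = np.array_split(rng.permutation(m), n_subsets)
    steps = [1.0 / np.linalg.norm(A[s], 2) ** 2 for s in subsets]  # safe per-subset steps
    x = np.zeros(n)
    k = 0
    for _ in range(n_iters):
        for s, t in zip(subsets, steps):
            lam = lam0 / (1.0 + decay * k)          # decreasing relaxation
            x -= lam * t * (A[s].T @ (A[s] @ x - b[s]))
            k += 1
    return x

# toy problem: an overdetermined random system standing in for projection data
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 50))
x_true = rng.normal(size=50)
b = A @ x_true + 0.01 * rng.normal(size=200)
x_rec = relaxed_os_lsq(A, b, n_subsets=10)
print("relative error:",
      round(float(np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true)), 3))
```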

  4. SeaWiFS technical report series. Volume 4: An analysis of GAC sampling algorithms. A case study

    Science.gov (United States)

    Yeh, Eueng-Nan (Editor); Hooker, Stanford B. (Editor); Hooker, Stanford B. (Editor); Mccain, Charles R. (Editor); Fu, Gary (Editor)

    1992-01-01

    The Sea-viewing Wide Field-of-view Sensor (SeaWiFS) instrument will sample at approximately 1 km resolution at nadir; these data will be broadcast for reception by real-time ground stations. However, the global data set will consist of coarser four-kilometer data, which will be recorded and broadcast to the SeaWiFS Project for processing. Several algorithms for degrading the one-kilometer data to four-kilometer data are examined using imagery from the Coastal Zone Color Scanner (CZCS) in an effort to determine which algorithm would best preserve the statistical characteristics of the derived products generated from the one-kilometer data. Of the algorithms tested, subsampling based on a fixed pixel within a 4 x 4 pixel array is judged to yield the most consistent results when compared to the one-kilometer data products.
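
    A toy NumPy comparison of two degradation strategies, using a synthetic stand-in for the 1 km field: fixed-pixel subsampling from each 4 x 4 block (the strategy judged most consistent) versus block averaging, with the resulting mean and standard deviation compared against the full-resolution statistics.

```python
import numpy as np

def fixed_pixel_subsample(img, block=4, row=0, col=0):
    """Degrade the data by keeping one fixed pixel from each block x block array."""
    return img[row::block, col::block]

def block_mean_subsample(img, block=4):
    """Alternative degradation: average each block x block array."""
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    return img[:h, :w].reshape(h // block, block, w // block, block).mean(axis=(1, 3))

# synthetic stand-in for a 1 km derived-product field (lognormal-like values)
rng = np.random.default_rng(2)
field = np.exp(rng.normal(0.0, 0.8, size=(512, 512)))
for name, sub in [("fixed-pixel", fixed_pixel_subsample(field)),
                  ("block-mean", block_mean_subsample(field))]:
    print(f"{name:11s} mean={sub.mean():.3f} std={sub.std():.3f}"
          f"  (full-res mean={field.mean():.3f} std={field.std():.3f})")
```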

  5. Statistical Analysis Of Tank 19F Floor Sample Results

    International Nuclear Information System (INIS)

    Harris, S.

    2010-01-01

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by an UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
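
    The UCL95% construction described above (the mean plus a Student-t quantile times the standard error, based on the six sample results) can be sketched as follows; the concentration values are hypothetical, and SciPy is used only for the t quantile.

```python
import math
from statistics import mean, stdev
from scipy import stats  # used only for the Student-t quantile

def ucl95(values):
    """One-sided 95% upper confidence limit on the mean concentration:
    UCL95 = xbar + t_{0.95, n-1} * s / sqrt(n)."""
    n = len(values)
    t = stats.t.ppf(0.95, df=n - 1)
    return mean(values) + t * stdev(values) / math.sqrt(n)

# hypothetical example: six scrape-sample results for one analyte (mg/kg)
concentrations = [12.1, 10.8, 13.4, 11.9, 12.7, 11.2]
print("UCL95% =", round(ucl95(concentrations), 2), "mg/kg")
```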

  6. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    Science.gov (United States)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

    Synthetic Aperture Radar (SAR) is important for polar remote sensing since it provides continuous observations day and night and in all weather. SAR can be used to extract surface roughness information characterized by the variance of dielectric properties and by different polarization channels, which makes it possible to observe different ice types and surface structure for deformation analysis. In November 2016, the 33rd Chinese National Antarctic Research Expedition (CHINARE) cruise set sail for the Antarctic sea ice zone. Accurate mapping of the spatial distribution of leads in the sea ice zone is essential for routine planning of ship navigation. In this study, the semantic relationship between leads and sea ice categories is described by a Conditional Random Field (CRF) model, and lead characteristics are modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture-statistical-distribution-based CRF is developed that considers contextual information and the statistical characteristics of sea ice to improve lead detection in Sentinel-1A dual-polarization SAR imagery. The unary and pairwise potentials in the CRF model are constructed by integrating the posterior probabilities estimated from the statistical distributions. For parameter estimation of the mixture distributions, the Method of Logarithmic Cumulants (MoLC) is used to estimate the parameters of the single distributions, and an iterative Expectation Maximization (EM) algorithm is employed to calculate the parameters of the mixture-distribution-based CRF model. In the posterior probability inference, a graph-cut energy minimization method is adopted for the initial lead detection. Post-processing procedures, including an aspect ratio constraint and spatial smoothing, are applied to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a

  7. Statistical behaviour of adaptive multilevel splitting algorithms in simple models

    International Nuclear Information System (INIS)

    Rolland, Joran; Simonnet, Eric

    2015-01-01

    Adaptive multilevel splitting algorithms have been introduced rather recently for estimating tail distributions in a fast and efficient way. In particular, they can be used for computing the so-called reactive trajectories corresponding to direct transitions from one metastable state to another. The algorithm is based on successive selection–mutation steps performed on the system in a controlled way. It has two intrinsic parameters, the number of particles/trajectories and the reaction coordinate used for discriminating good or bad trajectories. We investigate first the convergence in law of the algorithm as a function of the timestep for several simple stochastic models. Second, we consider the average duration of reactive trajectories for which no theoretical predictions exist. The most important aspect of this work concerns some systems with two degrees of freedom. They are studied in detail as a function of the reaction coordinate in the asymptotic regime where the number of trajectories goes to infinity. We show that during phase transitions, the statistics of the algorithm deviate significantly from known theoretical results when using non-optimal reaction coordinates. In this case, the variance of the algorithm peaks at the transition and the convergence of the algorithm can be much slower than the usual expected central limit behaviour. The duration of trajectories is affected as well. Moreover, reactive trajectories do not correspond to the most probable ones. Such behaviour disappears when using the optimal reaction coordinate, called the committor, as predicted by the theory. We finally investigate a three-state Markov chain which reproduces this phenomenon and show logarithmic convergence of the trajectory durations.
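    To make the selection–mutation mechanism concrete, the sketch below implements a basic adaptive multilevel splitting estimator for a one-dimensional overdamped Langevin particle in a double-well potential, with the coordinate itself as the reaction coordinate. It is a minimal illustration of the class of algorithms studied above, not the authors' implementation; the model, parameters, and stopping rule are assumptions.

```python
# Sketch: adaptive multilevel splitting (AMS) for a 1D overdamped Langevin
# particle in the double-well potential V(x) = (x^2 - 1)^2, estimating the
# probability that a trajectory started near the left well reaches x_B before
# x_A. The reaction coordinate is x itself; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dt, beta = 1e-3, 8.0
x_A, x_B, x0 = -1.1, 0.9, -0.95

def drift(x):
    return -4.0 * x * (x**2 - 1.0)           # -V'(x)

def simulate(x_start):
    """Integrate until absorption at x_A or x_B; return the whole path."""
    path = [x_start]
    while x_A < path[-1] < x_B:
        path.append(path[-1] + drift(path[-1]) * dt
                    + np.sqrt(2.0 * dt / beta) * rng.standard_normal())
    return np.array(path)

N = 50                                        # number of particles/trajectories
paths = [simulate(x0) for _ in range(N)]
weight = 1.0
for _ in range(2000):                         # selection-mutation iterations
    levels = np.array([p.max() for p in paths])
    worst = int(levels.argmin())
    if levels[worst] >= x_B:                  # every trajectory is reactive
        break
    weight *= (N - 1) / N                     # one trajectory discarded
    donor = worst
    while donor == worst:                     # branch from a random survivor
        donor = int(rng.integers(N))
    cut = int(np.argmax(paths[donor] > levels[worst]))
    paths[worst] = np.concatenate([paths[donor][:cut + 1],
                                   simulate(paths[donor][cut])[1:]])

p_reactive = np.mean([p.max() >= x_B for p in paths])
print("estimated transition probability:", weight * p_reactive)
```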

  8. Fast sampling algorithm for the simulation of photon Compton scattering

    International Nuclear Information System (INIS)

    Brusa, D.; Salvat, F.

    1996-01-01

    A simple algorithm for the simulation of Compton interactions of unpolarized photons is described. The energy and direction of the scattered photon, as well as the active atomic electron shell, are sampled from the double-differential cross section obtained by Ribberfors from the relativistic impulse approximation. The algorithm consistently accounts for Doppler broadening and electron binding effects. Simplifications of Ribberfors' formula, required for efficient random sampling, are discussed. The algorithm involves a combination of inverse transform, composition and rejection methods. A parameterization of the Compton profile is proposed from which the simulation of Compton events can be performed analytically in terms of a few parameters that characterize the target atom, namely shell ionization energies, occupation numbers and maximum values of the one-electron Compton profiles. (orig.)
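    As a minimal illustration of the rejection step used in such samplers, the sketch below draws the Compton scattering angle from the free-electron Klein-Nishina distribution. It omits the Doppler broadening, binding effects, and the composition/inverse-transform machinery that the algorithm above handles; the envelope and photon energy are illustrative assumptions.

```python
# Sketch: rejection sampling of the Compton scattering angle from the
# free-electron Klein-Nishina distribution (no Doppler broadening or binding
# effects, unlike the algorithm described above). Illustration only.
import numpy as np

rng = np.random.default_rng(1)

def klein_nishina_pdf(cos_t, k):
    """Unnormalized angular distribution, k = E_photon / (m_e c^2)."""
    eps = 1.0 / (1.0 + k * (1.0 - cos_t))     # E'/E for scattering angle theta
    return eps**2 * (eps + 1.0 / eps - (1.0 - cos_t**2))

def sample_cos_theta(k, n):
    """Sample n values of cos(theta) by simple rejection from a flat envelope."""
    grid = np.linspace(-1.0, 1.0, 2001)
    fmax = klein_nishina_pdf(grid, k).max()   # envelope constant
    out = []
    while len(out) < n:
        c = rng.uniform(-1.0, 1.0)
        if rng.uniform(0.0, fmax) < klein_nishina_pdf(c, k):
            out.append(c)
    return np.array(out)

print(sample_cos_theta(k=1.0, n=5))           # 511 keV photons
```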

  9. Direct Learning of Systematics-Aware Summary Statistics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of the mentioned approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages on existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...

  10. Development of algorithms for building inventory compilation through remote sensing and statistical inferencing

    Science.gov (United States)

    Sarabandi, Pooya

    economical way. A terrain-dependent-search algorithm is formulated to facilitate the search for correspondences in a quasi-stereo pair of images. The calculated heights for sample buildings using the cross-sensor data fusion algorithm show an average coefficient of variation of 1.03%. In order to infer the structural type and occupancy type, i.e. engineering attributes, of buildings from the spatial and geometric attributes of 3-D models, a statistical data analysis framework is formulated. Applications of "Classification Trees" and "Multinomial Logistic Models" in modeling the marginal probabilities of class membership of engineering attributes are investigated. Adaptive statistical models to incorporate different spatial and geometric attributes of buildings, while inferring the engineering attributes, are developed in this dissertation. The inferred engineering attributes in conjunction with the spatial and geometric attributes derived from the imagery can be used to augment regional building inventories and therefore enhance the results of catastrophe models. In the last part of the dissertation, a set of empirically derived motion-damage relationships based on the correlation of observed building performance with measured ground-motion parameters from the 1994 Northridge and 1999 Chi-Chi, Taiwan, earthquakes is developed. Fragility functions in the form of cumulative lognormal distributions and damage probability matrices for several classes of buildings (wood, steel and concrete), as well as for a number of ground-motion intensity measures, are developed and compared to currently used motion-damage relationships.

  11. Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

    Science.gov (United States)

    Raymond L. Czaplewski

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...
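    For readers unfamiliar with the estimator, the sketch below shows the textbook Kalman filter predict/update recursion that such an algorithm builds on; the author's numerically robust formulation for large sensor datasets is not reproduced, and the scalar tracking example (state, matrices, and values) is purely hypothetical.

```python
# Sketch: textbook Kalman filter predict/update cycle. The paper's numerically
# robust variant for very large sensor datasets is not reproduced here.
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One Kalman filter cycle: predict with (F, Q), update with measurement z."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                        # innovation
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical example: track a scalar proportion (e.g. a cover fraction).
x, P = np.array([0.30]), np.array([[0.05]])
F, Q = np.eye(1), np.array([[1e-4]])
H, R = np.eye(1), np.array([[0.01]])
for z in (0.32, 0.29, 0.31):
    x, P = kf_step(x, P, np.array([z]), F, Q, H, R)
print(x, P)
```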

  12. A Separation Algorithm for Sources with Temporal Structure Only Using Second-order Statistics

    Directory of Open Access Journals (Sweden)

    J.G. Wang

    2013-09-01

    Full Text Available Unlike conventional blind source separation (BSS), which deals with independent identically distributed (i.i.d.) sources, this paper addresses the separation from mixtures of sources with temporal structure, such as linear autocorrelations. Many sequential extraction algorithms have been reported, but they suffer from inevitable cumulated errors introduced by the deflation scheme. We propose a robust separation algorithm to recover the original sources simultaneously, through a joint diagonalizer of several averaged delayed covariance matrices at positions of the optimal time delay and its integer multiples. The proposed algorithm is computationally simple and efficient, since it is based on second-order statistics only. Extensive simulation results confirm the validity and high performance of the algorithm. Compared with related extraction algorithms, its separation signal-to-noise ratio for a desired source can be 20 dB higher, and it seems rather insensitive to the estimation error of the time delay.
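    A minimal sketch of separation from second-order statistics may be useful here. The example below uses a single delayed covariance matrix (an AMUSE-style estimator) rather than the joint diagonalization of several delayed covariance matrices described above; the toy sources and mixing matrix are illustrative assumptions.

```python
# Sketch: second-order blind separation of temporally structured sources,
# simplified to a single time delay (AMUSE-style). Illustration only.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(5000)
S = np.vstack([np.sin(2 * np.pi * 0.013 * t),             # sources with
               np.sign(np.sin(2 * np.pi * 0.004 * t))])    # temporal structure
A = rng.standard_normal((2, 2))                            # unknown mixing matrix
X = A @ S                                                  # observed mixtures

def amuse(X, tau=1):
    X = X - X.mean(axis=1, keepdims=True)
    # 1) whiten the observations
    d, E = np.linalg.eigh(np.cov(X))
    W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = W @ X
    # 2) eigenvectors of the symmetrized delayed covariance matrix
    C_tau = Z[:, :-tau] @ Z[:, tau:].T / (Z.shape[1] - tau)
    C_tau = 0.5 * (C_tau + C_tau.T)
    _, V = np.linalg.eigh(C_tau)
    return V.T @ W                     # unmixing matrix estimate

B = amuse(X)
S_hat = B @ X                          # recovered sources (up to order and scale)
```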

  13. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL{sub 95%}) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL{sub 95%}) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL{sub 95%} was based entirely on the six current scrape sample results (each averaged across three analytical determinations).

  14. Implementation and statistical analysis of Metropolis algorithm for SU(3)

    International Nuclear Information System (INIS)

    Katznelson, E.; Nobile, A.

    1984-12-01

    In this paper we study the statistical properties of an implementation of the Metropolis algorithm for SU(3) gauge theory. It is shown that the results have a normal distribution. We demonstrate that in this case error analysis can be carried out in a simple way, and we show that applying it to both the measurement strategy and the output data analysis has an important influence on the performance and reliability of the simulation. (author)
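    The sketch below shows a generic Metropolis accept/reject step together with a simple binning error analysis of the resulting measurement series. It uses a one-dimensional toy action rather than SU(3) link updates, so it only illustrates the statistical analysis discussed above; all parameters are assumptions.

```python
# Sketch: a generic Metropolis accept/reject step and a binning error analysis
# of the measurement series. Toy 1D action, not the paper's SU(3) updates.
import numpy as np

rng = np.random.default_rng(3)

def action(x):
    return 0.5 * x**2               # toy "action": Gaussian target exp(-S)

x, step, samples = 0.0, 1.5, []
for sweep in range(20000):
    x_new = x + step * rng.uniform(-1.0, 1.0)
    if rng.random() < np.exp(action(x) - action(x_new)):   # Metropolis test
        x = x_new
    samples.append(x**2)            # an "observable" measured each sweep

obs = np.array(samples[2000:])      # drop thermalization
for nbin in (1, 10, 50, 100):       # binning to expose autocorrelations
    bins = obs[: obs.size - obs.size % nbin].reshape(-1, nbin).mean(axis=1)
    err = bins.std(ddof=1) / np.sqrt(bins.size)
    print(f"bin size {nbin:4d}: mean = {bins.mean():.4f} +/- {err:.4f}")
```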

  15. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    Science.gov (United States)

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  16. Statistical Assessment of Gene Fusion Detection Algorithms using RNASequencing Data

    NARCIS (Netherlands)

    Varadan, V.; Janevski, A.; Kamalakaran, S.; Banerjee, N.; Harris, L.; Dimitrova, D.

    2012-01-01

    The detection and quantification of fusion transcripts has both biological and clinical implications. RNA sequencing technology provides a means for unbiased and high resolution characterization of fusion transcript information in tissue samples. We evaluated two fusion-detection algorithms,

  17. An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks

    KAUST Repository

    Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem

  18. A course in mathematical statistics and large sample theory

    CERN Document Server

    Bhattacharya, Rabi; Patrangenaru, Victor

    2016-01-01

    This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper division course in analysis, and some acquaintance with measure theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large sample theory of statistics — parametric and nonparametric, and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Large sample theory with many worked examples, numerical calculations, and simulations to illustrate theory. Appendices provide ready access to a number of standard results, with many proofs. Solutions given to a number of selected exercises from Part I. Part II exercises with ...

  19. Illustrating Sampling Distribution of a Statistic: Minitab Revisited

    Science.gov (United States)

    Johnson, H. Dean; Evans, Marc A.

    2008-01-01

    Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…

  20. Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms

    KAUST Repository

    Fidel, Adam; Jacobs, Sam Ade; Sharma, Shishir; Amato, Nancy M.; Rauchwerger, Lawrence

    2014-01-01

    Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the subproblems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores. © 2014 IEEE.

  1. Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms

    KAUST Repository

    Fidel, Adam

    2014-05-01

    Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the subproblems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores. © 2014 IEEE.

  2. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    Energy Technology Data Exchange (ETDEWEB)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.; Amidan, Brett G.

    2013-04-27

    This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating (1) the number of samples required to achieve a specified confidence in characterization and clearance decisions, and (2) the confidence in making characterization and clearance decisions for a specified number of samples, for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account
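    The textbook relations behind such calculations, for the simplest situation (all samples negative, FNR = 0, simple random sampling), can be sketched as follows. These are standard acceptance-sampling formulas given for illustration; the report's formulas, which also handle FNR > 0 and the hotspot and CJR designs, are more general.

```python
# Sketch: textbook formulas relating number of samples, contaminated-area
# fraction, and confidence for the "all samples negative" case with FNR = 0.
# The report's formulas are more general; this is for illustration only.
import math

def n_required(confidence, hot_fraction):
    """Samples needed so that, if a fraction `hot_fraction` of the area is
    contaminated, at least one sample hits it with probability `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hot_fraction))

def confidence_achieved(n, hot_fraction):
    """Confidence that contamination covers less than `hot_fraction` of the
    area, given that all n randomly placed samples were negative."""
    return 1.0 - (1.0 - hot_fraction) ** n

print(n_required(0.95, 0.01))          # -> 299 samples
print(confidence_achieved(299, 0.01))  # -> ~0.95
```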

  3. Empirical and Statistical Evaluation of the Effectiveness of Four Lossless Data Compression Algorithms

    Directory of Open Access Journals (Sweden)

    N. A. Azeez

    2017-04-01

    Full Text Available Data compression is the process of reducing the size of a file to effectively reduce storage space and communication cost. Developments in technology and the digital age have led to an unparalleled usage of digital files in the current decade. This has resulted in an increase in the amount of data being transmitted via various channels of data communication, which has prompted the need to examine the current lossless data compression algorithms and check their level of effectiveness so as to maximally reduce the bandwidth requirement in the communication and transfer of data. Four lossless data compression algorithms, the Lempel-Ziv-Welch algorithm, the Shannon-Fano algorithm, the Adaptive Huffman algorithm, and Run-Length Encoding, have been selected for implementation. The choice of these algorithms was based on their similarities, particularly in application areas. Their levels of efficiency and effectiveness were evaluated using a set of predefined performance evaluation metrics, namely compression ratio, compression factor, compression time, saving percentage, entropy and code efficiency. The algorithms were implemented in the NetBeans Integrated Development Environment using Java as the programming language. Through the statistical analysis performed using Boxplot and ANOVA and the comparison made on the four algo
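    The evaluation metrics named above are straightforward to compute. The sketch below evaluates compression ratio, compression factor, saving percentage, and entropy for an arbitrary byte string, using zlib as a stand-in codec since the four studied algorithms are not reproduced here; metric definitions can vary slightly between authors.

```python
# Sketch: common lossless-compression evaluation metrics, computed with zlib
# as a stand-in codec. Definitions of the metrics can vary between authors.
import collections
import math
import zlib

def entropy_bits_per_symbol(data: bytes) -> float:
    counts = collections.Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

original = b"statistical sampling algorithm " * 200
compressed = zlib.compress(original, 9)

ratio = len(compressed) / len(original)        # compression ratio
factor = len(original) / len(compressed)       # compression factor
saving = (1.0 - ratio) * 100.0                 # saving percentage

print(f"ratio={ratio:.3f} factor={factor:.2f} saving={saving:.1f}% "
      f"entropy={entropy_bits_per_symbol(original):.2f} bits/symbol")
```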

  4. Use of a Radon Stripping Algorithm for Retrospective Assessment of Air Filter Samples

    International Nuclear Information System (INIS)

    Hayes, Robert

    2009-01-01

    An evaluation of a large number of air sample filters was undertaken using a commercial alpha and beta spectroscopy system employing a passive implanted planar silicon (PIPS) detector. Samples were only measured after air flow through the filters had ceased. A commercial radon stripping algorithm was used to discriminate anthropogenic alpha and beta activity on the filters from the radon progeny. When uncontaminated air filters were evaluated, the results showed that there was a time-dependent bias in both the average estimates and the measurement dispersion, with the relative bias being small compared to the dispersion. When environmental air sample filters were measured simultaneously with electroplated alpha and beta sources, the radon stripping algorithm showed a number of substantial unexpected deviations. The current algorithm is therefore not recommended for assay applications, and the PIPS detector should be used only for gross counting unless appropriate modifications are made to the curve-fitting algorithm. As a screening method, the radon stripping algorithm might be expected to detect elevated alpha and beta activities on air sample filters (not due to radon progeny) at around the 200 dpm level

  5. Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond

    Science.gov (United States)

    Fischer, Bernd; Hajian, Arsen; Knuth, Kevin; Schumann, Johann

    2004-04-01

    AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable building blocks for the algorithm construction; they can encapsulate advanced algorithms and data structures. A symbolic-algebraic system is used to find closed-form solutions for problems and emerging subproblems. In this paper, we describe the application of AUTOBAYES to the analysis of planetary nebulae images taken by the Hubble Space Telescope. We explain the system architecture, and present in detail the automatic derivation of the scientists' original analysis as well as a refined analysis using clustering models. This study demonstrates that AUTOBAYES is now mature enough so that it can be applied to realistic scientific data analysis tasks.

  6. Pierre Gy's sampling theory and sampling practice heterogeneity, sampling correctness, and statistical process control

    CERN Document Server

    Pitard, Francis F

    1993-01-01

    Pierre Gy's Sampling Theory and Sampling Practice, Second Edition is a concise, step-by-step guide for process variability management and methods. Updated and expanded, this new edition provides a comprehensive study of heterogeneity, covering the basic principles of sampling theory and its various applications. It presents many practical examples to allow readers to select appropriate sampling protocols and assess the validity of sampling protocols from others. The variability of dynamic process streams using variography is discussed to help bridge sampling theory with statistical process control. Many descriptions of good sampling devices, as well as descriptions of poor ones, are featured to educate readers on what to look for when purchasing sampling systems. The book uses its accessible, tutorial style to focus on professional selection and use of methods. The book will be a valuable guide for mineral processing engineers; metallurgists; geologists; miners; chemists; environmental scientists; and practit...

  7. Plane-Based Sampling for Ray Casting Algorithm in Sequential Medical Images

    Science.gov (United States)

    Lin, Lili; Chen, Shengyong; Shao, Yan; Gu, Zichun

    2013-01-01

    This paper proposes a plane-based sampling method to improve the traditional Ray Casting Algorithm (RCA) for the fast reconstruction of a three-dimensional biomedical model from sequential images. In the novel method, the optical properties of all sampling points depend on the intersection points where a ray travels through an equidistant parallel plane cluster of the volume dataset. The results show that the method improves the rendering speed by over three times compared with the conventional algorithm, while the image quality is well guaranteed. PMID:23424608

  8. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade; Manavi, Kasra; Burgos, Juan; Denny, Jory; Thomas, Shawna; Amato, Nancy M.

    2012-01-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.

  9. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade

    2012-05-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.

  10. The application of statistical and/or non-statistical sampling techniques by internal audit functions in the South African banking industry

    Directory of Open Access Journals (Sweden)

    D.P. van der Nest

    2015-03-01

    Full Text Available This article explores the use by internal audit functions of audit sampling techniques in order to test the effectiveness of controls in the banking sector. The article focuses specifically on the use of statistical and/or non-statistical sampling techniques by internal auditors. The focus of the research for this article was internal audit functions in the banking sector of South Africa. The results discussed in the article indicate that audit sampling is still used frequently as an audit evidence-gathering technique. Non-statistical sampling techniques are used more frequently than statistical sampling techniques for the evaluation of the sample. In addition, both techniques are regarded as important for the determination of the sample size and the selection of the sample items

  11. A statistical mechanical interpretation of algorithmic information theory: Total statistical mechanical interpretation based on physical argument

    International Nuclear Information System (INIS)

    Tadaki, Kohtaro

    2010-01-01

    The statistical mechanical interpretation of algorithmic information theory (AIT, for short) was introduced and developed by our former works [K. Tadaki, Local Proceedings of CiE 2008, pp. 425-434, 2008] and [K. Tadaki, Proceedings of LFCS'09, Springer's LNCS, vol. 5407, pp. 422-440, 2009], where we introduced the notion of thermodynamic quantities, such as partition function Z(T), free energy F(T), energy E(T), statistical mechanical entropy S(T), and specific heat C(T), into AIT. We then discovered that, in this interpretation, the temperature T equals the partial randomness of the values of all these thermodynamic quantities, where the notion of partial randomness is a stronger representation of the compression rate by means of program-size complexity. Furthermore, we showed that this situation holds for the temperature T itself, which is one of the most typical thermodynamic quantities. Namely, we showed that, for each of the thermodynamic quantities Z(T), F(T), E(T), and S(T) above, the computability of its value at temperature T gives a sufficient condition for T in (0,1) to satisfy the condition that the partial randomness of T equals T. In this paper, based on a physical argument on the same level of mathematical strictness as normal statistical mechanics in physics, we develop a total statistical mechanical interpretation of AIT which realizes a perfect correspondence to normal statistical mechanics. We do this by identifying a microcanonical ensemble in the framework of AIT. As a result, we clarify the statistical mechanical meaning of the thermodynamic quantities of AIT.

  12. A clustering algorithm for sample data based on environmental pollution characteristics

    Science.gov (United States)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
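    A simplified sketch in the spirit of the described procedure is given below: points within a similarity threshold of an existing centre join it, otherwise they seed a new centre, centres are then refined k-means-style, and very small clusters are reported as outliers. The paper's actual similarity function, threshold rule, and outlier criterion are not reproduced; the data and parameters are illustrative.

```python
# Sketch: leader-style clustering with k-means refinement and size-based
# outlier flagging, loosely in the spirit of the EPC procedure above.
import numpy as np

def epc_like(X, threshold, min_size=3, n_refine=5):
    X = np.asarray(X, dtype=float)
    centres = [X[0].copy()]
    for x in X[1:]:                      # leader pass: create centres as needed
        if np.linalg.norm(np.array(centres) - x, axis=1).min() > threshold:
            centres.append(x.copy())
    centres = np.array(centres)
    for _ in range(n_refine):            # k-means style refinement
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centres)):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    sizes = np.bincount(labels, minlength=len(centres))
    outliers = sizes[labels] < min_size  # members of very small clusters
    return labels, centres, outliers

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.25, (60, 2)),   # two pollution "profiles"
               rng.normal(3.0, 0.25, (60, 2)),
               [[10.0, 10.0]]])                  # one obvious outlier sample
labels, centres, outliers = epc_like(X, threshold=1.5)
print(len(centres), int(outliers.sum()))
```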

  13. Effective traffic features selection algorithm for cyber-attacks samples

    Science.gov (United States)

    Li, Yihong; Liu, Fangzheng; Du, Zhenyu

    2018-05-01

    Studying defence schemes against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of the traffic features extracted from cyber-attack samples. Firstly, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, it calculates the variation in clustering performance after removing a certain feature. Finally, the degree of distinctiveness of each feature is evaluated according to the result; a feature is considered effective if its degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select the effective features from the extracted original feature set. In this way, the dimensionality of the features can be reduced so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has some advantages over other selection algorithms.

  14. Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond

    OpenAIRE

    Fischer, Bernd; Knuth, Kevin; Hajian, Arsen; Schumann, Johann

    2004-01-01

    AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable buildin...

  15. Generalized Likelihood Uncertainty Estimation (GLUE) Using Multi-Optimization Algorithm as Sampling Method

    Science.gov (United States)

    Wang, Z.

    2015-12-01

    For decades, distributed and lumped hydrological models have furthered our understanding of hydrological systems. The development of hydrological simulation at large scale and high precision has refined the spatial descriptions and representations of hydrological behavior. Meanwhile, this trend is accompanied by increases in model complexity and in the number of parameters, which bring new challenges for uncertainty quantification. Generalized Likelihood Uncertainty Estimation (GLUE) has been widely used in uncertainty analysis for hydrological models; it builds on the Monte Carlo method coupled with Bayesian estimation. However, the stochastic sampling method of prior parameters adopted by GLUE appears inefficient, especially in high-dimensional parameter spaces. Heuristic optimization algorithms utilizing iterative evolution show better convergence speed and optimum-searching performance. In light of these features, this study adopted the genetic algorithm, differential evolution, and the shuffled complex evolution algorithm to search the parameter space and obtain parameter sets of large likelihood. Based on the multi-algorithm sampling, hydrological model uncertainty analysis is conducted within the typical GLUE framework. To demonstrate the superiority of the new method, two hydrological models of different complexity are examined. The results show that the adaptive method tends to be efficient in sampling and effective in uncertainty analysis, providing an alternative path for uncertainty quantification.
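    The GLUE weighting step itself is compact; the sketch below shows it on a toy linear model with plain Monte Carlo sampling of the prior, which is the component the study above replaces with heuristic optimizers. The likelihood measure, behavioural threshold, and model are illustrative assumptions, not those of the paper.

```python
# Sketch: the standard GLUE ingredients (behavioural threshold, likelihood
# weights, weighted prediction bounds) on a toy model with plain Monte Carlo
# prior sampling. Illustration only.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 50)
obs = 2.0 * x + 0.5 + rng.normal(0.0, 0.05, x.size)    # synthetic observations

def model(theta):
    return theta[0] * x + theta[1]

def nse(theta):                                        # Nash-Sutcliffe efficiency
    err = obs - model(theta)
    return 1.0 - np.sum(err**2) / np.sum((obs - obs.mean())**2)

thetas = rng.uniform([0.0, 0.0], [5.0, 2.0], size=(5000, 2))   # prior sampling
L = np.array([max(0.0, nse(t)) for t in thetas])
behavioural = L > 0.7                                  # GLUE acceptance threshold
w = L[behavioural] / L[behavioural].sum()              # likelihood weights
sims = np.array([model(t) for t in thetas[behavioural]])

def weighted_quantile(values, weights, q):
    idx = np.argsort(values)
    return values[idx][np.searchsorted(np.cumsum(weights[idx]), q)]

lower = [weighted_quantile(sims[:, j], w, 0.05) for j in range(x.size)]
upper = [weighted_quantile(sims[:, j], w, 0.95) for j in range(x.size)]
print(int(behavioural.sum()), lower[0], upper[0])
```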

  16. Development of modelling algorithm of technological systems by statistical tests

    Science.gov (United States)

    Shemshura, E. A.; Otrokov, A. V.; Chernyh, V. G.

    2018-03-01

    The paper tackles the problem of the economic assessment of design efficiency for various technological systems at the stage of their operation. The modelling algorithm for a technological system is based on statistical tests and, taking the reliability index into account, allows estimating the level of technical excellence of the machinery and determining the efficiency of its design reliability against its performance. The economic feasibility of its application is to be determined on the basis of the service quality of the technological system, with further forecasting of the volumes and range of the spare parts supply.

  17. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, P; Kleibergen, F

    2003-01-01

    We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on finite-sample properties in a

  18. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

    OpenAIRE

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, based on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with different statistical distributions of the gait variables from the left and right lower limbs; that is, the discrimination of...

  19. Statistical sampling method for releasing decontaminated vehicles

    International Nuclear Information System (INIS)

    Lively, J.W.; Ware, J.A.

    1996-01-01

    Earth moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, "Sampling Procedures and Tables for Inspection by Attributes") for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been successfully used for 1 year at the former uranium mill site in Monticello, Utah (a CERCLA regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on Monticello Projects has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site
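    The operating characteristic of a single-sampling attribute plan of this kind can be sketched with a binomial model: sample n vehicles and release the batch if at most c of them exceed the surface contamination limits. The code below is a textbook approximation for large batches, not the exact MIL-STD-105E tables or the Monticello plan; n, c, and the defect rates are illustrative.

```python
# Sketch: binomial operating characteristic of a single-sampling attribute
# plan (sample n, accept if at most c nonconforming). Illustration only.
from math import comb

def prob_accept(n, c, p_defective):
    """P(at most c nonconforming vehicles among n sampled)."""
    return sum(comb(n, k) * p_defective**k * (1 - p_defective)**(n - k)
               for k in range(c + 1))

for p in (0.01, 0.05, 0.10):
    print(f"p={p:.2f}  P(accept) = {prob_accept(n=13, c=0, p_defective=p):.3f}")
```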

  20. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    Science.gov (United States)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

    An algorithm is developed to automatically screen outliers from massive training samples for the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP is to produce a global 30 m spatial resolution impervious cover data set for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high resolution impervious cover data set is not only significant to urbanization studies but also desired by global carbon, hydrology, and energy balance research. A supervised classification method, the regression tree, is applied in this project. A set of accurate training samples is the key to supervised classification. Here we developed global-scale training samples from fine resolution (about 1 m) satellite data (Quickbird and Worldview2) and then aggregated the fine resolution impervious cover maps to 30 m resolution. In order to improve the classification accuracy, the training samples should be screened before being used to train the regression tree. It is impossible to manually screen 30 m resolution training samples collected globally. For example, in Europe alone there are 174 training sites, whose size ranges from 4.5 km by 4.5 km to 8.1 km by 3.6 km, and the number of training samples is over six million. Therefore, we developed this automated statistics-based algorithm to screen the training samples at two levels: the site level and the scene level. At the site level, all the training samples are divided into 10 groups according to the percentage of impervious surface within a sample pixel; the samples falling in each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. Then the screening process escalates to the scene level. A similar screening process, but with a looser threshold, is applied at the scene level to allow for the possible variance due to site differences. We do not perform the screening process across scenes because the scenes might vary due to

  1. A hybrid reliability algorithm using PSO-optimized Kriging model and adaptive importance sampling

    Science.gov (United States)

    Tong, Cao; Gong, Haili

    2018-03-01

    This paper aims to reduce the computational cost of reliability analysis. A new hybrid algorithm is proposed based on a PSO-optimized Kriging model and an adaptive importance sampling method. Firstly, the particle swarm optimization (PSO) algorithm is used to optimize the parameters of the Kriging model. A typical function is fitted to validate the improvement by comparing the results of the PSO-optimized Kriging model with those of the original Kriging model. Secondly, a hybrid algorithm for reliability analysis combining the optimized Kriging model and adaptive importance sampling is proposed. Two cases from the literature are given to validate its efficiency and correctness. The comparison results show that the proposed method is more efficient because it requires only a small number of sample points.

  2. Sampling-Based Motion Planning Algorithms for Replanning and Spatial Load Balancing

    Energy Technology Data Exchange (ETDEWEB)

    Boardman, Beth Leigh [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-10-12

    The common theme of this dissertation is sampling-based motion planning with the two key contributions being in the area of replanning and spatial load balancing for robotic systems. Here, we begin by recalling two sampling-based motion planners: the asymptotically optimal rapidly-exploring random tree (RRT*), and the asymptotically optimal probabilistic roadmap (PRM*). We also provide a brief background on collision cones and the Distributed Reactive Collision Avoidance (DRCA) algorithm. The next four chapters detail novel contributions for motion replanning in environments with unexpected static obstacles, for multi-agent collision avoidance, and spatial load balancing. First, we show improved performance of the RRT* when using the proposed Grandparent-Connection (GP) or Focused-Refinement (FR) algorithms. Next, the Goal Tree algorithm for replanning with unexpected static obstacles is detailed and proven to be asymptotically optimal. A multi-agent collision avoidance problem in obstacle environments is approached via the RRT*, leading to the novel Sampling-Based Collision Avoidance (SBCA) algorithm. The SBCA algorithm is proven to guarantee collision free trajectories for all of the agents, even when subject to uncertainties in the knowledge of the other agents’ positions and velocities. Given that a solution exists, we prove that livelocks and deadlock will lead to the cost to the goal being decreased. We introduce a new deconfliction maneuver that decreases the cost-to-come at each step. This new maneuver removes the possibility of livelocks and allows a result to be formed that proves convergence to the goal configurations. Finally, we present a limited range Graph-based Spatial Load Balancing (GSLB) algorithm which fairly divides a non-convex space among multiple agents that are subject to differential constraints and have a limited travel distance. The GSLB is proven to converge to a solution when maximizing the area covered by the agents. The analysis

  3. Performance Comparison of Reconstruction Algorithms in Discrete Blind Multi-Coset Sampling

    DEFF Research Database (Denmark)

    Grigoryan, Ruben; Arildsen, Thomas; Tandur, Deepaknath

    2012-01-01

    This paper investigates the performance of different reconstruction algorithms in discrete blind multi-coset sampling. Multi-coset scheme is a promising compressed sensing architecture that can replace traditional Nyquist-rate sampling in the applications with multi-band frequency sparse signals...

  4. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    Science.gov (United States)

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  5. Evaluation of dynamically dimensioned search algorithm for optimizing SWAT by altering sampling distributions and searching range

    Science.gov (United States)

    The primary advantage of Dynamically Dimensioned Search algorithm (DDS) is that it outperforms many other optimization techniques in both convergence speed and the ability in searching for parameter sets that satisfy statistical guidelines while requiring only one algorithm parameter (perturbation f...

  6. Improvement of characteristic statistic algorithm and its application on equilibrium cycle reloading optimization

    International Nuclear Information System (INIS)

    Hu, Y.; Liu, Z.; Shi, X.; Wang, B.

    2006-01-01

    A brief introduction to the characteristic statistic algorithm (CSA), a new global optimization algorithm for solving the problem of PWR in-core fuel management optimization, is given in the paper. CSA is modified by the adoption of a back propagation neural network and fast local adjustment. The modified CSA is then applied to PWR Equilibrium Cycle Reloading Optimization, and the corresponding optimization code CSA-DYW is developed. CSA-DYW is used to optimize the 18-month equilibrium reloading cycle of the Unit 1 reactor of the Daya Bay nuclear plant. The results show that CSA-DYW has high efficiency and good global performance on PWR Equilibrium Cycle Reloading Optimization. (authors)

  7. Statistical sampling for holdup measurement

    International Nuclear Information System (INIS)

    Picard, R.R.; Pillay, K.K.S.

    1986-01-01

    Nuclear materials holdup is a serious problem in many operating facilities. Estimating amounts of holdup is important for materials accounting and, sometimes, for process safety. Clearly, measuring holdup in all pieces of equipment is not a viable option in terms of time, money, and radiation exposure to personnel. Furthermore, 100% measurement is not only impractical but unnecessary for developing estimated values. Principles of statistical sampling are valuable in the design of cost-effective holdup monitoring plans and in quantifying uncertainties in holdup estimates. The purpose of this paper is to describe those principles and to illustrate their use
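    As an illustration of those principles, the sketch below estimates total facility holdup, with a one-sided upper confidence limit, from measurements on a simple random sample of similar equipment items. The finite-population, t-based estimator and the gram quantities are textbook assumptions, not the paper's specific monitoring plan.

```python
# Sketch: estimating total holdup from a simple random sample of m items out
# of N similar items, with a one-sided upper confidence limit (finite
# population correction included). Illustration only.
import numpy as np
from scipy import stats

def holdup_estimate(measured_g, N_items, conf=0.95):
    x = np.asarray(measured_g, dtype=float)
    m = x.size
    total = N_items * x.mean()
    se_total = N_items * x.std(ddof=1) / np.sqrt(m) * np.sqrt(1 - m / N_items)
    ucl = total + stats.t.ppf(conf, df=m - 1) * se_total
    return total, ucl

# Hypothetical gram quantities measured on 8 of 40 similar pipe sections:
print(holdup_estimate([5.2, 3.1, 7.8, 4.4, 6.0, 2.9, 5.5, 4.1], N_items=40))
```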

  8. Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes

    Science.gov (United States)

    Williams Colin P.

    1999-01-01

    Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long term behavior of such processes is only tractable for very simple types of stochastic processes such as Markovian processes. However, in real world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance it might be the non-random intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(√N) time. That is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.

  9. The ‘39 steps’: an algorithm for performing statistical analysis of data on energy intake and expenditure

    Directory of Open Access Journals (Sweden)

    John R. Speakman

    2013-03-01

    Full Text Available The epidemics of obesity and diabetes have aroused great interest in the analysis of energy balance, with the use of organisms ranging from nematode worms to humans. Although generating energy-intake or -expenditure data is relatively straightforward, the most appropriate way to analyse the data has been an issue of contention for many decades. In the last few years, a consensus has been reached regarding the best methods for analysing such data. To facilitate using these best-practice methods, we present here an algorithm that provides a step-by-step guide for analysing energy-intake or -expenditure data. The algorithm can be used to analyse data from either humans or experimental animals, such as small mammals or invertebrates. It can be used in combination with any commercial statistics package; however, to assist with analysis, we have included detailed instructions for performing each step for three popular statistics packages (SPSS, MINITAB and R). We also provide interpretations of the results obtained at each step. We hope that this algorithm will assist in the statistically appropriate analysis of such data, a field in which there has been much confusion and some controversy.
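    One step that such an analysis typically involves is comparing group energy expenditure by ANCOVA with body mass as a covariate, rather than dividing by mass. The sketch below shows that single step with statsmodels on hypothetical data; it is not the full algorithm, and the column names, group means, and effect sizes are invented for illustration.

```python
# Sketch: one common step in energy-balance analysis -- ANCOVA of energy
# expenditure with body mass as a covariate. Data and names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 30
df = pd.DataFrame({
    "group": np.repeat(["control", "treated"], n),
    "mass": rng.normal([25.0, 30.0], 2.0, size=(n, 2)).T.ravel(),
})
df["ee"] = 0.8 * df["mass"] + (df["group"] == "treated") * 1.5 + rng.normal(0, 1, 2 * n)

model = smf.ols("ee ~ mass + C(group)", data=df).fit()   # ANCOVA
print(model.summary().tables[1])
```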

  10. Dynamic statistical optimization of GNSS radio occultation bending angles: advanced algorithm and performance analysis

    Science.gov (United States)

    Li, Y.; Kirchengast, G.; Scherllin-Pirscher, B.; Norman, R.; Yuan, Y. B.; Fritzer, J.; Schwaerz, M.; Zhang, K.

    2015-08-01

    We introduce a new dynamic statistical optimization algorithm to initialize ionosphere-corrected bending angles of Global Navigation Satellite System (GNSS)-based radio occultation (RO) measurements. The new algorithm estimates background and observation error covariance matrices with geographically varying uncertainty profiles and realistic global-mean correlation matrices. The error covariance matrices estimated by the new approach are more accurate and realistic than in simplified existing approaches and can therefore be used in statistical optimization to provide optimal bending angle profiles for high-altitude initialization of the subsequent Abel transform retrieval of refractivity. The new algorithm is evaluated against the existing Wegener Center Occultation Processing System version 5.6 (OPSv5.6) algorithm, using simulated data on two test days from January and July 2008 and real observed CHAllenging Minisatellite Payload (CHAMP) and Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) measurements from the complete months of January and July 2008. The following is achieved for the new method's performance compared to OPSv5.6: (1) significant reduction of random errors (standard deviations) of optimized bending angles down to about half of their size or more; (2) reduction of the systematic differences in optimized bending angles for simulated MetOp data; (3) improved retrieval of refractivity and temperature profiles; and (4) realistically estimated global-mean correlation matrices and realistic uncertainty fields for the background and observations. Overall the results indicate high suitability for employing the new dynamic approach in the processing of long-term RO data into a reference climate record, leading to well-characterized and high-quality atmospheric profiles over the entire stratosphere.

  11. An algorithm to improve sampling efficiency for uncertainty propagation using sampling based method

    International Nuclear Information System (INIS)

    Campolina, Daniel; Lima, Paulo Rubens I.; Pereira, Claubia; Veloso, Maria Auxiliadora F.

    2015-01-01

    Sample size and computational uncertainty were varied in order to investigate the sampling efficiency and convergence of the sampling-based method for uncertainty propagation. The transport code MCNPX was used to simulate an LWR model and allow the mapping from uncertain inputs of the benchmark experiment to uncertain outputs. Random sampling efficiency was improved through the use of an algorithm for selecting distributions. The mean range, standard deviation range, and skewness were verified in order to obtain a better representation of the uncertainty figures. A standard deviation of 5 pcm in the propagated uncertainties over 10 replicates of n samples was adopted as the convergence criterion for the method. An uncertainty of 75 pcm on the reactor k_eff was estimated by using a sample of size 93 and a computational uncertainty of 28 pcm to propagate the 1σ uncertainty of the burnable poison radius. For a fixed computational time, it was found for the example under investigation that, in order to reduce the variance of the propagated uncertainty, it is preferable to double the sample size rather than to double the number of particles followed by the Monte Carlo process in the MCNPX code. (author)
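    The basic sampling-based propagation loop is easy to sketch: draw the uncertain input from its distribution, run the model once per draw, and summarize the spread of the output. The toy response below stands in for the MCNPX transport calculation, with added noise mimicking the per-run computational uncertainty; all numbers are illustrative assumptions.

```python
# Sketch: generic sampling-based uncertainty propagation. The toy model stands
# in for the transport calculation; per-run Monte Carlo noise is simulated.
import numpy as np

rng = np.random.default_rng(7)

def model(radius, mc_sigma=28e-5):
    """Toy response in k_eff units, plus simulated Monte Carlo noise."""
    return 1.0 + 0.02 * (radius - 0.5) + rng.normal(0.0, mc_sigma)

radius_samples = rng.normal(0.5, 0.05, size=93)    # 1-sigma input uncertainty
keff = np.array([model(r) for r in radius_samples])
print(f"propagated k_eff uncertainty ~ {keff.std(ddof=1) * 1e5:.0f} pcm")
```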

  12. Constructing first-principles phase diagrams of amorphous LixSi using machine-learning-assisted sampling with an evolutionary algorithm

    Science.gov (United States)

    Artrith, Nongnuch; Urban, Alexander; Ceder, Gerbrand

    2018-06-01

    The atomistic modeling of amorphous materials requires structure sizes and sampling statistics that are challenging to achieve with first-principles methods. Here, we propose a methodology to speed up the sampling of amorphous and disordered materials using a combination of a genetic algorithm and a specialized machine-learning potential based on artificial neural networks (ANNs). We show for the example of the amorphous LiSi alloy that around 1000 first-principles calculations are sufficient for the ANN-potential assisted sampling of low-energy atomic configurations in the entire amorphous LixSi phase space. The obtained phase diagram is validated by comparison with the results from an extensive sampling of LixSi configurations using molecular dynamics simulations and a general ANN potential trained to ˜45 000 first-principles calculations. This demonstrates the utility of the approach for the first-principles modeling of amorphous materials.

  13. Sampling methods to the statistical control of the production of blood components.

    Science.gov (United States)

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control is dependent on the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply solely with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation in detecting abnormal patterns and in assuring that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions for the purpose for which they are suggested. Copyright © 2017 Elsevier Ltd. All rights reserved.
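
    Of the three methodologies listed, the finite-population proportion approach is the easiest to illustrate. The sketch below uses the textbook sample-size formula with finite-population correction; the confidence level, margin of error and lot sizes are arbitrary illustrative choices, not values from the article.

```python
import math

def sample_size_finite_population(N, p=0.5, margin=0.05, z=1.96):
    """Sample size for estimating a proportion p to within `margin` at ~95%
    confidence (z = 1.96), with finite-population correction for a lot of N units."""
    n0 = z**2 * p * (1.0 - p) / margin**2        # infinite-population sample size
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / N))

# hypothetical monthly production volumes of one blood component type
for N in (500, 3000, 20000):
    print(N, sample_size_finite_population(N))
```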

  14. Some algorithms for reordering a sequence of objects, with application to E. Sparre Andersen's principle of equivalence in mathematical statistics

    NARCIS (Netherlands)

    Bruijn, de N.G.

    1972-01-01

    Recently A. W. Joseph described an algorithm providing combinatorial insight into E. Sparre Andersen's so-called Principle of Equivalence in mathematical statistics. In the present paper such algorithms are discussed systematically.

  15. Inverse problems with Poisson data: statistical regularization theory, applications and algorithms

    International Nuclear Information System (INIS)

    Hohage, Thorsten; Werner, Frank

    2016-01-01

    Inverse problems with Poisson data arise in many photonic imaging modalities in medicine, engineering and astronomy. The design of regularization methods and estimators for such problems has been studied intensively over the last two decades. In this review we give an overview of statistical regularization theory for such problems, the most important applications, and the most widely used algorithms. The focus is on variational regularization methods in the form of penalized maximum likelihood estimators, which can be analyzed in a general setup. Complementing a number of recent convergence rate results we will establish consistency results. Moreover, we discuss estimators based on a wavelet-vaguelette decomposition of the (necessarily linear) forward operator. As most prominent applications we briefly introduce Positron emission tomography, inverse problems in fluorescence microscopy, and phase retrieval problems. The computation of a penalized maximum likelihood estimator involves the solution of a (typically convex) minimization problem. We also review several efficient algorithms which have been proposed for such problems over the last five years. (topical review)

  16. Algorithm for statistical noise reduction in three-dimensional ion implant simulations

    International Nuclear Information System (INIS)

    Hernandez-Mangas, J.M.; Arias, J.; Jaraiz, M.; Bailon, L.; Barbolla, J.

    2001-01-01

    As integrated circuit devices scale into the deep sub-micron regime, ion implantation will continue to be the primary means of introducing dopant atoms into silicon. Different types of impurity profiles such as ultra-shallow profiles and retrograde profiles are necessary for deep submicron devices in order to realize the desired device performance. A new algorithm to reduce the statistical noise in three-dimensional ion implant simulations both in the lateral and shallow/deep regions of the profile is presented. The computational effort in BCA Monte Carlo ion implant simulation is also reduced

  17. Statistical methods applied to gamma-ray spectroscopy algorithms in nuclear security missions.

    Science.gov (United States)

    Fagan, Deborah K; Robinson, Sean M; Runkle, Robert C

    2012-10-01

    Gamma-ray spectroscopy is a critical research and development priority for a range of nuclear security missions, specifically the interdiction of special nuclear material involving the detection and identification of gamma-ray sources. We categorize existing methods by the statistical methods on which they rely and identify methods that have yet to be considered. Current methods estimate the effect of counting uncertainty but in many cases do not address larger sources of decision uncertainty, which may be significantly more complex. Thus, significantly improving algorithm performance may require greater coupling between the problem physics that drives data acquisition and the statistical methods that analyze such data. Untapped statistical methods, such as Bayesian model averaging and hierarchical and empirical Bayes methods, could reduce decision uncertainty by rigorously and comprehensively incorporating all sources of uncertainty. Application of such methods should further meet the needs of nuclear security missions by improving upon the existing numerical infrastructure for which these analyses have not been conducted. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. Hybrid algorithm of ensemble transform and importance sampling for assimilation of non-Gaussian observations

    Directory of Open Access Journals (Sweden)

    Shin'ya Nakano

    2014-05-01

    Full Text Available A hybrid algorithm that combines the ensemble transform Kalman filter (ETKF) and the importance sampling approach is proposed. Since the ETKF assumes a linear Gaussian observation model, the estimate obtained by the ETKF can be biased in cases with nonlinear or non-Gaussian observations. The particle filter (PF) is based on the importance sampling technique, and is applicable to problems with nonlinear or non-Gaussian observations. However, the PF usually requires an unrealistically large sample size in order to achieve a good estimation, and thus it is computationally prohibitive. In the proposed hybrid algorithm, we obtain a proposal distribution similar to the posterior distribution by using the ETKF. A large number of samples are then drawn from the proposal distribution, and these samples are weighted to approximate the posterior distribution according to the importance sampling principle. Since importance sampling provides an estimate of the probability density function (PDF) without assuming linearity or Gaussianity, we can resolve the bias due to the nonlinear or non-Gaussian observations. Finally, in the next forecast step, we reduce the sample size to achieve computational efficiency based on the Gaussian assumption, while we use a relatively large number of samples in the importance sampling in order to consider the non-Gaussian features of the posterior PDF. The use of the ETKF is also beneficial in terms of the computational simplicity of generating a number of random samples from the proposal distribution and of weighting each of the samples. The proposed algorithm is not necessarily effective in the case that the ensemble is located far from the true state. However, monitoring the effective sample size and tuning the factor for covariance inflation could resolve this problem. In this paper, the proposed hybrid algorithm is introduced and its performance is evaluated through experiments with non-Gaussian observations.
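
    A toy illustration of the reweighting step: samples are drawn from a Gaussian proposal (standing in for the ETKF analysis distribution) and importance-weighted against a posterior with a non-Gaussian (Laplace) observation likelihood. The scalar state, the made-up proposal moments and the likelihood are all assumptions for illustration; the resampling/forecast step of the hybrid algorithm is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# scalar state with Gaussian prior and a heavy-tailed (non-Gaussian) observation error
prior_mean, prior_var = 1.0, 1.0
y_obs = 2.0
log_likelihood = lambda x: -np.abs(y_obs - x)        # Laplace log-likelihood (up to a constant)

# step 1: Gaussian proposal -- in the hybrid scheme this would come from the ETKF analysis
prop_mean, prop_var = 1.5, 0.5

# step 2: draw many samples from the proposal and importance-weight them
n = 100_000
x = rng.normal(prop_mean, np.sqrt(prop_var), size=n)
log_w = (-0.5 * (x - prior_mean) ** 2 / prior_var    # log prior
         + log_likelihood(x)                         # non-Gaussian log likelihood
         + 0.5 * (x - prop_mean) ** 2 / prop_var)    # minus log proposal
w = np.exp(log_w - log_w.max())
w /= w.sum()

ess = 1.0 / np.sum(w ** 2)                           # effective sample size (worth monitoring)
print(f"posterior mean ~ {np.sum(w * x):.3f}, ESS = {ess:.0f} of {n}")
```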

  19. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.

    2017-08-21

    Molecular dynamics (MD) simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules but are limited by the timescale barrier, i.e., we may be unable to efficiently obtain properties because we need to run microseconds or longer simulations using femtoseconds time steps. While there are several existing methods to overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain in regard to being able to sample unknown systems, deal with high-dimensional space of collective variables, and focus the computational effort on slow timescales. Hence, a new sampling method, called the “Concurrent Adaptive Sampling (CAS) algorithm,” has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and uses macrostates (a partition of the collective variable space) to enhance the sampling. The exploration is done by running a large number of short simulations, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and bio-molecules, such as penta-alanine and triazine polymer.

  20. Special nuclear material inventory sampling plans

    International Nuclear Information System (INIS)

    Vaccaro, H.S.; Goldman, A.S.

    1987-01-01

    This paper presents improved procedures for obtaining statistically valid sampling plans for nuclear facilities. The double sampling concept and methods for developing optimal double sampling plans are described. An algorithm is described that is satisfactory for finding optimal double sampling plans and choosing appropriate detection and false alarm probabilities

  1. A Statistical Algorithm for Estimating Chlorophyll Concentration in the New Caledonian Lagoon

    Directory of Open Access Journals (Sweden)

    Guillaume Wattelez

    2016-01-01

    Full Text Available Spatial and temporal dynamics of phytoplankton biomass and water turbidity can provide crucial information about the function, health and vulnerability of lagoon ecosystems (coral reefs, sea grasses, etc.). A statistical algorithm is proposed to estimate chlorophyll-a concentration ([chl-a]) in optically complex waters of the New Caledonian lagoon from MODIS-derived “remote-sensing” reflectance (Rrs). The algorithm is developed via supervised learning on match-ups gathered from 2002 to 2010. The best performance is obtained by combining two models, selected according to the ratio of Rrs in spectral bands centered on 488 and 555 nm: a log-linear model for low [chl-a] (AFLC) and a support vector machine (SVM) model or a classic model (OC3) for high [chl-a]. The log-linear model is developed based on SVM regression analysis. This approach outperforms the classical OC3 approach, especially in shallow waters, with a root mean squared error 30% lower. The proposed algorithm enables more accurate assessments of [chl-a] and its variability in this typical oligo- to meso-trophic tropical lagoon, from shallow coastal waters and nearby reefs to deeper waters and in the open ocean.
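
    The band-ratio switching logic described above can be sketched as follows. The threshold and all coefficients below are placeholders (the published AFLC and OC3 coefficients are not reproduced here); only the structure, a log-linear branch for low [chl-a] and an OC3-style polynomial otherwise, reflects the abstract.

```python
import numpy as np

RATIO_SWITCH = 1.5                                   # hypothetical Rrs(488)/Rrs(555) threshold
A_LOW = (-0.55, -1.80)                               # illustrative log-linear coefficients
A_OC3 = (0.283, -2.753, 1.457, 0.659, -1.403)        # illustrative OC3-style polynomial

def chl_estimate(rrs488, rrs555):
    """Estimate [chl-a] (mg m^-3) from two reflectance bands by switching models."""
    r = rrs488 / rrs555
    x = np.log10(r)
    if r >= RATIO_SWITCH:                            # clearer water: low-[chl-a] branch
        log_chl = A_LOW[0] + A_LOW[1] * x
    else:                                            # greener water: OC3-style branch
        log_chl = sum(a * x**k for k, a in enumerate(A_OC3))
    return 10.0 ** log_chl

print(chl_estimate(0.008, 0.004))                    # ratio 2.0 -> low-[chl-a] model
print(chl_estimate(0.004, 0.004))                    # ratio 1.0 -> OC3-style model
```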

  2. Multivariate statistics high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic

  3. An Overview of a Class of Clock Synchronization Algorithms for Wireless Sensor Networks: A Statistical Signal Processing Perspective

    Directory of Open Access Journals (Sweden)

    Xu Wang

    2015-08-01

    Full Text Available Recently, wireless sensor networks (WSNs) have drawn great interest due to their outstanding monitoring and management potential in medical, environmental and industrial applications. Most of the applications that employ WSNs demand all of the sensor nodes to run on a common time scale, a requirement that highlights the importance of clock synchronization. The clock synchronization problem in WSNs is inherently related to parameter estimation. The accuracy of clock synchronization algorithms depends essentially on the statistical properties of the parameter estimation algorithms. Recently, studies dedicated to the estimation of synchronization parameters, such as clock offset and skew, have begun to emerge in the literature. The aim of this article is to provide an overview of the state-of-the-art clock synchronization algorithms for WSNs from a statistical signal processing point of view. This article focuses on describing the key features of the class of clock synchronization algorithms that exploit the traditional two-way message (signal) exchange mechanism. Upon introducing the two-way message exchange mechanism, the main clock offset estimation algorithms for pairwise synchronization of sensor nodes are first reviewed, and their performance is compared. The class of fully-distributed clock offset estimation algorithms for network-wide synchronization is then surveyed. The paper concludes with a list of open research problems pertaining to clock synchronization of WSNs.
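
    The two-way message exchange at the heart of these algorithms yields the classical offset and delay estimators below (the same form used by NTP). This is only the single-exchange estimator; the surveyed algorithms differ in how many exchanges are combined and in the statistical model assumed for the random delays.

```python
def clock_offset_two_way(t1, t2, t3, t4):
    """Two-way message exchange between nodes A and B:
      t1: request sent (A's clock)     t2: request received (B's clock)
      t3: reply sent (B's clock)       t4: reply received (A's clock)
    Returns (estimated offset of B relative to A, round-trip delay),
    assuming symmetric link delays."""
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# B's clock is 5 ms ahead of A's; one-way delay is 2 ms in each direction
print(clock_offset_two_way(100.0, 107.0, 108.0, 105.0))   # -> (5.0, 4.0)
```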

  4. Evaluation of observables in statistical multifragmentation theories

    International Nuclear Information System (INIS)

    Cole, A.J.

    1989-01-01

    The canonical formulation of equilibrium statistical multifragmentation is examined. It is shown that the explicit construction of observables (average values) by sampling the partition probabilities is unnecessary insofar as closed expressions in the form of recursion relations can be obtained quite easily. Such expressions may conversely be used to verify the sampling algorithms

  5. The Role of the Sampling Distribution in Understanding Statistical Inference

    Science.gov (United States)

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  6. Weighted statistical parameters for irregularly sampled time series

    Science.gov (United States)

    Rimoldini, Lorenzo

    2014-01-01

    Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
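
    A minimal, purely gap-based version of such a weighting scheme is sketched below: each point is weighted by half the span of its adjacent time gaps (the weight it would receive under linear interpolation), so clumped measurements are down-weighted. The adaptive dependence on noise level described in the paper is not included.

```python
import numpy as np

def gap_weights(t):
    """Trapezoidal weights for sampling times t (assumed sorted): each point gets
    half the span of its neighbouring gaps, normalised to sum to one."""
    dt = np.diff(np.asarray(t, dtype=float))
    w = np.empty(len(t))
    w[0], w[-1] = dt[0] / 2.0, dt[-1] / 2.0
    w[1:-1] = (dt[:-1] + dt[1:]) / 2.0
    return w / w.sum()

def weighted_mean_std(t, y):
    w = gap_weights(t)
    mean = np.sum(w * y)
    var = np.sum(w * (y - mean) ** 2) / (1.0 - np.sum(w ** 2))   # reliability-weight variance
    return mean, np.sqrt(var)

# a clump of four points plus two isolated ones: the clump dominates the plain mean
t = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 10.0])
y = np.array([1.0, 1.1, 0.9, 1.0, 3.0, 5.0])
print("unweighted mean:", np.mean(y), " gap-weighted mean:", weighted_mean_std(t, y)[0])
```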

  7. Sampling algorithms for validation of supervised learning models for Ising-like systems

    Science.gov (United States)

    Portman, Nataliya; Tamblyn, Isaac

    2017-12-01

    In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called “ID-MH” that uses the Metropolis-Hastings algorithm to create a Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validation of a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new “block-ID” sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).
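
    For orientation, the sketch below is the plain single-spin-flip Metropolis sampler for the 2D Ising model, i.e. the basic ingredient behind the configuration sampling discussed above; the ID-MH and block-ID schemes of the paper, which constrain the walk across energy levels and blocks, are not reproduced.

```python
import numpy as np

def metropolis_ising(L=10, beta=0.4, n_sweeps=2000, seed=0):
    """Single-spin-flip Metropolis sampling of the 2D Ising model with periodic
    boundaries; returns one sampled spin configuration."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(n_sweeps * L * L):
        i, j = rng.integers(0, L, size=2)
        nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
        dE = 2.0 * s[i, j] * nb                      # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] = -s[i, j]
    return s

config = metropolis_ising()
print("magnetization per spin:", config.mean())
```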

  8. A novel directional asymmetric sampling search algorithm for fast block-matching motion estimation

    Science.gov (United States)

    Li, Yue-e.; Wang, Qiang

    2011-11-01

    This paper proposes a novel directional asymmetric sampling search (DASS) algorithm for video compression. Making full use of the error information (block distortions) of the search patterns, eight different direction search patterns are designed for various situations. The strategy of local sampling search is employed for the search of big-motion vector. In order to further speed up the search, early termination strategy is adopted in procedure of DASS. Compared to conventional fast algorithms, the proposed method has the most satisfactory PSNR values for all test sequences.

  9. An Energy Efficient Adaptive Sampling Algorithm in a Sensor Network for Automated Water Quality Monitoring.

    Science.gov (United States)

    Shu, Tongxin; Xia, Min; Chen, Jiahong; Silva, Clarence de

    2017-11-05

    Power management is crucial in the monitoring of a remote environment, especially when long-term monitoring is needed. Renewable energy sources such as solar and wind may be harvested to sustain a monitoring system. However, without proper power management, equipment within the monitoring system may become nonfunctional and, as a consequence, the data or events captured during the monitoring process will become inaccurate as well. This paper develops and applies a novel adaptive sampling algorithm for power management in the automated monitoring of the quality of water in an extensive and remote aquatic environment. Based on the data collected online using sensor nodes, a data-driven adaptive sampling algorithm (DDASA) is developed for improving the power efficiency while ensuring the accuracy of sampled data. The developed algorithm is evaluated using two distinct key parameters, which are dissolved oxygen (DO) and turbidity. It is found that by dynamically changing the sampling frequency, the battery lifetime can be effectively prolonged while maintaining a required level of sampling accuracy. According to the simulation results, compared to a fixed sampling rate, approximately 30.66% of the battery energy can be saved for three months of continuous water quality monitoring. Using the same dataset to compare with a traditional adaptive sampling algorithm (ASA), while achieving around the same Normalized Mean Error (NME), DDASA is superior, saving 5.31% more battery energy.
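
    The data-driven idea, sample faster when the signal is changing and slower when it is quiet, can be sketched in a few lines. The rule below (and the constants in it) is a simplified stand-in for the published DDASA, not its actual control law.

```python
import numpy as np

def next_interval(history, dt_min=60.0, dt_max=3600.0, k=0.05):
    """Return the next sampling interval in seconds: shrink it when the monitored
    quantity (e.g. dissolved oxygen) changed quickly between the last two samples,
    stretch it when the signal is quiet, and clamp to hardware-friendly bounds.
    `k` is an arbitrary sensitivity constant for this illustration."""
    if len(history) < 2:
        return dt_min
    rate = abs(history[-1] - history[-2])
    return float(np.clip(dt_max / (1.0 + rate / k), dt_min, dt_max))

readings = [8.10, 8.11, 8.12, 7.60]      # dissolved oxygen, with a sudden drop at the end
print(next_interval(readings[:3]))       # quiet signal  -> long interval
print(next_interval(readings))           # rapid change  -> short interval
```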

  10. An Energy Efficient Adaptive Sampling Algorithm in a Sensor Network for Automated Water Quality Monitoring

    Directory of Open Access Journals (Sweden)

    Tongxin Shu

    2017-11-01

    Full Text Available Power management is crucial in the monitoring of a remote environment, especially when long-term monitoring is needed. Renewable energy sources such as solar and wind may be harvested to sustain a monitoring system. However, without proper power management, equipment within the monitoring system may become nonfunctional and, as a consequence, the data or events captured during the monitoring process will become inaccurate as well. This paper develops and applies a novel adaptive sampling algorithm for power management in the automated monitoring of the quality of water in an extensive and remote aquatic environment. Based on the data collected online using sensor nodes, a data-driven adaptive sampling algorithm (DDASA) is developed for improving the power efficiency while ensuring the accuracy of sampled data. The developed algorithm is evaluated using two distinct key parameters, which are dissolved oxygen (DO) and turbidity. It is found that by dynamically changing the sampling frequency, the battery lifetime can be effectively prolonged while maintaining a required level of sampling accuracy. According to the simulation results, compared to a fixed sampling rate, approximately 30.66% of the battery energy can be saved for three months of continuous water quality monitoring. Using the same dataset to compare with a traditional adaptive sampling algorithm (ASA), while achieving around the same Normalized Mean Error (NME), DDASA is superior, saving 5.31% more battery energy.

  11. Improved Noise Minimum Statistics Estimation Algorithm for Using in a Speech-Passing Noise-Rejecting Headset

    Directory of Open Access Journals (Sweden)

    Seyedtabaee Saeed

    2010-01-01

    Full Text Available This paper deals with the configuration of an algorithm to be used in a speech-passing angle grinder noise-canceling headset. Angle grinder noise is annoying and interrupts ordinary oral communication. This means that a low-SNR noisy condition is to be expected. Since variation in the angle grinder working condition changes the noise statistics, the noise will be nonstationary with possible jumps in its power. Studies are conducted to pick an appropriate algorithm. A modified version of the well-known spectral subtraction shows superior performance against alternative methods. The noise estimate is calculated through a multi-band, fast-adapting scheme. The algorithm adapts very quickly to the non-stationary noise environment while inflicting minimal musical noise and speech distortion on the processed signal. Objective and subjective measures illustrating the performance of the proposed method are introduced.

  12. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    Science.gov (United States)

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields such as perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.

  13. Image-Based Phenotypic Screening with Human Primary T Cells Using One-Dimensional Imaging Cytometry with Self-Tuning Statistical-Gating Algorithms.

    Science.gov (United States)

    Wang, Steve S; Ehrlich, Daniel J

    2017-09-01

    The parallel microfluidic cytometer (PMC) is an imaging flow cytometer that operates on statistical analysis of low-pixel-count, one-dimensional (1D) line scans. It is highly efficient in data collection and operates on suspension cells. In this article, we present a supervised automated pipeline for the PMC that minimizes operator intervention by incorporating multivariate logistic regression for data scoring. We test the self-tuning statistical algorithms in a human primary T-cell activation assay in flow using nuclear factor of activated T cells (NFAT) translocation as a readout and readily achieve an average Z' of 0.55 and strictly standardized mean difference of 13 with standard phorbol myristate acetate/ionomycin induction. To implement the tests, we routinely load 4 µL samples and can readout 3000 to 9000 independent conditions from 15 mL of primary human blood (buffy coat fraction). We conclude that the new technology will support primary-cell protein-localization assays and "on-the-fly" data scoring at a sample throughput of more than 100,000 wells per day and that it is, in principle, consistent with a primary pharmaceutical screen.

  14. Sample Data Synchronization and Harmonic Analysis Algorithm Based on Radial Basis Function Interpolation

    Directory of Open Access Journals (Sweden)

    Huaiqing Zhang

    2014-01-01

    Full Text Available The spectral leakage has a harmful effect on the accuracy of harmonic analysis for asynchronous sampling. This paper proposes a time quasi-synchronous sampling algorithm which is based on radial basis function (RBF) interpolation. Firstly, the fundamental period is evaluated by a zero-crossing technique with fourth-order Newton’s interpolation, and then the sampling sequence is reproduced by RBF interpolation. Finally, the harmonic parameters can be calculated by FFT on the synchronized sampling data. Simulation results showed that the proposed algorithm has high accuracy in measuring distorted and noisy signals. Compared to local approximation schemes such as linear, quadric, and fourth-order Newton interpolations, the RBF is a global approximation method which can acquire more accurate results while its time consumption is about the same as Newton’s.
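
    A compact sketch of the resampling idea: interpolate the asynchronously sampled record with Gaussian RBFs, re-evaluate it on a grid synchronised to the fundamental period, and read the harmonic amplitudes off the FFT. The shape parameter, the tiny ridge term and the test signal are illustrative choices; the zero-crossing period estimation of the paper is skipped (the period is assumed known here).

```python
import numpy as np

def rbf_interpolate(t, y, t_new, eps=800.0, ridge=1e-10):
    """Gaussian RBF interpolation of irregular samples (t, y) onto times t_new."""
    phi = lambda r: np.exp(-(eps * r) ** 2)
    A = phi(np.abs(t[:, None] - t[None, :])) + ridge * np.eye(len(t))
    w = np.linalg.solve(A, y)                               # RBF weights
    return phi(np.abs(t_new[:, None] - t[None, :])) @ w

# 50 Hz signal with a 3rd harmonic, sampled at jittered (asynchronous) instants
rng = np.random.default_rng(1)
t = np.arange(64) * 0.625e-3 + rng.uniform(0.0, 0.3e-3, 64)
y = np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 150 * t)

# resample on a grid synchronised to the 20 ms period (2 periods, 32 samples each), then FFT
t_sync = np.linspace(0.0, 0.04, 64, endpoint=False)
spectrum = np.abs(np.fft.rfft(rbf_interpolate(t, y, t_sync))) / 32
print("fundamental:", spectrum[2].round(3), " 3rd harmonic:", spectrum[6].round(3))
```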

  15. Potential-Decomposition Strategy in Markov Chain Monte Carlo Sampling Algorithms

    International Nuclear Information System (INIS)

    Shangguan Danhua; Bao Jingdong

    2010-01-01

    We introduce the potential-decomposition strategy (PDS), which can be used in Markov chain Monte Carlo sampling algorithms. PDS can be designed to make particles move in a modified potential that favors diffusion in phase space; then, by rejecting some trial samples, the target distributions can be sampled in an unbiased manner. Furthermore, if the accepted trial samples are insufficient, they can be recycled as initial states to form more unbiased samples. This strategy can greatly improve efficiency when the original potential has multiple metastable states separated by large barriers. We apply PDS to the 2D Ising model and a double-well potential model with a large barrier, demonstrating in these two representative examples that convergence is accelerated by orders of magnitude.

  16. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification

    Directory of Open Access Journals (Sweden)

    D. Ramyachitra

    2015-09-01

    Full Text Available Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty is that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using the existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  17. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification.

    Science.gov (United States)

    Ramyachitra, D; Sofia, M; Manikandan, P

    2015-09-01

    Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty is that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using the existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  18. Comparison of pure and 'Latinized' centroidal Voronoi tessellation against various other statistical sampling methods

    International Nuclear Information System (INIS)

    Romero, Vicente J.; Burkardt, John V.; Gunzburger, Max D.; Peterson, Janet S.

    2006-01-01

    A recently developed centroidal Voronoi tessellation (CVT) sampling method is investigated here to assess its suitability for use in statistical sampling applications. CVT efficiently generates a highly uniform distribution of sample points over arbitrarily shaped M-dimensional parameter spaces. On several 2-D test problems CVT has recently been found to provide exceedingly effective and efficient point distributions for response surface generation. Additionally, for statistical function integration and estimation of response statistics associated with uniformly distributed random-variable inputs (uncorrelated), CVT has been found in initial investigations to provide superior point sets when compared against Latin hypercube and simple-random Monte Carlo methods and Halton and Hammersley quasi-random sequence methods. In this paper, the performance of all these sampling methods and a new variant ('Latinized' CVT) are further compared for non-uniform input distributions. Specifically, given uncorrelated normal inputs in a 2-D test problem, statistical sampling efficiencies are compared for resolving various statistics of response: mean, variance, and exceedance probabilities
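
    For reference, CVT point sets of the kind compared above can be generated with a simple Monte Carlo Lloyd iteration: scatter generators, repeatedly assign a dense random cloud to its nearest generator, and move each generator to the centroid of its cell. The sketch below is this basic construction on the unit square, not the specific implementation evaluated in the paper.

```python
import numpy as np

def cvt_sample(n_points, dim=2, n_iter=50, n_mc=20000, seed=0):
    """Approximate centroidal Voronoi tessellation sample set on [0, 1]^dim
    via Monte Carlo Lloyd iterations."""
    rng = np.random.default_rng(seed)
    gen = rng.random((n_points, dim))                        # initial generators
    for _ in range(n_iter):
        cloud = rng.random((n_mc, dim))                      # dense random cloud
        owner = ((cloud[:, None, :] - gen[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for k in range(n_points):
            members = cloud[owner == k]
            if len(members):
                gen[k] = members.mean(axis=0)                # move to cell centroid
    return gen

print(cvt_sample(16).round(3))
```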

  19. Statistically Optimized Inversion Algorithm for Enhanced Retrieval of Aerosol Properties from Spectral Multi-Angle Polarimetric Satellite Observations

    Science.gov (United States)

    Dubovik, O; Herman, M.; Holdak, A.; Lapyonok, T.; Taure, D.; Deuze, J. L.; Ducos, F.; Sinyuk, A.

    2011-01-01

    The proposed development is an attempt to enhance aerosol retrieval by emphasizing statistical optimization in inversion of advanced satellite observations. This optimization concept improves retrieval accuracy relying on the knowledge of measurement error distribution. Efficient application of such optimization requires pronounced data redundancy (excess of the measurements number over number of unknowns) that is not common in satellite observations. The POLDER imager on board the PARASOL microsatellite registers spectral polarimetric characteristics of the reflected atmospheric radiation at up to 16 viewing directions over each observed pixel. The completeness of such observations is notably higher than for most currently operating passive satellite aerosol sensors. This provides an opportunity for profound utilization of statistical optimization principles in satellite data inversion. The proposed retrieval scheme is designed as statistically optimized multi-variable fitting of all available angular observations obtained by the POLDER sensor in the window spectral channels where absorption by gas is minimal. The total number of such observations by PARASOL always exceeds a hundred over each pixel and the statistical optimization concept promises to be efficient even if the algorithm retrieves several tens of aerosol parameters. Based on this idea, the proposed algorithm uses a large number of unknowns and is aimed at retrieval of extended set of parameters affecting measured radiation.

  20. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    Science.gov (United States)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

    Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an "optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.

  1. Classification and authentication of unknown water samples using machine learning algorithms.

    Science.gov (United States)

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes the development of real-life water sample classification and authentication based on machine learning algorithms. The proposed techniques use experimental measurements from a pulse voltammetry method based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongues include arrays of solid state ion sensors, transducers even of different types, data collectors and data analysis tools, all oriented to the classification of liquid samples and the authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for six different ISI (Bureau of Indian Standards) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell), was the data source for developing two types of machine learning algorithms, namely classification and regression. A water data set consisting of six sample classes, each containing 4402 features, was considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A partial least squares (PLS) based classifier, dedicated to authenticating a specific category of water sample, also evolved as an integral part of the E-tongue instrumentation system. The developed PCA- and PLS-based E-tongue system delivered an encouraging overall authentication accuracy, with excellent performance for the aforesaid categories of water samples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.
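
    A bare-bones version of the PCA-based classification component is sketched below on synthetic data: project the high-dimensional sensor response onto a few principal components and assign each sample to the nearest class centroid. The feature dimensions, class structure and nearest-centroid rule are illustrative assumptions, not the article's exact pipeline (which also includes a PLS-based authenticator).

```python
import numpy as np

def pca_fit(X, n_components=2):
    """PCA via SVD on mean-centred data; returns (mean, principal axes)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]

def nearest_centroid(scores_train, labels, scores_test):
    classes = np.unique(labels)
    centroids = np.array([scores_train[labels == c].mean(axis=0) for c in classes])
    d = ((scores_test[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]

# synthetic stand-in for voltammetric E-tongue features: 3 brands, 10 samples each, 50 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(10, 50)) for m in (0.0, 2.0, 4.0)])
y = np.repeat(["brandA", "brandB", "brandC"], 10)

mu, axes = pca_fit(X)
scores = (X - mu) @ axes.T
print("training accuracy:", (nearest_centroid(scores, y, scores) == y).mean())
```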

  2. Medical Image Retrieval Based On the Parallelization of the Cluster Sampling Algorithm

    OpenAIRE

    Ali, Hesham Arafat; Attiya, Salah; El-henawy, Ibrahim

    2017-01-01

    In this paper we develop parallel cluster sampling algorithms and show that a multi-chain version is embarrassingly parallel and can be used efficiently for medical image retrieval among other applications.

  3. Learning maximum entropy models from finite-size data sets: A fast data-driven algorithm allows sampling from the posterior distribution.

    Science.gov (United States)

    Ferrari, Ulisse

    2016-08-01

    Maximum entropy models provide the least constrained probability distributions that reproduce statistical properties of experimental datasets. In this work we characterize the learning dynamics that maximizes the log-likelihood in the case of large but finite datasets. We first show how the steepest descent dynamics is not optimal as it is slowed down by the inhomogeneous curvature of the model parameters' space. We then provide a way for rectifying this space which relies only on dataset properties and does not require large computational efforts. We conclude by solving the long-time limit of the parameters' dynamics including the randomness generated by the systematic use of Gibbs sampling. In this stochastic framework, rather than converging to a fixed point, the dynamics reaches a stationary distribution, which for the rectified dynamics reproduces the posterior distribution of the parameters. We sum up all these insights in a "rectified" data-driven algorithm that is fast and by sampling from the parameters' posterior avoids both under- and overfitting along all the directions of the parameters' space. Through the learning of pairwise Ising models from the recording of a large population of retina neurons, we show how our algorithm outperforms the steepest descent method.

  4. Effect of model choice and sample size on statistical tolerance limits

    International Nuclear Information System (INIS)

    Duran, B.S.; Campbell, K.

    1980-03-01

    Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained from small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations

  5. Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications.

    Directory of Open Access Journals (Sweden)

    Elias Chaibub Neto

    Full Text Available In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
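
    The weighting idea is easy to see for the sample mean: draw multinomial counts once, as a matrix, and every bootstrap replication becomes one row of a single matrix-vector product. This sketch covers only the mean; extending it to other sample moments (e.g. Pearson's correlation, as in the paper) follows the same pattern with weighted moments.

```python
import numpy as np

def vectorized_bootstrap_mean(x, n_boot=10_000, seed=0):
    """Non-parametric bootstrap of the sample mean in its multinomial-weighting
    formulation: no resampled datasets, just a (n_boot, n) matrix of counts."""
    rng = np.random.default_rng(seed)
    n = len(x)
    counts = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    return (counts @ x) / n                       # all bootstrap means at once

x = np.random.default_rng(1).exponential(size=50)
boot_means = vectorized_bootstrap_mean(x)
print("bootstrap standard error of the mean:", boot_means.std(ddof=1).round(4))
```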

  6. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    Science.gov (United States)

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structure in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows the user to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  7. Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images

    Science.gov (United States)

    Fischer, Bernd

    2004-01-01

    Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time with the development and refinement of the data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. The AutoBayes schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems

  8. MUSIC ALGORITHM FOR LOCATING POINT-LIKE SCATTERERS CONTAINED IN A SAMPLE ON FLAT SUBSTRATE

    Institute of Scientific and Technical Information of China (English)

    Dong Heping; Ma Fuming; Zhang Deyue

    2012-01-01

    In this paper, we consider a MUSIC algorithm for locating point-like scatterers contained in a sample on a flat substrate. Based on an asymptotic expansion of the scattering amplitude proposed by Ammari et al., the reconstruction problem can be reduced to a calculation of the Green function corresponding to the background medium. In addition, we use an explicit formulation of the Green function in the MUSIC algorithm to simplify the calculation when the cross-section of the sample is a half-disc. Numerical experiments are included to demonstrate the feasibility of this method.

  9. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    Science.gov (United States)

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  10. Improving Statistics Education through Simulations: The Case of the Sampling Distribution.

    Science.gov (United States)

    Earley, Mark A.

    This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…

  11. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    Science.gov (United States)

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  12. A Matlab user interface for the statistically assisted fluid registration algorithm and tensor-based morphometry

    Science.gov (United States)

    Yepes-Calderon, Fernando; Brun, Caroline; Sant, Nishita; Thompson, Paul; Lepore, Natasha

    2015-01-01

    Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis consist of a nonlinear registration to align each individual scan to a common space, and a subsequent statistical analysis to determine morphometric differences, or difference in fiber structure between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm or SAFIRA,1 which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows the inclusion of statistical priors extracted from the populations being studied as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, even more so considering that SAFIRA was initially implemented in command line mode. Here, we introduce a new, intuitive, easy to use, Matlab-based graphical user interface for SAFIRA's multivariate TBM. The interface also generates different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix, and comparison of the full deformation tensors.2 This software will be freely disseminated to the neuroimaging research community.

  13. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    Science.gov (United States)

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
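
    The effect discussed above is easy to reproduce with a toy gene-sampling test: score a gene set by the mean gene statistic, build the null by repeatedly sampling random gene sets of the same size, and compare the signed and absolute variants on a set with bidirectional changes. The statistic, set size and effect sizes below are arbitrary illustrative choices.

```python
import numpy as np

def gene_sampling_pvalue(stats, set_idx, n_perm=10_000, absolute=True, seed=0):
    """Gene-sampling gene-set test: p-value of the mean (absolute) gene statistic
    of the set against randomly sampled gene sets of equal size."""
    rng = np.random.default_rng(seed)
    s = np.abs(stats) if absolute else np.asarray(stats)
    observed = s[set_idx].mean()
    null = np.array([s[rng.choice(len(s), size=len(set_idx), replace=False)].mean()
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

rng = np.random.default_rng(1)
t_stats = rng.normal(size=1000)                    # per-gene statistics, mostly null
gene_set = np.arange(20)
t_stats[gene_set] += np.where(gene_set % 2 == 0, 2.0, -2.0)   # half up-, half down-regulated
print("absolute statistic p =", gene_sampling_pvalue(t_stats, gene_set, absolute=True))
print("signed statistic   p =", gene_sampling_pvalue(t_stats, gene_set, absolute=False))
```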

  14. Special nuclear material inventory sampling plans

    International Nuclear Information System (INIS)

    Vaccaro, H.; Goldman, A.

    1987-01-01

    Since their introduction in 1942, sampling inspection procedures have been common quality assurance practice. The U.S. Department of Energy (DOE) supports such sampling of special nuclear materials inventories. The DOE Order 5630.7 states, "Operations Offices may develop and use statistically valid sampling plans appropriate for their site-specific needs." The benefits for nuclear facilities operations include reduced worker exposure and reduced work load. Improved procedures have been developed for obtaining statistically valid sampling plans that maximize these benefits. The double sampling concept is described and the resulting sample sizes for double sample plans are compared with other plans. An algorithm is given for finding optimal double sampling plans that assist in choosing the appropriate detection and false alarm probabilities for various sampling plans
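
    The operating characteristic of a double sampling plan, from which detection and false alarm probabilities are read off, can be computed directly. The sketch below assumes binomial sampling and an arbitrary illustrative plan; it is not the optimization algorithm of the paper, only the evaluation step such an algorithm would repeat.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def accept_prob_double(p, n1, c1, r1, n2, c2):
    """Probability of accepting a lot with discrepancy rate p under a double
    sampling plan: accept if the first sample of n1 items has <= c1 discrepancies,
    reject if it has >= r1, otherwise inspect n2 more items and accept if the
    combined count is <= c2."""
    pa = sum(binom_pmf(d1, n1, p) for d1 in range(c1 + 1))
    for d1 in range(c1 + 1, r1):                        # undecided after the first sample
        pa += binom_pmf(d1, n1, p) * sum(binom_pmf(d2, n2, p) for d2 in range(c2 - d1 + 1))
    return pa

# illustrative plan: n1=20, accept on 0, reject on 2+, else 20 more and accept on <=1 total
for p in (0.01, 0.05, 0.10):
    print(p, round(accept_prob_double(p, n1=20, c1=0, r1=2, n2=20, c2=1), 3))
```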

  15. Statistical Physics, Optimization, Inference, and Message-Passing Algorithms : Lecture Notes of the Les Houches School of Physics : Special Issue, October 2013

    CERN Document Server

    Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F

    2015-01-01

    This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...

  16. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    Science.gov (United States)

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and a goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. Sample size formulae using a simple proportion of agreement instead of a kappa statistic, and nomograms to eliminate the inconvenience of using a mathematical formula, were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
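
    The relation that lets a target stated as a simple proportion of agreement be mapped back to a kappa value is the usual definition of chance-corrected agreement. The sketch below assumes two raters, a binary outcome and a common marginal prevalence; it reproduces the kappa paradox mentioned above rather than the sample size formula itself.

```python
def kappa_from_agreement(po, prevalence):
    """Cohen's kappa for two raters and a binary outcome, given the observed
    proportion of agreement po and a common marginal prevalence: chance agreement
    is pe = prev^2 + (1 - prev)^2 and kappa = (po - pe) / (1 - pe)."""
    pe = prevalence**2 + (1.0 - prevalence) ** 2
    return (po - pe) / (1.0 - pe)

# the kappa paradox: the same 90% agreement yields very different kappa values
for prev in (0.5, 0.9):
    print(prev, round(kappa_from_agreement(0.90, prev), 3))
```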

  17. A novel statistical algorithm for gene expression analysis helps differentiate pregnane X receptor-dependent and independent mechanisms of toxicity.

    Directory of Open Access Journals (Sweden)

    M Ann Mongan

    Full Text Available Genome-wide gene expression profiling has become standard for assessing potential liabilities as well as for elucidating mechanisms of toxicity of drug candidates under development. Analysis of microarray data is often challenging due to the lack of a statistical model that is amenable to biological variation in a small number of samples. Here we present a novel non-parametric algorithm that requires minimal assumptions about the data distribution. Our method for determining differential expression consists of two steps: (1) we apply a nominal threshold on fold change and platform p-value to designate whether a gene is differentially expressed in each treated and control sample relative to the averaged control pool, and (2) we compare the number of samples satisfying the criteria in step 1 between the treated and control groups to estimate the statistical significance based on a null distribution established by sample permutations. The method captures the group effect without being too sensitive to anomalies, as it allows tolerance for potential non-responders in the treatment group and outliers in the control group. Performance and results of this method were compared with the Significance Analysis of Microarrays (SAM) method. These two methods were applied to investigate hepatic transcriptional responses of wild-type (PXR+/+) and pregnane X receptor-knockout (PXR-/-) mice after 96 h of exposure to CMP013, an inhibitor of β-secretase (β-site of amyloid precursor protein cleaving enzyme 1, or BACE1). Our results showed that CMP013 led to transcriptional changes in hallmark PXR-regulated genes and induced a cascade of gene expression changes that explained the hepatomegaly observed only in PXR+/+ animals. Comparison of concordant expression changes between PXR+/+ and PXR-/- mice also suggested a PXR-independent association between CMP013 and perturbations to cellular stress, lipid metabolism, and biliary transport.
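
    A stripped-down version of the two-step idea for a single gene is sketched below: flag each sample whose fold change against the averaged control pool exceeds a nominal threshold, then compare the flag counts between groups against a label-permutation null. The platform p-value filter of step 1 is omitted, and all thresholds and data values are arbitrary illustrative choices.

```python
import numpy as np

def two_step_de_test(treated, control, fc_thresh=1.5, n_perm=5000, seed=0):
    """Per-gene two-step test: (1) flag samples whose |log2 fold change| versus the
    averaged control pool exceeds log2(fc_thresh); (2) score = (# flagged treated)
    - (# flagged controls), assessed by permuting the group labels."""
    rng = np.random.default_rng(seed)
    values = np.concatenate([treated, control])
    labels = np.array([1] * len(treated) + [0] * len(control))

    def score(lbls):
        pool = values[lbls == 0].mean()                     # averaged control pool
        flagged = np.abs(np.log2(values / pool)) >= np.log2(fc_thresh)
        return flagged[lbls == 1].sum() - flagged[lbls == 0].sum()

    observed = score(labels)
    null = np.array([score(rng.permutation(labels)) for _ in range(n_perm)])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)

treated = np.array([4.1, 3.8, 4.5, 1.1, 4.0])   # includes one possible non-responder
control = np.array([1.0, 1.2, 0.9, 1.1, 1.0])
print(two_step_de_test(treated, control))
```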

  18. Comparison of Statistical Algorithms for the Detection of Infectious Disease Outbreaks in Large Multiple Surveillance Systems

    Science.gov (United States)

    Farrington, C. Paddy; Noufaily, Angela; Andrews, Nick J.; Charlett, Andre

    2016-01-01

    A large-scale multiple surveillance system for infectious disease outbreaks has been in operation in England and Wales since the early 1990s. Changes to the statistical algorithm at the heart of the system were proposed and the purpose of this paper is to compare two new algorithms with the original algorithm. Test data to evaluate performance are created from weekly counts of the number of cases of each of more than 2000 diseases over a twenty-year period. The time series of each disease is separated into one series giving the baseline (background) disease incidence and a second series giving disease outbreaks. One series is shifted forward by twelve months and the two are then recombined, giving a realistic series in which it is known where outbreaks have been added. The metrics used to evaluate performance include a scoring rule that appropriately balances sensitivity against specificity and is sensitive to variation in probabilities near 1. In the context of disease surveillance, a scoring rule can be adapted to reflect the size of outbreaks and this was done. Results indicate that the two new algorithms are comparable to each other and better than the algorithm they were designed to replace. PMID:27513749

  19. Generation of a statistical shape model with probabilistic point correspondences and the expectation maximization- iterative closest point algorithm

    International Nuclear Information System (INIS)

    Hufnagel, Heike; Pennec, Xavier; Ayache, Nicholas; Ehrhardt, Jan; Handels, Heinz

    2008-01-01

    Identification of point correspondences between shapes is required for statistical analysis of organ shape differences. Since manual identification of landmarks is not a feasible option in 3D, several methods were developed to automatically find one-to-one correspondences on shape surfaces. For unstructured point sets, however, one-to-one correspondences do not exist but correspondence probabilities can be determined. A method was developed to compute a statistical shape model based on shapes which are represented by unstructured point sets with arbitrary point numbers. A fundamental problem when computing statistical shape models is the determination of correspondences between the points of the shape observations of the training data set. In the absence of landmarks, exact correspondences can only be determined between continuous surfaces, not between unstructured point sets. To overcome this problem, we introduce correspondence probabilities instead of exact correspondences. The correspondence probabilities are found by aligning the observation shapes with the affine expectation maximization-iterative closest points (EM-ICP) registration algorithm. In a second step, the correspondence probabilities are used as input to compute a mean shape (represented once again by an unstructured point set). Both steps are unified in a single optimization criterion which depends on the two parameters 'registration transformation' and 'mean shape'. In a last step, a variability model which best represents the variability in the training data set is computed. Experiments on synthetic data sets and in vivo brain structure data sets (MRI) are then designed to evaluate the performance of our algorithm. The new method was applied to brain MRI data sets, and the estimated point correspondences were compared to a statistical shape model built on exact correspondences. Based on established measures of 'generalization ability' and 'specificity', the estimates were very satisfactory.

  20. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    International Nuclear Information System (INIS)

    Enqvist, Andreas

    2008-03-01

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown material, to prevent accidental and clandestine transport and use of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguards area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived, and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First, the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then, an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible sample

  1. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    Energy Technology Data Exchange (ETDEWEB)

    Enqvist, Andreas

    2008-03-15

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown material, to prevent accidental and clandestine transport and use of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguards area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived, and compared to Monte Carlo simulations. An experimental approach is also addressed in which cross correlation measurements were made using liquid scintillation detectors. First, the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then, an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible

  2. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration

    KAUST Repository

    Elsheikh, A. H.; Wheeler, M. F.; Hoteit, Ibrahim

    2013-01-01

    Calibration of subsurface flow models is an essential step for managing ground water aquifers, design of contaminant remediation plans, and maximizing recovery from hydrocarbon reservoirs. We investigate an efficient sampling algorithm known

  3. Poisson-Box Sampling algorithms for three-dimensional Markov binary mixtures

    Science.gov (United States)

    Larmier, Coline; Zoia, Andrea; Malvagi, Fausto; Dumonteil, Eric; Mazzolo, Alain

    2018-02-01

    Particle transport in Markov mixtures can be addressed by the so-called Chord Length Sampling (CLS) methods, a family of Monte Carlo algorithms taking into account the effects of stochastic media on particle propagation by generating on-the-fly the material interfaces crossed by the random walkers during their trajectories. Such methods enable a significant reduction of computational resources as opposed to reference solutions obtained by solving the Boltzmann equation for a large number of realizations of random media. CLS solutions, which neglect correlations induced by the spatial disorder, are faster albeit approximate, and might thus show discrepancies with respect to reference solutions. In this work we propose a new family of algorithms (called 'Poisson Box Sampling', PBS) aimed at improving the accuracy of the CLS approach for transport in d-dimensional binary Markov mixtures. In order to probe the features of PBS methods, we will focus on three-dimensional Markov media and revisit the benchmark problem originally proposed by Adams, Larsen and Pomraning [1] and extended by Brantley [2]: for these configurations we will compare reference solutions, standard CLS solutions and the new PBS solutions for scalar particle flux, transmission and reflection coefficients. PBS will be shown to perform better than CLS at the expense of a reasonable increase in computational time.
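
    The Chord Length Sampling idea that PBS refines can be illustrated in one dimension: material segments are generated on the fly with exponentially distributed chord lengths as a particle streams through the mixture. The mean chord lengths, cross sections, and the purely absorbing slab below are illustrative assumptions, not the benchmark configurations cited above.

```python
"""Minimal 1D sketch of Chord Length Sampling through a binary Markov mixture:
material interfaces are sampled on the fly from exponential chord-length
distributions.  Cross sections, mean chords and slab thickness are illustrative."""
import numpy as np

rng = np.random.default_rng(0)
LAMBDA = (1.0, 2.0)      # mean chord lengths of materials 0 and 1 (assumed)
SIGMA_T = (1.5, 0.2)     # total cross sections (assumed, purely absorbing)
SLAB = 5.0               # slab thickness

def transmission(n_particles=20_000):
    hits = 0
    for _ in range(n_particles):
        # start in material 0 or 1 according to the volume fractions
        x, mat = 0.0, 0 if rng.random() < LAMBDA[0] / sum(LAMBDA) else 1
        while True:
            d_interface = rng.exponential(LAMBDA[mat])     # on-the-fly interface
            d_collision = rng.exponential(1.0 / SIGMA_T[mat])
            step = min(d_interface, d_collision)
            if x + step >= SLAB:            # leaked through the slab
                hits += 1
                break
            if d_collision < d_interface:   # absorbed (no scattering in this sketch)
                break
            x += step
            mat = 1 - mat                   # crossed into the other material
    return hits / n_particles

print("CLS transmission estimate:", transmission())
```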

  4. NETWORKS OF NANOPARTICLES IN ORGANIC – INORGANIC COMPOSITES: ALGORITHMIC EXTRACTION AND STATISTICAL ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ralf Thiedmann

    2012-03-01

    Full Text Available The rising global demand for energy and the limited resources of fossil fuels require new technologies in renewable energies like solar cells. Silicon solar cells offer a good efficiency but suffer from high production costs. A promising alternative is polymer solar cells, due to their potentially low production costs and the high flexibility of the panels. In this paper, the nanostructure of organic–inorganic composites is investigated, which can be used as photoactive layers in hybrid–polymer solar cells. These materials consist of a polymeric (OC1C10-PPV) phase with CdSe nanoparticles embedded therein. On the basis of 3D image data with high spatial resolution, gained by electron tomography, an algorithm is developed to automatically extract the CdSe nanoparticles from grayscale images, where we assume them to be spheres. The algorithm is based on a modified version of the Hough transform, where a watershed algorithm is used to separate the image data into basins such that each basin contains exactly one nanoparticle. After their extraction, neighboring nanoparticles are connected to form a 3D network that is related to the transport of electrons in polymer solar cells. A detailed statistical analysis of the CdSe network morphology is accomplished, which allows deeper insight into the hopping percolation pathways of electrons.

  5. Field Sampling from a Segmented Image

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2008-06-01

    Full Text Available This paper presents a statistical method for deriving the optimal prospective field sampling scheme on a remote sensing image to represent different categories in the field. The iterated conditional modes algorithm (ICM) is used for segmentation...

  6. Statistical Inference for Data Adaptive Target Parameters.

    Science.gov (United States)

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Suppose one observes n i.i.d. copies of a random variable whose probability distribution is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are increasingly addressed by clever, yet ad hoc, pattern-finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
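
    A toy version of the V-fold construction is sketched below. The data-adaptive "algorithm" used here (mean of the covariate most correlated with the outcome) is an illustrative stand-in; any mapping from a parameter-generating sample to a target parameter could take its place.

```python
"""Toy sketch of a sample-split data-adaptive target parameter: a data-driven
rule is learned on each parameter-generating sample and then evaluated on the
held-out estimation sample; the V estimates are averaged.  The rule used here
is an illustrative stand-in."""
import numpy as np

def data_adaptive_parameter(X, y, V=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), V)
    estimates = []
    for v in range(V):
        est_idx = folds[v]                                        # estimation sample
        gen_idx = np.concatenate([folds[j] for j in range(V) if j != v])
        # "algorithm" applied to the parameter-generating sample:
        corr = [abs(np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1])
                for j in range(X.shape[1])]
        j_star = int(np.argmax(corr))                             # data-adaptive target
        estimates.append(X[est_idx, j_star].mean())               # estimate on held-out data
    return np.mean(estimates), np.std(estimates, ddof=1) / np.sqrt(V)

X = np.random.default_rng(1).normal(size=(200, 10))
y = 2 * X[:, 3] + np.random.default_rng(2).normal(size=200)
print(data_adaptive_parameter(X, y))   # (estimate, rough standard error)
```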

  7. Vis-NIR spectrometric determination of Brix and sucrose in sugar production samples using kernel partial least squares with interval selection based on the successive projections algorithm.

    Science.gov (United States)

    de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino

    2018-05-01

    This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.

  8. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size

    Directory of Open Access Journals (Sweden)

    R. Eric Heidel

    2016-01-01

    Full Text Available Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
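
    As an illustration of the kind of a priori calculation described above, the sketch below uses the standard normal-approximation formula for comparing two group means; the effect size and error rates are illustrative choices, not values taken from the article.

```python
"""A priori sample size for a two-group comparison of means, using the
standard normal-approximation formula; effect size, alpha and power below
are illustrative, not taken from the article."""
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """effect_size is Cohen's d = (mu1 - mu2) / sigma."""
    z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# a "medium" effect (d = 0.5) at alpha = 0.05 and 80% power
print(round(n_per_group(0.5)))   # roughly 63 participants per group
```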

  9. Statistical analyses to support guidelines for marine avian sampling. Final report

    Science.gov (United States)

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical
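
    A minimal version of the one-sample, one-tailed Monte Carlo test mentioned in component 2 is sketched below. The negative-binomial reference distribution and its parameters are assumptions for illustration; the report's model selection step would supply the actual reference.

```python
"""One-sample, one-tailed Monte Carlo test for a candidate "hotspot": is the
observed mean count in a spatial unit larger than means simulated from a
reference distribution?  The negative-binomial reference and its parameters
are illustrative assumptions."""
import numpy as np

def hotspot_pvalue(observed_counts, ref_mean, ref_dispersion, n_sim=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(observed_counts)
    obs_mean = np.mean(observed_counts)
    # negative binomial parameterised by mean and dispersion (size) parameter
    p = ref_dispersion / (ref_dispersion + ref_mean)
    sim_means = rng.negative_binomial(ref_dispersion, p, size=(n_sim, n)).mean(axis=1)
    return (np.sum(sim_means >= obs_mean) + 1) / (n_sim + 1)

# e.g. 4 survey visits to one lease block, regional mean of 2 birds per visit
print(hotspot_pvalue([7, 5, 9, 6], ref_mean=2.0, ref_dispersion=1.5))
```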

  10. Physics-Based Image Segmentation Using First Order Statistical Properties and Genetic Algorithm for Inductive Thermography Imaging.

    Science.gov (United States)

    Gao, Bin; Li, Xiaoqing; Woo, Wai Lok; Tian, Gui Yun

    2018-05-01

    Thermographic inspection has been widely applied to non-destructive testing and evaluation with the capabilities of rapid, contactless, and large surface area detection. Image segmentation is considered essential for identifying and sizing defects. To attain a high-level performance, specific physics-based models that describe defects generation and enable the precise extraction of target region are of crucial importance. In this paper, an effective genetic first-order statistical image segmentation algorithm is proposed for quantitative crack detection. The proposed method automatically extracts valuable spatial-temporal patterns from unsupervised feature extraction algorithm and avoids a range of issues associated with human intervention in laborious manual selection of specific thermal video frames for processing. An internal genetic functionality is built into the proposed algorithm to automatically control the segmentation threshold to render enhanced accuracy in sizing the cracks. Eddy current pulsed thermography will be implemented as a platform to demonstrate surface crack detection. Experimental tests and comparisons have been conducted to verify the efficacy of the proposed method. In addition, a global quantitative assessment index F-score has been adopted to objectively evaluate the performance of different segmentation algorithms.

  11. Statistical characterization of a large geochemical database and effect of sample size

    Science.gov (United States)

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompasses 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reason relates to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively

  12. Statistical sampling strategies

    International Nuclear Information System (INIS)

    Andres, T.H.

    1987-01-01

    Systems assessment codes use mathematical models to simulate natural and engineered systems. Probabilistic systems assessment codes carry out multiple simulations to reveal the uncertainty in values of output variables due to uncertainty in the values of the model parameters. In this paper, methods are described for sampling sets of parameter values to be used in a probabilistic systems assessment code. Three Monte Carlo parameter selection methods are discussed: simple random sampling, Latin hypercube sampling, and sampling using two-level orthogonal arrays. Three post-selection transformations are also described: truncation, importance transformation, and discretization. Advantages and disadvantages of each method are summarized
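
    Of the three selection methods listed above, Latin hypercube sampling is the easiest to sketch: each parameter's range is divided into n equal-probability strata, one value is drawn per stratum, and the strata are paired randomly across parameters. The sketch below works on the unit hypercube; mapping through each parameter's inverse CDF (and any truncation, importance transformation or discretization) would follow as post-selection steps.

```python
"""Minimal Latin hypercube sampler on [0, 1)^k: one draw per equal-probability
stratum for every parameter, with strata shuffled independently per parameter."""
import numpy as np

def latin_hypercube(n_samples, n_params, seed=0):
    rng = np.random.default_rng(seed)
    # one uniform draw inside each of the n strata, for every parameter
    u = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
    # shuffle the strata independently for each parameter column
    for j in range(n_params):
        u[:, j] = u[rng.permutation(n_samples), j]
    return u      # map through inverse CDFs to obtain parameter values

print(latin_hypercube(5, 3))
```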

  13. Statistical learning in high energy and astrophysics

    International Nuclear Information System (INIS)

    Zimmermann, J.

    2005-01-01

    This thesis studies the performance of statistical learning methods in high energy and astrophysics where they have become a standard tool in physics analysis. They are used to perform complex classification or regression by intelligent pattern recognition. This kind of artificial intelligence is achieved by the principle of 'learning from examples': the examples describe the relationship between detector events and their classification. The application of statistical learning methods is motivated either by the lack of knowledge about this relationship or by tight time restrictions. In the first case learning from examples is the only possibility since no theory is available which would allow an algorithm to be built in the classical way. In the second case a classical algorithm exists but is too slow to cope with the time restrictions. It is therefore replaced by a pattern recognition machine which implements a fast statistical learning method. But even in applications where some kind of classical algorithm had done a good job, statistical learning methods have proven convincing through their remarkable performance. This thesis gives an introduction to statistical learning methods and how they are applied correctly in physics analysis. Their flexibility and high performance will be discussed by showing intriguing results from high energy and astrophysics. These include the development of highly efficient triggers, powerful purification of event samples and exact reconstruction of hidden event parameters. The presented studies also show typical problems in the application of statistical learning methods. They should only be the second choice in cases where an algorithm based on prior knowledge exists. Some examples in physics analyses are found where these methods are not used in the right way, leading either to wrong predictions or to poor performance. Physicists also often hesitate to profit from these methods because they fear that statistical learning methods cannot be controlled in a

  14. Statistical learning in high energy and astrophysics

    Energy Technology Data Exchange (ETDEWEB)

    Zimmermann, J.

    2005-06-16

    This thesis studies the performance of statistical learning methods in high energy and astrophysics where they have become a standard tool in physics analysis. They are used to perform complex classification or regression by intelligent pattern recognition. This kind of artificial intelligence is achieved by the principle of 'learning from examples': the examples describe the relationship between detector events and their classification. The application of statistical learning methods is motivated either by the lack of knowledge about this relationship or by tight time restrictions. In the first case learning from examples is the only possibility since no theory is available which would allow an algorithm to be built in the classical way. In the second case a classical algorithm exists but is too slow to cope with the time restrictions. It is therefore replaced by a pattern recognition machine which implements a fast statistical learning method. But even in applications where some kind of classical algorithm had done a good job, statistical learning methods have proven convincing through their remarkable performance. This thesis gives an introduction to statistical learning methods and how they are applied correctly in physics analysis. Their flexibility and high performance will be discussed by showing intriguing results from high energy and astrophysics. These include the development of highly efficient triggers, powerful purification of event samples and exact reconstruction of hidden event parameters. The presented studies also show typical problems in the application of statistical learning methods. They should only be the second choice in cases where an algorithm based on prior knowledge exists. Some examples in physics analyses are found where these methods are not used in the right way, leading either to wrong predictions or to poor performance. Physicists also often hesitate to profit from these methods because they fear that statistical learning methods cannot

  15. A neural algorithm for the non-uniform and adaptive sampling of biomedical data.

    Science.gov (United States)

    Mesin, Luca

    2016-04-01

    Body sensors are finding increasing applications in self-monitoring for health-care and in the remote surveillance of sensitive people. The physiological data to be sampled can be non-stationary, with bursts of high amplitude and frequency content providing most of the information. Such data could be sampled efficiently with a non-uniform schedule that increases the sampling rate only during activity bursts. A real-time, adaptive algorithm is proposed to select the sampling rate, in order to reduce the number of measured samples while still recording the main information. The algorithm is based on a neural network which predicts the subsequent samples and their uncertainties, requiring a measurement only when the risk of the prediction is larger than a selectable threshold. Four examples of application to biomedical data are discussed: electromyogram, electrocardiogram, electroencephalogram, and body acceleration. Sampling rates are reduced below the Nyquist limit, still preserving an accurate representation of the data and of their power spectral densities (PSD). For example, sampling at 60% of the Nyquist frequency, the percentage average rectified errors in estimating the signals are on the order of 10%, and the PSD is faithfully represented up to the highest frequencies. The method outperforms both uniform sampling and compressive sensing applied to the same data. The discussed method allows going below the Nyquist limit while still preserving the information content of non-stationary biomedical signals. It could find applications in body sensor networks to lower the number of wireless communications (saving sensor power) and to reduce memory occupation. Copyright © 2016 Elsevier Ltd. All rights reserved.
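
    The idea of measuring only when the prediction risk is too high can be illustrated without a neural network. In the offline sketch below, a simple linear extrapolation stands in for the predictor and the exact prediction error stands in for the risk; in the online setting described above, the network's predictive uncertainty would play that role instead.

```python
"""Offline sketch of adaptive, non-uniform sampling: the sampling interval is
widened while a simple predictor tracks the signal and shrunk when its error
(the 'risk') exceeds a threshold.  Linear extrapolation stands in for the
neural-network predictor of the paper."""
import numpy as np

def adaptive_schedule(signal, risk_threshold=0.1, max_skip=16):
    measured = [0, 1]                      # indices at which a measurement is taken
    skip, i = 1, 1
    while i + skip < len(signal):
        i += skip
        last, prev = signal[measured[-1]], signal[measured[-2]]
        slope = (last - prev) / (measured[-1] - measured[-2])
        predicted = last + slope * (i - measured[-1])           # linear extrapolation
        if abs(predicted - signal[i]) > risk_threshold:         # high risk: measure
            measured.append(i)
            skip = 1                                            # sample densely in the burst
        else:
            skip = min(2 * skip, max_skip)                      # quiet region: widen interval
    return np.array(measured)

# smooth 2 Hz signal with a high-frequency burst starting at t = 0.6
t = np.linspace(0.0, 1.0, 2000)
x = np.sin(2 * np.pi * 2 * t) + (t > 0.6) * 0.8 * np.sin(2 * np.pi * 50 * t)
idx = adaptive_schedule(x, risk_threshold=0.05)
print(f"measured {len(idx)} of {len(x)} samples")
```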

  16. Statistical surrogate model based sampling criterion for stochastic global optimization of problems with constraints

    Energy Technology Data Exchange (ETDEWEB)

    Cho, Su Gil; Jang, Jun Yong; Kim, Ji Hoon; Lee, Tae Hee [Hanyang University, Seoul (Korea, Republic of); Lee, Min Uk [Romax Technology Ltd., Seoul (Korea, Republic of); Choi, Jong Su; Hong, Sup [Korea Research Institute of Ships and Ocean Engineering, Daejeon (Korea, Republic of)

    2015-04-15

    Sequential surrogate model-based global optimization algorithms, such as super-EGO, have been developed to increase the efficiency of commonly used global optimization techniques as well as to ensure the accuracy of optimization. However, earlier approaches have drawbacks: the optimization loop involves three phases and relies on empirical parameters. We propose a unified sampling criterion to simplify the algorithm and to achieve the global optimum of problems with constraints without any empirical parameters. It is able to select points located in the feasible region with high model uncertainty as well as points along the constraint boundary at the lowest objective value. The mean squared error determines which criterion is dominant between the infill sampling criterion and the boundary sampling criterion. Also, the method guarantees the accuracy of the surrogate model because the sample points are not concentrated in extremely small regions, as they are in super-EGO. The performance of the proposed method, in terms of problem solvability, convergence properties, and efficiency, is validated through nonlinear numerical examples with disconnected feasible regions.

  17. Statistical sampling and modelling for cork oak and eucalyptus stands

    NARCIS (Netherlands)

    Paulo, M.J.

    2002-01-01

    This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication

  18. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms

    Science.gov (United States)

    Elhadad, N.; Claassen, J.; Perotte, R.; Goldstein, A.; Hripcsak, G.

    2018-01-01

    We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008), is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm that transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state. We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a

  19. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms.

    Science.gov (United States)

    Albers, D J; Elhadad, N; Claassen, J; Perotte, R; Goldstein, A; Hripcsak, G

    2018-02-01

    We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008), is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm that transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state. We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a
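
    The information-criterion selection step that underlies this kind of summary can be sketched with scipy: fit a few candidate parametric families to a laboratory variable and keep the one with the lowest criterion value. The candidate set and the use of AIC are assumptions for illustration, not the exact PopKLD procedure.

```python
"""Sketch of information-criterion model selection for a laboratory variable:
fit several candidate parametric families and keep the one with the lowest
AIC, whose fitted parameters then serve as a continuous summary.  Candidate
set and criterion are illustrative assumptions, not the PopKLD algorithm."""
import numpy as np
from scipy import stats

CANDIDATES = {"normal": stats.norm, "lognormal": stats.lognorm, "gamma": stats.gamma}

def summarize_lab_values(values):
    values = np.asarray(values, dtype=float)
    best = None
    for name, dist in CANDIDATES.items():
        params = dist.fit(values)
        loglik = np.sum(dist.logpdf(values, *params))
        aic = 2 * len(params) - 2 * loglik
        if best is None or aic < best[0]:
            best = (aic, name, params)
    return {"family": best[1], "params": best[2], "aic": best[0]}

# toy skewed "lab value" sample
rng = np.random.default_rng(0)
lab = rng.lognormal(mean=np.log(100), sigma=0.25, size=500)
print(summarize_lab_values(lab))
```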

  20. DSMC multicomponent aerosol dynamics: Sampling algorithms and aerosol processes

    Science.gov (United States)

    Palaniswaamy, Geethpriya

    The post-accident nuclear reactor primary and containment environments can be characterized by high temperatures and pressures, and fission products and nuclear aerosols. These aerosols evolve via natural transport processes as well as under the influence of engineered safety features. These aerosols can be hazardous and may pose risk to the public if released into the environment. Computations of their evolution, movement and distribution involve the study of various processes such as coagulation, deposition, condensation, etc., and are influenced by factors such as particle shape, charge, radioactivity and spatial inhomogeneity. These many factors make the numerical study of nuclear aerosol evolution computationally very complicated. The focus of this research is on the use of the Direct Simulation Monte Carlo (DSMC) technique to elucidate the role of various phenomena that influence the nuclear aerosol evolution. In this research, several aerosol processes such as coagulation, deposition, condensation, and source reinforcement are explored for a multi-component, aerosol dynamics problem in a spatially homogeneous medium. Among the various sampling algorithms explored the Metropolis sampling algorithm was found to be effective and fast. Several test problems and test cases are simulated using the DSMC technique. The DSMC results obtained are verified against the analytical and sectional results for appropriate test problems. Results show that the assumption of a single mean density is not appropriate due to the complicated effect of component densities on the aerosol processes. The methods developed and the insights gained will also be helpful in future research on the challenges associated with the description of fission product and aerosol releases.

  1. Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications

    International Nuclear Information System (INIS)

    Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L.H.

    2015-01-01

    In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The “muon generator” produces muons with zenith angles in the range 0–90° and energies in the range 1–100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance–Rejection and Metropolis–Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1–60 GeV and zenith angles 0–90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic–polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed “muon generator” is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.

  2. Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications

    Science.gov (United States)

    Chatzidakis, S.; Chrysikopoulou, S.; Tsoukalas, L. H.

    2015-12-01

    In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The "muon generator" produces muons with zenith angles in the range 0-90° and energies in the range 1-100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance-Rejection and Metropolis-Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1-60 GeV and zenith angles 0-90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic-polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed "muon generator" is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.

  3. Developing a cosmic ray muon sampling capability for muon tomography and monitoring applications

    Energy Technology Data Exchange (ETDEWEB)

    Chatzidakis, S., E-mail: schatzid@purdue.edu; Chrysikopoulou, S.; Tsoukalas, L.H.

    2015-12-21

    In this study, a cosmic ray muon sampling capability using a phenomenological model that captures the main characteristics of the experimentally measured spectrum coupled with a set of statistical algorithms is developed. The “muon generator” produces muons with zenith angles in the range 0–90° and energies in the range 1–100 GeV and is suitable for Monte Carlo simulations with emphasis on muon tomographic and monitoring applications. The muon energy distribution is described by the Smith and Duller (1959) [35] phenomenological model. Statistical algorithms are then employed for generating random samples. The inverse transform provides a means to generate samples from the muon angular distribution, whereas the Acceptance–Rejection and Metropolis–Hastings algorithms are employed to provide the energy component. The predictions for muon energies 1–60 GeV and zenith angles 0–90° are validated with a series of actual spectrum measurements and with estimates from the software library CRY. The results confirm the validity of the phenomenological model and the applicability of the statistical algorithms to generate polyenergetic–polydirectional muons. The response of the algorithms and the impact of critical parameters on computation time and computed results were investigated. Final output from the proposed “muon generator” is a look-up table that contains the sampled muon angles and energies and can be easily integrated into Monte Carlo particle simulation codes such as Geant4 and MCNP.
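
    The sampling machinery described in these records can be illustrated with a simplified spectrum: inverse-transform sampling (via a tabulated CDF) for the zenith angle and acceptance-rejection for the energy. The cos²θ angular law and the power-law-with-cutoff energy shape below are illustrative stand-ins for the Smith and Duller model, which is not reproduced here.

```python
"""Sketch of the two sampling steps of a muon generator: inverse transform for
the zenith angle (tabulated CDF) and acceptance-rejection for the energy.
Angular law and energy spectrum are simplified, assumed forms."""
import numpy as np

rng = np.random.default_rng(0)

# zenith angle: inverse transform on a tabulated CDF of f(theta) ~ cos^2(theta)
theta = np.linspace(0.0, np.pi / 2, 2001)
pdf = np.cos(theta) ** 2
cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]))])
cdf /= cdf[-1]

def sample_zenith(n):
    return np.interp(rng.random(n), cdf, theta)       # invert the tabulated CDF

# energy: acceptance-rejection against a uniform envelope on [1, 100] GeV
def spectrum(E):                                      # unnormalised, assumed shape
    return E ** -2.7 / (1.0 + E / 30.0)

E_MIN, E_MAX = 1.0, 100.0
F_MAX = spectrum(E_MIN)                               # decreasing, so the maximum is at E_MIN

def sample_energy(n, batch=50_000):
    out = np.empty(0)
    while out.size < n:
        E = rng.uniform(E_MIN, E_MAX, size=batch)
        accept = rng.random(batch) * F_MAX < spectrum(E)
        out = np.concatenate([out, E[accept]])
    return out[:n]

angles, energies = sample_zenith(10_000), sample_energy(10_000)
print(f"mean zenith = {np.degrees(angles).mean():.1f} deg, mean energy = {energies.mean():.2f} GeV")
```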

  4. Application of a Bayesian algorithm for the Statistical Energy model updating of a railway coach

    DEFF Research Database (Denmark)

    Sadri, Mehran; Brunskog, Jonas; Younesian, Davood

    2016-01-01

    The classical statistical energy analysis (SEA) theory is a common approach for vibroacoustic analysis of coupled complex structures, being efficient to predict high-frequency noise and vibration of engineering systems. There are however some limitations in applying the conventional SEA ... In order to demonstrate the performance of the proposed strategy, the SEA model updating of a railway passenger coach is carried out. First, a sensitivity analysis is carried out to select the most sensitive parameters of the SEA model. For the selected parameters of the model, prior probability density functions are then taken into account based on published data on comparison between experimental and theoretical results, so that the variance of the theory is estimated. The Monte Carlo Metropolis Hastings algorithm is employed to estimate the modified values of the parameters. It is shown that the algorithm can be efficiently used ...

  5. SU-E-T-21: A Novel Sampling Algorithm to Reduce Intensity-Modulated Radiation Therapy (IMRT) Optimization Time

    International Nuclear Information System (INIS)

    Tiwari, P; Xie, Y; Chen, Y; Deasy, J

    2014-01-01

    Purpose: The IMRT optimization problem requires substantial computer time to find optimal dose distributions because of the large number of variables and constraints. Voxel sampling reduces the number of constraints and accelerates the optimization process, but usually degrades the quality of the dose distributions delivered to the organs. We propose a novel sampling algorithm that accelerates the IMRT optimization process without significantly deteriorating the quality of the dose distribution. Methods: We included all boundary voxels, as well as a sampled fraction of interior voxels of organs, in the optimization. We selected a fraction of interior voxels using a clustering algorithm that groups voxels with similar influence-matrix signatures. A few voxels are selected from each cluster based on the pre-set sampling rate. Results: We ran sampling and no-sampling IMRT plans for de-identified head and neck treatment plans. Testing different sampling rates, we found that including 10% of interior voxels produced good dose distributions. For this optimal sampling rate, the algorithm accelerated IMRT optimization by a factor of 2–3 with a negligible loss of accuracy that was, on average, 0.3% for common dosimetric planning criteria. Conclusion: We demonstrated that a sampling scheme can be developed that reduces optimization time by more than a factor of 2 without significantly degrading dose quality
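
    A rough illustration of the voxel-selection idea is given below: interior voxels are clustered by the similarity of their influence-matrix rows and a fixed fraction is drawn from each cluster, while boundary voxels are always kept. The use of k-means and the 10% rate are illustrative choices, not the abstract's exact clustering method.

```python
"""Sketch of influence-matrix-based voxel sampling for IMRT optimization:
cluster interior voxels by their influence-matrix rows and sample a fixed
fraction per cluster; boundary voxels are always retained.  k-means and the
10% rate are illustrative choices."""
import numpy as np
from sklearn.cluster import KMeans

def sample_voxels(influence, boundary_mask, rate=0.10, n_clusters=50, seed=0):
    rng = np.random.default_rng(seed)
    boundary = np.flatnonzero(boundary_mask)
    interior = np.flatnonzero(~boundary_mask)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(
        influence[interior])
    chosen = [boundary]
    for c in range(n_clusters):
        members = interior[labels == c]
        if members.size == 0:
            continue
        k = max(1, int(round(rate * members.size)))
        chosen.append(rng.choice(members, size=min(k, members.size), replace=False))
    return np.sort(np.concatenate(chosen))

# toy problem: 2000 voxels, 60 beamlets, 300 boundary voxels
rng = np.random.default_rng(1)
A = rng.random((2000, 60))
boundary = np.zeros(2000, dtype=bool)
boundary[rng.choice(2000, 300, replace=False)] = True
kept = sample_voxels(A, boundary)
print(f"optimising over {len(kept)} of {A.shape[0]} voxels")
```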

  6. Advanced signal separation and recovery algorithms for digital x-ray spectroscopy

    International Nuclear Information System (INIS)

    Mahmoud, Imbaby I.; El-Tokhy, Mohamed S.

    2015-01-01

    X-ray spectroscopy is widely used for in-situ sample analysis. Accurate spectrum drawing and assessment for x-ray spectroscopy are therefore the main scope of this paper. A lithium-drifted silicon Si(Li) detector cooled with nitrogen is used for signal extraction. The resolution of the ADC is 12 bits and its sampling rate is 5 MHz. Different algorithms are implemented and run on a personal computer with an Intel Core i5-3470 CPU at 3.20 GHz. The algorithms cover signal preprocessing, signal separation and recovery, and spectrum drawing. Statistical measurements are used for the evaluation of these algorithms. Signal preprocessing based on DC-offset correction and signal de-noising is performed. DC-offset correction was done using the minimum value of the radiation signal, while signal de-noising was implemented using a fourth-order finite impulse response (FIR) filter, a linear-phase least-squares FIR filter, complex wavelet transforms (CWT) and Kalman filter methods. We observed that the Kalman filter achieves a larger peak signal-to-noise ratio (PSNR) and lower error than the other methods, whereas the CWT takes a much longer execution time. Moreover, three different algorithms that allow correction of x-ray signal overlapping are presented: a 1D non-derivative peak search algorithm, a second-derivative peak search algorithm and an extrema algorithm. Additionally, the effect of the signal separation and recovery algorithms on spectrum drawing is measured and a comparison between these algorithms is given. The obtained results confirm that the second-derivative peak search algorithm as well as the extrema algorithm have very small errors in comparison with the 1D non-derivative peak search algorithm. However, the second-derivative peak search algorithm takes a much longer execution time. Therefore, the extrema algorithm gives better results than the other algorithms. It has the advantage of recovering and

  7. Polarimetric Segmentation Using Wishart Test Statistic

    DEFF Research Database (Denmark)

    Skriver, Henning; Schou, Jesper; Nielsen, Allan Aasbjerg

    2002-01-01

    A newly developed test statistic for equality of two complex covariance matrices following the complex Wishart distribution, together with an associated asymptotic probability for the test statistic, has been used in a segmentation algorithm. The segmentation algorithm is based on the MUM (merge using moments) approach, which is a merging algorithm for single-channel SAR images. The polarimetric version described in this paper uses the above-mentioned test statistic for merging. The segmentation algorithm has been applied to polarimetric SAR data from the Danish dual-frequency, airborne polarimetric SAR, EMISAR ...

  8. External Threat Risk Assessment Algorithm (ExTRAA)

    Energy Technology Data Exchange (ETDEWEB)

    Powell, Troy C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-08-01

    Two risk assessment algorithms and philosophies have been augmented and combined to form a new algorithm, the External Threat Risk Assessment Algorithm (ExTRAA), that allows for effective and statistically sound analysis of external threat sources in relation to individual attack methods. In addition to the attack method use probability and the attack method employment consequence, the concept of defining threat sources is added to the risk assessment process. Sample data is tabulated and depicted in radar plots and bar graphs for algorithm demonstration purposes. The largest success of ExTRAA is its ability to visualize the kind of risk posed in a given situation using the radar plot method.

  9. Theoretical Aspects of the Patterns Recognition Statistical Theory Used for Developing the Diagnosis Algorithms for Complicated Technical Systems

    Science.gov (United States)

    Obozov, A. A.; Serpik, I. N.; Mihalchenko, G. S.; Fedyaeva, G. A.

    2017-01-01

    In this article, the application of pattern recognition (a relatively young area of engineering cybernetics) to the analysis of complicated technical systems is examined. It is shown that a statistical approach can be the most effective for hard-to-distinguish situations. The different recognition algorithms are based on the Bayes approach, which estimates the posterior probabilities of a certain event and the assumed error. Application of the statistical approach to pattern recognition makes it possible to solve the problem of technical diagnosis of complicated systems, particularly high-powered marine diesel engines.

  10. Gene coexpression measures in large heterogeneous samples using count statistics.

    Science.gov (United States)

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.

  11. A maximum-likelihood reconstruction algorithm for tomographic gamma-ray nondestructive assay

    International Nuclear Information System (INIS)

    Prettyman, T.H.; Estep, R.J.; Cole, R.A.; Sheppard, G.A.

    1994-01-01

    A new tomographic reconstruction algorithm for nondestructive assay with high resolution gamma-ray spectroscopy (HRGS) is presented. The reconstruction problem is formulated using a maximum-likelihood approach in which the statistical structure of both the gross and continuum measurements used to determine the full-energy response in HRGS is precisely modeled. An accelerated expectation-maximization algorithm is used to determine the optimal solution. The algorithm is applied to safeguards and environmental assays of large samples (for example, 55-gal. drums) in which high continuum levels caused by Compton scattering are routinely encountered. Details of the implementation of the algorithm and a comparative study of the algorithm's performance are presented
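
    The accelerated EM algorithm mentioned above builds on the classic MLEM multiplicative update for a Poisson emission model; a minimal version of that update (without the gross/continuum measurement modelling described in the abstract) is sketched below on a toy system.

```python
"""Minimal maximum-likelihood EM (MLEM) reconstruction sketch for a linear
emission model y ~ Poisson(A x): the classic multiplicative update, shown
without the gross/continuum modelling of the abstract."""
import numpy as np

def mlem(A, y, n_iter=200):
    x = np.ones(A.shape[1])                    # flat initial activity estimate
    sens = A.sum(axis=0)                       # detector sensitivity (column sums)
    for _ in range(n_iter):
        forward = A @ x                        # forward projection
        ratio = y / np.maximum(forward, 1e-12)
        x *= (A.T @ ratio) / np.maximum(sens, 1e-12)
    return x

# toy 1D "drum": 40 detector views of 20 voxels, two active voxels
rng = np.random.default_rng(0)
A = rng.random((40, 20)) * 0.1
x_true = np.zeros(20); x_true[5] = 50.0; x_true[14] = 20.0
y = rng.poisson(A @ x_true)
print(np.round(mlem(A, y), 1))
```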

  12. Simulated tempering distributed replica sampling: A practical guide to enhanced conformational sampling

    Energy Technology Data Exchange (ETDEWEB)

    Rauscher, Sarah; Pomes, Regis, E-mail: pomes@sickkids.ca

    2010-11-01

    Simulated tempering distributed replica sampling (STDR) is a generalized-ensemble method designed specifically for simulations of large molecular systems on shared and heterogeneous computing platforms [Rauscher, Neale and Pomes (2009) J. Chem. Theor. Comput. 5, 2640]. The STDR algorithm consists of an alternation of two steps: (1) a short molecular dynamics (MD) simulation; and (2) a stochastic temperature jump. Repeating these steps thousands of times results in a random walk in temperature, which allows the system to overcome energetic barriers, thereby enhancing conformational sampling. The aim of the present paper is to provide a practical guide to applying STDR to complex biomolecular systems. We discuss the details of our STDR implementation, which is a highly-parallel algorithm designed to maximize computational efficiency while simultaneously minimizing network communication and data storage requirements. Using a 35-residue disordered peptide in explicit water as a test system, we characterize the efficiency of the STDR algorithm with respect to both diffusion in temperature space and statistical convergence of structural properties. Importantly, we show that STDR provides a dramatic enhancement of conformational sampling compared to a canonical MD simulation.
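
    Step (2), the stochastic temperature jump, is essentially a Metropolis move in temperature space; a minimal sketch of that step is given below, with the per-temperature weights and the fixed potential energy treated as assumed inputs (in practice the weights must be estimated and the energy comes from the MD step).

```python
"""Sketch of the stochastic temperature-jump step in simulated tempering: a
move from T_i to a neighbouring T_j is accepted with a Metropolis criterion
involving the current potential energy and per-temperature weights.  Weights
and the fixed energy used here are illustrative assumptions."""
import numpy as np

KB = 0.0083145  # Boltzmann constant, kJ/(mol K)

def attempt_temperature_jump(i, energy, temps, weights, rng):
    """Return the new temperature index after one jump attempt."""
    j = i + rng.choice([-1, 1])
    if j < 0 or j >= len(temps):
        return i                                    # no neighbour in that direction
    beta_i, beta_j = 1.0 / (KB * temps[i]), 1.0 / (KB * temps[j])
    log_acc = (beta_i - beta_j) * energy + (weights[j] - weights[i])
    return j if np.log(rng.random()) < log_acc else i

# toy random walk in temperature for a fixed (illustrative) potential energy
temps = np.linspace(300.0, 500.0, 11)
weights = np.zeros_like(temps)                      # ideal weights assumed known
rng = np.random.default_rng(0)
i, visits = 0, np.zeros(len(temps), dtype=int)
for _ in range(5000):
    i = attempt_temperature_jump(i, energy=-1200.0, temps=temps, weights=weights, rng=rng)
    visits[i] += 1
print(visits)                                       # occupancy of each temperature rung
```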

  13. Statistical Methods and Tools for Hanford Staged Feed Tank Sampling

    Energy Technology Data Exchange (ETDEWEB)

    Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2013-10-01

    This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).

  14. Inferring microRNA regulation of mRNA with partially ordered samples of paired expression data and exogenous prediction algorithms.

    Directory of Open Access Journals (Sweden)

    Brian Godsey

    Full Text Available MicroRNAs (miRs) are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that the combination of these two approaches gives more reliable results than either by itself. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, such as for time series or studies of developmental processes, stages, or types (e.g. cell type, disease, growth, aging), there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm which specifically addresses [partially] ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework which specifies posterior distributions and therefore statistical significance for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.

  15. Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions

    Science.gov (United States)

    Chen, Nan; Majda, Andrew J.

    2018-02-01

    Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat-tailed, highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. In particular, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions.

  16. Optimal sampling strategy for data mining

    International Nuclear Information System (INIS)

    Ghaffar, A.; Shahbaz, M.; Mahmood, W.

    2013-01-01

    Modern technologies such as the Internet, corporate intranets, data warehouses, ERPs, satellites, digital sensors, embedded systems and mobile networks generate such massive amounts of data that it is becoming very difficult to analyze and understand them, even using data mining tools. Huge datasets pose a difficult challenge for classification algorithms: with increasing amounts of data, data mining algorithms get slower and analysis becomes less interactive. Sampling can be a solution. Using a fraction of the computing resources, sampling can often provide the same level of accuracy. The sampling process requires care, however, because many factors are involved in determining the correct sample size. The approach proposed in this paper addresses this problem. Based on a statistical formula, after setting some parameters, it returns a sample size called the "sufficient sample size", which is then selected through probability sampling. Results indicate the usefulness of this technique in coping with the problem of huge datasets. (author)
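
    The abstract does not spell out the statistical formula it uses, so the sketch below substitutes Cochran's sample-size formula with a finite-population correction as a stand-in illustration of turning a few parameters into a "sufficient sample size".

```python
import math


def sufficient_sample_size(population_size, z=1.96, margin=0.05, p=0.5):
    """Cochran's formula with finite-population correction (an assumed stand-in
    for the paper's unspecified formula)."""
    n0 = (z ** 2) * p * (1.0 - p) / (margin ** 2)
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / population_size))


print(sufficient_sample_size(1_000_000))  # ~385 records at 95% confidence, 5% margin
```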

  17. Density meter algorithm and system for estimating sampling/mixing uncertainty

    International Nuclear Information System (INIS)

    Shine, E.P.

    1986-01-01

    The Laboratories Department at the Savannah River Plant (SRP) has installed a six-place density meter with an automatic sampling device. This paper describes the statistical software developed to analyze the density of uranyl nitrate solutions using this automated system. The purpose of this software is twofold: to estimate the sampling/mixing and measurement uncertainties in the process and to provide a measurement control program for the density meter. Non-uniformities in density are analyzed both analytically and graphically. The mean density and its limit of error are estimated. Quality control standards are analyzed concurrently with process samples and used to control the density meter measurement error. The analyses are corrected for concentration changes due to evaporation of samples awaiting analysis. This program has been successful in identifying sampling/mixing problems and in controlling the quality of analyses

  18. Some software algorithms for microprocessor ratemeters

    International Nuclear Information System (INIS)

    Savic, Z.

    1991-01-01

    After a review of the basic theoretical ratemeter problem and a general discussion of microprocessor ratemeters, a short insight into their hardware organization is given. Three software algorithms are described: two established ones, the quasi-exponential and the floating-mean algorithms, and a new weighted moving average algorithm. The equations for the statistical characterization of the new algorithm are given and an intercomparison is made. It is concluded that the new algorithm has statistical advantages over the older ones. (orig.)
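
    As an illustration of the idea, the sketch below estimates a count rate as a weighted moving average over the most recent counting intervals. The linear weight ramp is only an assumption for the example; the paper's actual weighting scheme and its statistical characterization are not reproduced here.

```python
def wma_count_rate(counts, dt, weights=None):
    """Weighted moving average count-rate estimate from the most recent
    counting intervals (newest last); the default linear ramp is illustrative."""
    if weights is None:
        weights = range(1, len(counts) + 1)       # newest interval weighted most
    weights = list(weights)
    return sum(w * c for w, c in zip(weights, counts)) / (sum(weights) * dt)


print(wma_count_rate([12, 9, 15, 11, 14], dt=1.0))  # counts per second
```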

  19. Some software algorithms for microprocessor ratemeters

    Energy Technology Data Exchange (ETDEWEB)

    Savic, Z. (Military Technical Inst., Belgrade (Yugoslavia))

    1991-03-15

    After a review of the basic theoretical ratemeter problem and a general discussion of microprocessor ratemeters, a short insight into their hardware organization is given. Three software algorithms are described: two established ones, the quasi-exponential and the floating-mean algorithms, and a new weighted moving average algorithm. The equations for the statistical characterization of the new algorithm are given and an intercomparison is made. It is concluded that the new algorithm has statistical advantages over the older ones. (orig.).

  20. Statistical sampling applied to the radiological characterization of historical waste

    Directory of Open Access Journals (Sweden)

    Zaffora Biagio

    2016-01-01

    Full Text Available The evaluation of the activity of radionuclides in radioactive waste is required for its disposal in final repositories. Easy-to-measure nuclides, such as γ-emitters and high-energy X-ray emitters, can be measured via non-destructive nuclear techniques from outside a waste package. Some radionuclides are difficult-to-measure (DTM) from outside a package because they are α- or β-emitters. The present article discusses the application of linear regression, scaling factors (SF) and the so-called “mean activity method” to estimate the activity of DTM nuclides on metallic waste produced at the European Organization for Nuclear Research (CERN). Various statistical sampling techniques including simple random sampling, systematic sampling, stratified and authoritative sampling are described and applied to two waste populations of activated copper cables. The bootstrap is introduced as a tool to estimate average activities and standard errors in waste characterization. The analysis of the DTM Ni-63 is used as an example. Experimental and theoretical values of SFs are calculated and compared. Guidelines for sampling historical waste using probabilistic and non-probabilistic sampling are finally given.
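
    The bootstrap step mentioned above can be sketched in a few lines: resample the measured activities with replacement and take the spread of the resampled means as the standard error. The activity values below are made up for illustration only.

```python
import random
import statistics


def bootstrap_mean_se(activities, n_resamples=10_000, seed=1):
    """Bootstrap estimate of the mean activity and its standard error."""
    rng = random.Random(seed)
    n = len(activities)
    means = [statistics.fmean(rng.choices(activities, k=n)) for _ in range(n_resamples)]
    return statistics.fmean(means), statistics.stdev(means)


sample = [0.8, 1.1, 0.9, 1.4, 0.7, 1.0, 1.2]      # hypothetical specific activities
mean, se = bootstrap_mean_se(sample)
print(f"mean = {mean:.3f}, bootstrap SE = {se:.3f}")
```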

  1. Statistical hadronization and hadronic micro-canonical ensemble II

    International Nuclear Information System (INIS)

    Becattini, F.; Ferroni, L.

    2004-01-01

    We present a Monte Carlo calculation of the micro-canonical ensemble of the ideal hadron-resonance gas including all known states up to a mass of about 1.8 GeV and full quantum statistics. The micro-canonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy, around 8 GeV, thus bearing out previous analyses of hadronic multiplicities in the canonical ensemble. The main numerical computing method is an importance sampling Monte Carlo algorithm using the product of Poisson distributions to generate multi-hadronic channels. It is shown that the use of this multi-Poisson distribution allows for an efficient and fast computation of averages, which can be further improved in the limit of very large clusters. We have also studied the fitness of a previously proposed computing method, based on the Metropolis Monte Carlo algorithm, for event generation in the statistical hadronization model. We find that the use of the multi-Poisson distribution as proposal matrix dramatically improves the computation performance. However, due to the correlation of subsequent samples, this method proves to be generally less robust and effective than the importance sampling method. (orig.)

  2. Experimental uncertainty estimation and statistics for data having interval uncertainty.

    Energy Technology Data Exchange (ETDEWEB)

    Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)

    2007-05-01

    This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
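
    As a small worked example of descriptive statistics on interval data, the sketch below computes tight bounds on the sample mean when each measurement is only known to lie within an interval. It is an illustration of the kind of computation the report surveys, not code from the report itself.

```python
def interval_mean_bounds(intervals):
    """Lower and upper bounds on the sample mean of interval-valued data;
    the mean is easy to bound, whereas statistics such as the variance can be
    much harder to bound in general."""
    n = len(intervals)
    lower = sum(lo for lo, _ in intervals) / n
    upper = sum(hi for _, hi in intervals) / n
    return lower, upper


data = [(1.0, 1.2), (0.8, 1.5), (1.1, 1.3)]       # hypothetical interval measurements
print(interval_mean_bounds(data))                  # (0.966..., 1.333...)
```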

  3. Robust statistical methods with R

    CERN Document Server

    Jureckova, Jana

    2005-01-01

    Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameters, large sample properties, and goodness-of-fit tests. It...

  4. Study on the Method of Association Rules Mining Based on Genetic Algorithm and Application in Analysis of Seawater Samples

    Directory of Open Access Journals (Sweden)

    Qiuhong Sun

    2014-04-01

    Full Text Available Building on data mining research, this paper studies association rule mining based on a genetic algorithm. The genetic algorithm is briefly introduced, and two of its important theoretical foundations, the schema theorem and the principle of implicit parallelism, are discussed. The paper then focuses on applying genetic algorithms to association rule mining, proposing improvements to the fitness function structure and the data encoding. In particular, drawing on earlier studies of premature convergence, improved adaptive crossover (Pc) and mutation (Pm) probabilities are applied to the genetic algorithm, thereby improving its efficiency. Finally, a genetic-algorithm-based association rule mining algorithm is presented and applied to data mining of a seawater sample database, demonstrating its effectiveness.
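
    The adaptive Pc/Pm idea can be sketched with one well-known scheme (in the spirit of Srinivas and Patnaik), in which fitter individuals receive smaller crossover and mutation probabilities. The abstract does not give the paper's exact formulas, so the constants and functional form below are assumptions for illustration.

```python
def adaptive_pc_pm(f, f_avg, f_max, k1=0.9, k2=0.6, k3=0.1, k4=0.01):
    """Adaptive crossover (Pc) and mutation (Pm) probabilities for an individual
    with fitness f, given the population average and maximum fitness."""
    if f_max <= f_avg:                 # degenerate population: use the defaults
        return k2, k4
    if f >= f_avg:                     # fitter individuals: smaller Pc and Pm
        scale = (f_max - f) / (f_max - f_avg)
        return k1 * scale, k3 * scale
    return k2, k4                      # weaker individuals: larger Pc and Pm
```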

  5. Study on the effects of sample selection on spectral reflectance reconstruction based on the algorithm of compressive sensing

    International Nuclear Information System (INIS)

    Zhang, Leihong; Liang, Dong

    2016-01-01

    To address the limited efficiency and precision of spectral reflectance reconstruction, different training samples are selected in this paper and a new spectral reflectance reconstruction method based on the compressive sensing algorithm is proposed. Four matte color charts with different numbers of patches, the ColorChecker Color Rendition Chart, the ColorChecker SG, the Pantone coated (copperplate) paper spot color card, and the Munsell color card, are chosen as training samples; the spectral reflectance is reconstructed with the compressive sensing, pseudo-inverse, and Wiener algorithms, and the results are compared. The reconstructions are evaluated by root mean square error and color difference accuracy. The experiments show that, under the same reconstruction conditions, the cumulative contribution rate and color difference obtained with the Munsell color card are better than those obtained with the other three charts, and that reconstruction accuracy is affected by the number of colors in the training sample. The uniformity and representativeness of the training sample selection are therefore of key importance to reconstruction. This paper studies the influence of sample selection on spectral image reconstruction. The precision of spectral reconstruction based on the compressive sensing algorithm is higher than that of the traditional reconstruction algorithms, and MATLAB simulation results show that reconstruction precision and efficiency are affected by the number of colors in the training sample. (paper)

  6. Accurate reconstruction of insertion-deletion histories by statistical phylogenetics.

    Directory of Open Access Journals (Sweden)

    Oscar Westesson

    Full Text Available The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.

  7. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    Science.gov (United States)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
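
    The parallel-tempering exchange step mentioned above reduces to a simple Metropolis test on the energies and inverse temperatures of two replicas. The sketch below is a generic illustration of that criterion, not code from the cited work.

```python
import math
import random


def accept_replica_swap(e_i, e_j, beta_i, beta_j, rng=random.random):
    """Metropolis acceptance test for exchanging the configurations of two
    parallel-tempering replicas with energies e_i, e_j at inverse temperatures
    beta_i, beta_j."""
    log_acc = (beta_i - beta_j) * (e_i - e_j)
    return rng() < math.exp(min(0.0, log_acc))
```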

  8. Statistical sampling plans

    International Nuclear Information System (INIS)

    Jaech, J.L.

    1984-01-01

    In auditing and in inspection, one selects a number of items by some set of procedures and performs measurements which are compared with the operator's values. This session considers the problem of how to select the samples to be measured, and what kinds of measurements to make. In the inspection situation, the ultimate aim is to independently verify the operator's material balance. The effectiveness of the sample plan in achieving this objective is briefly considered. The discussion focuses on the model plant

  9. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, W.H.M.; Meijer, R.R.; Sijtsma, K.

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier's person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  10. The Statistics of Radio Astronomical Polarimetry: Disjoint, Superposed, and Composite Samples

    Energy Technology Data Exchange (ETDEWEB)

    Straten, W. van [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122 (Australia); Tiburzi, C., E-mail: willem.van.straten@aut.ac.nz [Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn (Germany)

    2017-02-01

    A statistical framework is presented for the study of the orthogonally polarized modes of radio pulsar emission via the covariances between the Stokes parameters. To accommodate the typically heavy-tailed distributions of single-pulse radio flux density, the fourth-order joint cumulants of the electric field are used to describe the superposition of modes with arbitrary probability distributions. The framework is used to consider the distinction between superposed and disjoint modes, with particular attention to the effects of integration over finite samples. If the interval over which the polarization state is estimated is longer than the timescale for switching between two or more disjoint modes of emission, then the modes are unresolved by the instrument. The resulting composite sample mean exhibits properties that have been attributed to mode superposition, such as depolarization. Because the distinction between disjoint modes and a composite sample of unresolved disjoint modes depends on the temporal resolution of the observing instrumentation, the arguments in favor of superposed modes of pulsar emission are revisited, and observational evidence for disjoint modes is described. In principle, the four-dimensional covariance matrix that describes the distribution of sample mean Stokes parameters can be used to distinguish between disjoint modes, superposed modes, and a composite sample of unresolved disjoint modes. More comprehensive and conclusive interpretation of the covariance matrix requires more detailed consideration of various relevant phenomena, including temporally correlated subpulse modulation (e.g., jitter), statistical dependence between modes (e.g., covariant intensities and partial coherence), and multipath propagation effects (e.g., scintillation and scattering).

  11. A prediction algorithm for first onset of major depression in the general population: development and validation.

    Science.gov (United States)

    Wang, JianLi; Sareen, Jitender; Patten, Scott; Bolton, James; Schmitz, Norbert; Birney, Arden

    2014-05-01

    Prediction algorithms are useful for making clinical decisions and for population health planning. However, such prediction algorithms for first onset of major depression do not exist. The objective of this study was to develop and validate a prediction algorithm for first onset of major depression in the general population. The study used a longitudinal design with an approximately 3-year follow-up and was based on data from a nationally representative sample of the US general population. A total of 28 059 individuals who participated in Waves 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions and who had not had major depression at Wave 1 were included. The prediction algorithm was developed using logistic regression modelling in 21 813 participants from three census regions. The algorithm was validated in participants from the 4th census region (n=6246). First onset of major depression since Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions was assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule, DSM-IV version. A prediction algorithm containing 17 unique risk factors was developed. The algorithm had good discriminative power (C statistic=0.7538, 95% CI 0.7378 to 0.7699) and excellent calibration (F-adjusted test=1.00, p=0.448) with the weighted data. In the validation sample, the algorithm had a C statistic of 0.7259 and excellent calibration (Hosmer-Lemeshow χ²=3.41, p=0.906). The developed prediction algorithm has good discrimination and calibration capacity. It can be used by clinicians, mental health policy-makers and service planners and the general public to predict future risk of having major depression. The application of the algorithm may lead to increased personalisation of treatment, better clinical decisions and more optimal mental health service planning.
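
    The develop-and-validate pattern described above (logistic regression, then the C statistic on a held-out region) can be sketched as follows. The 17 actual risk factors, survey weights, and NESARC data are not reproduced here, so random placeholder arrays are used and the resulting C statistic is close to 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_dev, y_dev = rng.normal(size=(21_813, 17)), rng.integers(0, 2, 21_813)  # placeholders
X_val, y_val = rng.normal(size=(6_246, 17)), rng.integers(0, 2, 6_246)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
risk = model.predict_proba(X_val)[:, 1]           # predicted probability of first onset
print("validation C statistic:", roc_auc_score(y_val, risk))
```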

  12. TRAN-STAT, Issue No. 3, January 1978. Topics discussed: some statistical aspects of compositing field samples

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1978-01-01

    Some statistical aspects of compositing field samples of soils for determining the content of Pu are discussed. Some of the potential problems involved in pooling samples are reviewed. This is followed by more detailed discussions and examples of compositing designs, adequacy of mixing, statistical models and their role in compositing, and related topics

  13. Evaluation Of Algorithms Of Anti-HIV Antibody Tests

    Directory of Open Access Journals (Sweden)

    Paranjape R.S

    1997-01-01

    Full Text Available Research question: Can alternate algorithms be used in place of the conventional algorithm for epidemiological studies of HIV infection at lower cost? Objective: To compare the results of HIV sero-prevalence as determined by test algorithms combining three kits with the conventional test algorithm. Study design: Cross-sectional. Participants: 282 truck drivers. Statistical analysis: Sensitivity and specificity analysis and predictive values. Results: Three different algorithms that do not include Western Blot (WB) were compared with the conventional algorithm, in a truck driver population with 5.6% prevalence of HIV-1 infection. Algorithms with one EIA (Genetic Systems or Biotest) and a rapid test (Immunocomb), or with two EIAs, showed 100% positive predictive value in relation to the conventional algorithm. Using an algorithm with an EIA as the screening test and a rapid test as the confirmatory test was 50 to 70% less expensive than the conventional algorithm per positive serum sample. These algorithms obviate the interpretation of indeterminate results and also give a differential diagnosis of HIV-2 infection. Alternate algorithms are ideally suited for community-based control programmes in developing countries. Application of these algorithms in populations with low prevalence should also be studied in order to evaluate universal applicability.

  14. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, R.R.; Sijtsma, Klaas

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  15. Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals.

    Science.gov (United States)

    Guven, Onur; Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G

    2016-06-01

    This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors' previous Letter, three isoelectric baseline points per heartbeat are detected and here utilised as interpolation points. As an extension of linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show a root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp-p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation, on the other hand, show 10.7 μV and 11.6 μV (mean), 7.8 μV and 8.9 μV (median), and 9.8 μV and 9.3 μV (standard deviation) per heartbeat, respectively.
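
    The core of such a baseline-removal scheme is interpolation of the baseline through the detected isoelectric points followed by subtraction. The sketch below uses plain piecewise-linear interpolation as a simplified stand-in for the segmented scheme of the Letter; the sampling rate and the "one isoelectric point per second" are assumptions for the example.

```python
import numpy as np


def remove_baseline(ecg, isoelectric_idx):
    """Estimate baseline wander by piecewise-linear interpolation through the
    detected isoelectric points and subtract it from the signal."""
    idx = np.arange(ecg.size)
    baseline = np.interp(idx, isoelectric_idx, ecg[isoelectric_idx])
    return ecg - baseline


fs = 360                                          # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
ecg_like = np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 0.1 * t)  # beat + drift
iso_points = np.arange(0, t.size, fs)             # one assumed isoelectric point per second
cleaned = remove_baseline(ecg_like, iso_points)
```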

  16. Effects of (α,n) contaminants and sample multiplication on statistical neutron correlation measurements

    International Nuclear Information System (INIS)

    Dowdy, E.J.; Hansen, G.E.; Robba, A.A.; Pratt, J.C.

    1980-01-01

    The complete formalism for the use of statistical neutron fluctuation measurements for the nondestructive assay of fissionable materials has been developed. This formalism includes the effect of detector deadtime, neutron multiplicity, random neutron pulse contributions from (α,n) contaminants in the sample, and the sample multiplication of both fission-related and background neutrons

  17. STATISTICAL LANDMARKS AND PRACTICAL ISSUES REGARDING THE USE OF SIMPLE RANDOM SAMPLING IN MARKET RESEARCHES

    Directory of Open Access Journals (Sweden)

    CODRUŢA DURA

    2010-01-01

    Full Text Available The sample represents a particular segment of the statistical population chosen to represent it as a whole. The representativeness of the sample determines the accuracy for estimations made on the basis of calculating the research indicators and the inferential statistics. The method of random sampling is part of probabilistic methods which can be used within marketing research and it is characterized by the fact that it imposes the requirement that each unit belonging to the statistical population should have an equal chance of being selected for the sampling process. When the simple random sampling is meant to be rigorously put into practice, it is recommended to use the technique of random number tables in order to configure the sample which will provide information that the marketer needs. The paper also details the practical procedure implemented in order to create a sample for a marketing research by generating random numbers using the facilities offered by Microsoft Excel.
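
    In practice the random-number step can be done directly in software rather than with printed tables or a spreadsheet. A minimal sketch, with a hypothetical sampling frame:

```python
import random


def simple_random_sample(frame, n, seed=2024):
    """Select n units so that every unit in the frame has an equal chance of
    inclusion; the seeded generator plays the role of the random-number tables
    (or spreadsheet random numbers) mentioned in the article."""
    return random.Random(seed).sample(frame, n)


population = [f"respondent_{i:04d}" for i in range(5000)]   # hypothetical frame
print(simple_random_sample(population, 10))
```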

  18. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto; Cheung, Sai Hung

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.

  19. Angle Statistics Reconstruction: a robust reconstruction algorithm for Muon Scattering Tomography

    Science.gov (United States)

    Stapleton, M.; Burns, J.; Quillin, S.; Steer, C.

    2014-11-01

    Muon Scattering Tomography (MST) is a technique for using the scattering of cosmic ray muons to probe the contents of enclosed volumes. As a muon passes through material it undergoes multiple Coulomb scattering, where the amount of scattering is dependent on the density and atomic number of the material as well as the path length. Hence, MST has been proposed as a means of imaging dense materials, for instance to detect special nuclear material in cargo containers. Algorithms are required to generate an accurate reconstruction of the material density inside the volume from the muon scattering information and some have already been proposed, most notably the Point of Closest Approach (PoCA) and Maximum Likelihood/Expectation Maximisation (MLEM) algorithms. However, whilst PoCA-based algorithms are easy to implement, they perform rather poorly in practice. Conversely, MLEM is a complicated algorithm to implement and computationally intensive and there is currently no published, fast and easily-implementable algorithm that performs well in practice. In this paper, we first provide a detailed analysis of the source of inaccuracy in PoCA-based algorithms. We then motivate an alternative method, based on ideas first laid out by Morris et al, presenting and fully specifying an algorithm that performs well against simulations of realistic scenarios. We argue this new algorithm should be adopted by developers of Muon Scattering Tomography as an alternative to PoCA.

  20. Subclinical delusional ideation and appreciation of sample size and heterogeneity in statistical judgment.

    Science.gov (United States)

    Galbraith, Niall D; Manktelow, Ken I; Morris, Neil G

    2010-11-01

    Previous studies demonstrate that people high in delusional ideation exhibit a data-gathering bias on inductive reasoning tasks. The current study set out to investigate the factors that may underpin such a bias by examining healthy individuals, classified as either high or low scorers on the Peters et al. Delusions Inventory (PDI). More specifically, whether high PDI scorers have a relatively poor appreciation of sample size and heterogeneity when making statistical judgments. In Expt 1, high PDI scorers made higher probability estimates when generalizing from a sample of 1 with regard to the heterogeneous human property of obesity. In Expt 2, this effect was replicated and was also observed in relation to the heterogeneous property of aggression. The findings suggest that delusion-prone individuals are less appreciative of the importance of sample size when making statistical judgments about heterogeneous properties; this may underpin the data gathering bias observed in previous studies. There was some support for the hypothesis that threatening material would exacerbate high PDI scorers' indifference to sample size.

  1. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    Science.gov (United States)

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, respectively, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  2. Efficient and exact sampling of simple graphs with given arbitrary degree sequence.

    Directory of Open Access Journals (Sweden)

    Charo I Del Genio

    Full Text Available Uniform sampling from graphical realizations of a given degree sequence is a fundamental component in simulation-based measurements of network observables, with applications ranging from epidemics, through social networks to Internet modeling. Existing graph sampling methods are either link-swap based (Markov-Chain Monte Carlo algorithms) or stub-matching based (the Configuration Model). Both types are ill-controlled, with typically unknown mixing times for link-swap methods and uncontrolled rejections for the Configuration Model. Here we propose an efficient, polynomial time algorithm that generates statistically independent graph samples with a given, arbitrary, degree sequence. The algorithm provides a weight associated with each sample, allowing the observable to be measured either uniformly over the graph ensemble, or, alternatively, with a desired distribution. Unlike other algorithms, this method always produces a sample, without back-tracking or rejections. Using a central limit theorem-based reasoning, we argue that, for large graphs and for degree sequences admitting many realizations, the sample weights are expected to have a lognormal distribution. As examples, we apply our algorithm to generate networks with degree sequences drawn from power-law distributions and from binomial distributions.

  3. Sample Size Requirements for Assessing Statistical Moments of Simulated Crop Yield Distributions

    NARCIS (Netherlands)

    Lehmann, N.; Finger, R.; Klein, T.; Calanca, P.

    2013-01-01

    Mechanistic crop growth models are becoming increasingly important in agricultural research and are extensively used in climate change impact assessments. In such studies, statistics of crop yields are usually evaluated without the explicit consideration of sample size requirements. The purpose of

  4. A robust statistical estimation (RoSE) algorithm jointly recovers the 3D location and intensity of single molecules accurately and precisely

    Science.gov (United States)

    Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.

    2018-02-01

    In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.

  5. A new fast algorithm for the evaluation of regions of interest and statistical uncertainty in computed tomography

    International Nuclear Information System (INIS)

    Huesman, R.H.

    1984-01-01

    A new algorithm for region of interest evaluation in computed tomography is described. Region of interest evaluation is a technique used to improve quantitation of the tomographic imaging process by summing (or averaging) the reconstructed quantity throughout a volume of particular significance. An important application of this procedure arises in the analysis of dynamic emission computed tomographic data, in which the uptake and clearance of radiotracers are used to determine the blood flow and/or physiological function of tissue within the significant volume. The new algorithm replaces the conventional technique of repeated image reconstructions with one in which projected regions are convolved and then used to form multiple vector inner products with the raw tomographic data sets. Quantitation of regions of interest is made without the need for reconstruction of tomographic images. The computational advantage of the new algorithm over conventional methods is between factors of 20 and 500 for typical applications encountered in medical science studies. The greatest benefit is the ease with which the statistical uncertainty of the result is computed. The entire covariance matrix for the evaluation of regions of interest can be calculated with relatively few operations. (author)

  6. Statistical Sampling For In-Service Inspection Of Liquid Waste Tanks At The Savannah River Site

    International Nuclear Information System (INIS)

    Harris, S.

    2011-01-01

    Savannah River Remediation, LLC (SRR) is implementing a statistical sampling strategy for In-Service Inspection (ISI) of Liquid Waste (LW) Tanks at the United States Department of Energy's Savannah River Site (SRS) in Aiken, South Carolina. As a component of SRS's corrosion control program, the ISI program assesses tank wall structural integrity through the use of ultrasonic testing (UT). The statistical strategy for ISI is based on the random sampling of a number of vertically oriented unit areas, called strips, within each tank. The number of strips to inspect was determined so as to attain, over time, a high probability of observing at least one of the worst 5% in terms of pitting and corrosion across all tanks. The probability estimation to determine the number of strips to inspect was performed using the hypergeometric distribution. Statistical tolerance limits for pit depth and corrosion rates were calculated by fitting the lognormal distribution to the data. In addition to the strip sampling strategy, a single strip within each tank was identified to serve as the baseline for a longitudinal assessment of the tank safe operational life. The statistical sampling strategy enables the ISI program to develop individual profiles of LW tank wall structural integrity that collectively provide a high confidence in their safety and integrity over operational lifetimes.
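
    The probability calculation behind the strip count can be illustrated with the hypergeometric distribution: the chance that a simple random sample of strips contains at least one of the worst 5%. The strip counts below are illustrative only, not the actual tank geometry or programme numbers.

```python
from math import comb


def prob_detect_worst(total_strips, n_inspected, worst_fraction=0.05):
    """Hypergeometric probability that a random sample of strips contains at
    least one of the worst `worst_fraction` of strips."""
    n_worst = max(1, round(worst_fraction * total_strips))
    p_miss = comb(total_strips - n_worst, n_inspected) / comb(total_strips, n_inspected)
    return 1.0 - p_miss


print(prob_detect_worst(total_strips=200, n_inspected=40))   # ~0.9 for these numbers
```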

  7. Statistics-based optimization of the polarimetric radar hydrometeor classification algorithm and its application for a squall line in South China

    Science.gov (United States)

    Wu, Chong; Liu, Liping; Wei, Ming; Xi, Baozhu; Yu, Minghui

    2018-03-01

    A modified hydrometeor classification algorithm (HCA) is developed in this study for Chinese polarimetric radars. This algorithm is based on the U.S. operational HCA. Meanwhile, the methodology of statistics-based optimization is proposed including calibration checking, datasets selection, membership functions modification, computation thresholds modification, and effect verification. Zhuhai radar, the first operational polarimetric radar in South China, applies these procedures. The systematic bias of calibration is corrected, the reliability of radar measurements deteriorates when the signal-to-noise ratio is low, and correlation coefficient within the melting layer is usually lower than that of the U.S. WSR-88D radar. Through modification based on statistical analysis of polarimetric variables, the localized HCA especially for Zhuhai is obtained, and it performs well over a one-month test through comparison with sounding and surface observations. The algorithm is then utilized for analysis of a squall line process on 11 May 2014 and is found to provide reasonable details with respect to horizontal and vertical structures, and the HCA results—especially in the mixed rain-hail region—can reflect the life cycle of the squall line. In addition, the kinematic and microphysical processes of cloud evolution and the differences between radar-detected hail and surface observations are also analyzed. The results of this study provide evidence for the improvement of this HCA developed specifically for China.

  8. Embracing equifinality with efficiency: Limits of Acceptability sampling using the DREAM(LOA) algorithm

    Science.gov (United States)

    Vrugt, Jasper A.; Beven, Keith J.

    2018-04-01

    This essay illustrates some recent developments to the DiffeRential Evolution Adaptive Metropolis (DREAM) MATLAB toolbox of Vrugt (2016) to delineate and sample the behavioural solution space of set-theoretic likelihood functions used within the GLUE (Limits of Acceptability) framework (Beven and Binley, 1992, 2014; Beven and Freer, 2001; Beven, 2006). This work builds on the DREAM(ABC) algorithm of Sadegh and Vrugt (2014) and enhances significantly the accuracy and CPU-efficiency of Bayesian inference with GLUE. In particular it is shown how lack of adequate sampling in the model space might lead to unjustified model rejection.

  9. Constructing and sampling directed graphs with given degree sequences

    International Nuclear Information System (INIS)

    Kim, H; Del Genio, C I; Bassler, K E; Toroczkai, Z

    2012-01-01

    The interactions between the components of complex networks are often directed. Proper modeling of such systems frequently requires the construction of ensembles of digraphs with a given sequence of in- and out-degrees. As the number of simple labeled graphs with a given degree sequence is typically very large even for short sequences, sampling methods are needed for statistical studies. Currently, there are two main classes of methods that generate samples. One of the existing methods first generates a restricted class of graphs and then uses a Markov chain Monte-Carlo algorithm based on edge swaps to generate other realizations. As the mixing time of this process is still unknown, the independence of the samples is not well controlled. The other class of methods is based on the configuration model that may lead to unacceptably many sample rejections due to self-loops and multiple edges. Here we present an algorithm that can directly construct all possible realizations of a given bi-degree sequence by simple digraphs. Our method is rejection-free, guarantees the independence of the constructed samples and provides their weight. The weights can then be used to compute statistical averages of network observables as if they were obtained from uniformly distributed sampling or from any other chosen distribution. (paper)

  10. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    Science.gov (United States)

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task, especially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap-based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however, its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  11. Use of SAMC for Bayesian analysis of statistical models with intractable normalizing constants

    KAUST Repository

    Jin, Ick Hoon

    2014-03-01

    Statistical inference for the models with intractable normalizing constants has attracted much attention. During the past two decades, various approximation- or simulation-based methods have been proposed for the problem, such as the Monte Carlo maximum likelihood method and the auxiliary variable Markov chain Monte Carlo methods. The Bayesian stochastic approximation Monte Carlo algorithm specifically addresses this problem: It works by sampling from a sequence of approximate distributions with their average converging to the target posterior distribution, where the approximate distributions can be achieved using the stochastic approximation Monte Carlo algorithm. A strong law of large numbers is established for the Bayesian stochastic approximation Monte Carlo estimator under mild conditions. Compared to the Monte Carlo maximum likelihood method, the Bayesian stochastic approximation Monte Carlo algorithm is more robust to the initial guess of model parameters. Compared to the auxiliary variable MCMC methods, the Bayesian stochastic approximation Monte Carlo algorithm avoids the requirement for perfect samples, and thus can be applied to many models for which perfect sampling is not available or very expensive. The Bayesian stochastic approximation Monte Carlo algorithm also provides a general framework for approximate Bayesian analysis. © 2012 Elsevier B.V. All rights reserved.

  12. Trajectory averaging for stochastic approximation MCMC algorithms

    KAUST Repository

    Liang, Faming

    2010-10-01

    The subject of stochastic approximation was founded by Robbins and Monro [Ann. Math. Statist. 22 (1951) 400-407]. After five decades of continual development, it has developed into an important area in systems control and optimization, and it has also served as a prototype for the development of adaptive algorithms for on-line estimation and control of stochastic systems. Recently, it has been used in statistics with Markov chain Monte Carlo for solving maximum likelihood estimation problems and for general simulation and optimizations. In this paper, we first show that the trajectory averaging estimator is asymptotically efficient for the stochastic approximation MCMC (SAMCMC) algorithm under mild conditions, and then apply this result to the stochastic approximation Monte Carlo algorithm [Liang, Liu and Carroll J. Amer. Statist. Assoc. 102 (2007) 305-320]. The application of the trajectory averaging estimator to other stochastic approximation MCMC algorithms, for example, a stochastic approximation MLE algorithm for missing data problems, is also considered in the paper. © Institute of Mathematical Statistics, 2010.

  13. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA – A statistical learning approach

    Directory of Open Access Journals (Sweden)

    R. Jegadeeshwaran

    2015-03-01

    Full Text Available In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system affects the vehicle's motion and can have catastrophic effects on vehicle and passenger safety. The brake system therefore plays a vital role in an automobile, and condition monitoring of the brake system is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum. This study is one such attempt to perform condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis is reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of the brake system, vibration signals were acquired using a piezoelectric transducer, and statistical parameters were extracted from the vibration signals. The best feature set was identified for classification using an attribute evaluator, and the selected features were then classified using the CSCA. The classification accuracy of this artificial intelligence technique is compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.

  14. Detection of cracks in shafts with the Approximated Entropy algorithm

    Science.gov (United States)

    Sampaio, Diego Luchesi; Nicoletti, Rodrigo

    2016-05-01

    Approximate Entropy is a statistical measure used primarily in the fields of medicine, biology, and telecommunications for classifying and identifying complex signal data. In this work, an Approximate Entropy algorithm is used to detect cracks in a rotating shaft. The signals of the cracked shaft are obtained from numerical simulations of a de Laval rotor with breathing cracks modelled by fracture mechanics, and the vertical displacements of the rotor during run-up transients are analysed. The results show the feasibility of detecting cracks from 5% depth, irrespective of the unbalance of the rotating system and the crack orientation in the shaft. The results also show that the algorithm can differentiate between the occurrence of crack only, misalignment only, and crack + misalignment in the system. However, the algorithm is sensitive to the intrinsic parameters p (number of data points in a sample vector) and f (fraction of the standard deviation that defines the minimum distance between two sample vectors), and good results are only obtained by appropriately choosing their values according to the sampling rate of the signal.
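
    For reference, a compact implementation of Approximate Entropy is given below; the embedding length and tolerance correspond to the parameters p and f discussed in the abstract, and the default values are the commonly used ones rather than necessarily those of the paper.

```python
import numpy as np


def approximate_entropy(x, m=2, r_frac=0.2):
    """Approximate Entropy ApEn(m, r) of a 1-D signal, with the tolerance r
    expressed as a fraction of the standard deviation."""
    x = np.asarray(x, dtype=float)
    r = r_frac * np.std(x)

    def phi(k):
        templates = np.array([x[i:i + k] for i in range(len(x) - k + 1)])
        # Chebyshev distance between every pair of template vectors
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        frac = np.mean(dist <= r, axis=1)          # self-matches included, as in ApEn
        return np.mean(np.log(frac))

    return phi(m) - phi(m + 1)
```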

  15. A cost-saving statistically based screening technique for focused sampling of a lead-contaminated site

    International Nuclear Information System (INIS)

    Moscati, A.F. Jr.; Hediger, E.M.; Rupp, M.J.

    1986-01-01

    High concentrations of lead in soils along an abandoned railroad line prompted a remedial investigation to characterize the extent of contamination across a 7-acre site. Contamination was thought to be spotty across the site reflecting its past use in battery recycling operations at discrete locations. A screening technique was employed to delineate the more highly contaminated areas by testing a statistically determined minimum number of random samples from each of seven discrete site areas. The approach not only quickly identified those site areas which would require more extensive grid sampling, but also provided a statistically defensible basis for excluding other site areas from further consideration, thus saving the cost of additional sample collection and analysis. The reduction in the number of samples collected in ''clean'' areas of the site ranged from 45 to 60%

  16. Ecotoxicology statistical sampling

    International Nuclear Information System (INIS)

    Saona, G.

    2012-01-01

    This presentation introduces general concepts of statistical sampling design in ecotoxicology, such as the spatial distribution of organic or inorganic contaminants, microbiological contamination, and the positioning of ecotoxicological bioassays within an ecosystem.

  17. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    Science.gov (United States)

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundred or thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS. zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  18. Aircraft target detection algorithm based on high resolution spaceborne SAR imagery

    Science.gov (United States)

    Zhang, Hui; Hao, Mengxi; Zhang, Cong; Su, Xiaojing

    2018-03-01

    In this paper, an image classification algorithm for airport areas is proposed, based on the statistical features of synthetic aperture radar (SAR) images and the spatial information of pixels. The algorithm combines a Gamma mixture model with a Markov random field (MRF): the Gamma mixture model provides the initial classification result, and the pixel-level spatial correlation in that result is then optimized by the MRF technique. Additionally, morphology methods are employed to extract the airport region of interest (ROI), in which the suspected aircraft target samples are classified to reduce false alarms and increase detection performance. Finally, the paper presents the aircraft target detection results, which have been verified by simulation tests.

  19. Survey of statistical and sampling needs for environmental monitoring of commercial low-level radioactive waste disposal facilities

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Thomas, J.M.

    1986-07-01

    This project was designed to develop guidance for implementing 10 CFR Part 61 and to determine the overall needs for sampling and statistical work in characterizing, surveying, monitoring, and closing commercial low-level waste sites. When cost-effectiveness and statistical reliability are of prime importance, then double sampling, compositing, and stratification (with optimal allocation) are identified as key issues. If the principal concern is avoiding questionable statistical practice, then the applicability of kriging (for assessing spatial pattern), methods for routine monitoring, and use of standard textbook formulae in reporting monitoring results should be reevaluated. Other important issues identified include sampling for estimating model parameters and the use of data from left-censored (less than detectable limits) distributions

  20. Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples

    Directory of Open Access Journals (Sweden)

    Nezar Noor Al-Hebshi

    2015-09-01

    Full Text Available Background: Usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by inability to classify reads to the species level. Objective: The purpose of this study was to develop a robust algorithm for species-level classification of NGS reads from oral samples and to pilot test it for profiling bacteria within OSCC tissues. Methods: Bacterial 16S V1-V3 libraries were prepared from three OSCC DNA samples and sequenced using 454's FLX chemistry. High-quality, well-aligned, and non-chimeric reads ≥350 bp were classified using a novel, multi-stage algorithm that involves matching reads to reference sequences in revised versions of the Human Oral Microbiome Database (HOMD), HOMD extended (HOMDEXT), and Greengene Gold (GGG) at alignment coverage and percentage identity ≥98%, followed by assignment to species level based on top hit reference sequences. Priority was given to hits in HOMD, then HOMDEXT and finally GGG. Unmatched reads were subject to operational taxonomic unit analysis. Results: Nearly 92.8% of the reads were matched to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. Of all matched reads, 99.6% were classified to species level. A total of 228 species-level taxa were identified, representing 11 phyla; the most abundant were Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria. Thirty-five species-level taxa were detected in all samples. On average, Prevotella oris, Neisseria flava, Neisseria flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, and Fusobacterium periodontium were the most abundant. Bacteroides fragilis, a species rarely isolated from the oral cavity, was detected in two samples. Conclusion: This multi-stage algorithm maximizes the fraction of reads classified to the species level while ensuring reliable classification by giving priority to the
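
    A minimal sketch of the priority-based assignment step described above, assuming each read comes with a list of database hits. The thresholds and database order follow the abstract, while the data structures, function names, and example values are illustrative, not taken from the paper.

```python
# Sketch of the priority-based species assignment step; illustrative only.

DB_PRIORITY = ["HOMD", "HOMDEXT", "GGG"]  # highest-priority database first
MIN_IDENTITY = 98.0                       # percent identity threshold
MIN_COVERAGE = 98.0                       # alignment coverage threshold

def assign_species(hits):
    """hits: list of dicts with keys 'db', 'species', 'identity', 'coverage'.
    Returns the species of the top hit in the highest-priority database that
    passes both thresholds, or None (read goes on to OTU analysis)."""
    passing = [h for h in hits
               if h["identity"] >= MIN_IDENTITY and h["coverage"] >= MIN_COVERAGE]
    for db in DB_PRIORITY:
        db_hits = [h for h in passing if h["db"] == db]
        if db_hits:
            # take the top hit (highest identity) within this database
            return max(db_hits, key=lambda h: h["identity"])["species"]
    return None

# Example: a read matching both HOMD and GGG is assigned from HOMD.
read_hits = [
    {"db": "GGG",  "species": "Prevotella oris", "identity": 99.1, "coverage": 99.0},
    {"db": "HOMD", "species": "Prevotella oris", "identity": 98.4, "coverage": 98.6},
]
print(assign_species(read_hits))  # -> "Prevotella oris"
```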

  1. Generation and Analysis of Constrained Random Sampling Patterns

    DEFF Research Database (Denmark)

    Pierzchlewski, Jacek; Arildsen, Thomas

    2016-01-01

    Random sampling is a technique for signal acquisition which is gaining popularity in practical signal processing systems. Nowadays, event-driven analog-to-digital converters make random sampling feasible in practical applications. A process of random sampling is defined by a sampling pattern, which ... indicates signal sampling points in time. Practical random sampling patterns are constrained by ADC characteristics and application requirements. In this paper, we introduce statistical methods which evaluate random sampling pattern generators with emphasis on practical applications. Furthermore, we propose ... algorithm generates random sampling patterns dedicated for event-driven ADCs better than existing sampling pattern generators. Finally, implementation issues of random sampling patterns are discussed ...
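
    As a rough illustration of the idea of a constrained sampling pattern, the sketch below generates random patterns subject to a minimum-spacing constraint (a typical event-driven-ADC restriction) by rejection, and then inspects a simple gap statistic over many generated patterns. The actual constraints, generator, and evaluation methods of the paper are not given in the abstract, so everything here is an assumption.

```python
import numpy as np

def constrained_pattern(n_grid, k, min_gap, rng=None, max_tries=10000):
    """Draw k sampling points on a grid of n_grid time slots such that
    consecutive points are at least min_gap slots apart (assumed constraint)."""
    rng = np.random.default_rng(rng)
    for _ in range(max_tries):
        pattern = np.sort(rng.choice(n_grid, size=k, replace=False))
        if np.all(np.diff(pattern) >= min_gap):
            return pattern
    raise RuntimeError("could not satisfy the spacing constraint")

# Evaluate a simple statistic of the generator: the empirical distribution
# of gaps between sampling points over many generated patterns.
gaps = np.concatenate([np.diff(constrained_pattern(1000, 20, 5, rng=i))
                       for i in range(200)])
print(gaps.mean(), gaps.min())
```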

  2. Sensitivity of Marine Warm Cloud Retrieval Statistics to Algorithm Choices: Examples from MODIS Collection 6

    Science.gov (United States)

    Platnick, Steven; Wind, Galina; Zhang, Zhibo; Ackerman, Steven A.; Maddux, Brent

    2012-01-01

    The optical and microphysical structure of warm boundary layer marine clouds is of fundamental importance for understanding a variety of cloud radiation and precipitation processes. With the advent of MODIS (Moderate Resolution Imaging Spectroradiometer) on the NASA EOS Terra and Aqua platforms, simultaneous global/daily 1 km retrievals of cloud optical thickness and effective particle size are provided, as well as the derived water path. In addition, the cloud product (MOD06/MYD06 for MODIS Terra and Aqua, respectively) provides separate effective radii results using the 1.6, 2.1, and 3.7 μm spectral channels. Cloud retrieval statistics are highly sensitive to how a pixel identified as being "not clear" by a cloud mask (e.g., the MOD35/MYD35 product) is determined to be useful for an optical retrieval based on a 1-D cloud model. The Collection 5 MODIS retrieval algorithm removed pixels associated with cloud edges as well as ocean pixels with partly cloudy elements in the 250m MODIS cloud mask - part of the so-called Clear Sky Restoral (CSR) algorithm. Collection 6 attempts retrievals for those two pixel populations, but allows a user to isolate or filter out the populations via CSR pixel-level Quality Assessment (QA) assignments. In this paper, using the preliminary Collection 6 MOD06 product, we present global and regional statistical results of marine warm cloud retrieval sensitivities to the cloud edge and 250m partly cloudy pixel populations. As expected, retrievals for these pixels are generally consistent with a breakdown of the 1-D cloud model. While optical thickness for these suspect pixel populations may have some utility for radiative studies, the retrievals should be used with extreme caution for process and microphysical studies.

  3. Algorithm aversion: people erroneously avoid algorithms after seeing them err.

    Science.gov (United States)

    Dietvorst, Berkeley J; Simmons, Joseph P; Massey, Cade

    2015-02-01

    Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.

  4. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two samples case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  5. ALGORITHM OF PREPARATION OF THE TRAINING SAMPLE USING 3D-FACE MODELING

    Directory of Open Access Journals (Sweden)

    D. I. Samal

    2016-01-01

    Full Text Available The algorithm of preparation and sampling for training of the multiclass support vector machine (SVM) classifier is provided. The described approach is based on modeling possible changes of the facial features of the recognized person. Additional factors such as shooting perspective, lighting conditions and tilt angles were introduced to improve identification results. These synthetically generated changes affect classifier learning by expanding the range of possible variations of the initial image, so a classifier trained on such an extended sample recognizes unknown objects better. Age, emotional expression, head turns, various lighting conditions, noise, and combinations of the listed parameters are chosen as the key parameters for modeling. The third-party software ‘FaceGen’, which allows modeling up to 150 parameters and is available as a free demo version, is used for 3D modeling. The SVM classifier was chosen to test the impact of the introduced modifications of the training sample. The preparation and preliminary processing of images contains the following steps: detection and localization of the face area in the image; estimation of the rotation and inclination angles; extension of the pixel brightness range and histogram equalization to smooth the brightness and contrast characteristics of the processed images; scaling of the localized and processed face area; creation of a feature vector of the scaled and processed face image by principal component analysis (the NIPALS algorithm); and training of the multiclass SVM classifier. The provided algorithm for expanding the training sample is oriented toward practical use and allows the processed range of 2D photographs of persons to be extended using 3D models, which positively affects identification results in a face recognition system. This approach allows to compensate
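
    A compressed sketch of the final stage of such a pipeline, PCA features followed by a multiclass SVM, using scikit-learn (whose PCA is SVD-based rather than NIPALS). The face detection, 3D augmentation with FaceGen, and image preprocessing steps are outside this sketch, and the random arrays below merely stand in for real image data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: rows are flattened, preprocessed face crops (original photos plus the
# synthetically varied renders); y: person identities.  Random data here
# stands in for the real training sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64 * 64))
y = rng.integers(0, 10, size=300)

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),        # feature vector via principal components
    SVC(kernel="rbf", C=10.0),   # multiclass SVM (one-vs-one internally)
)
model.fit(X, y)
print(model.score(X, y))
```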

  6. Heuristic versus statistical physics approach to optimization problems

    International Nuclear Information System (INIS)

    Jedrzejek, C.; Cieplinski, L.

    1995-01-01

    Optimization is a crucial ingredient of many calculation schemes in science and engineering. In this paper we assess several classes of methods: heuristic algorithms, methods directly relying on statistical physics such as the mean-field method and simulated annealing; and Hopfield-type neural networks and genetic algorithms partly related to statistical physics. We perform the analysis for three types of problems: (1) the Travelling Salesman Problem, (2) vector quantization, and (3) traffic control problem in multistage interconnection network. In general, heuristic algorithms perform better (except for genetic algorithms) and much faster but have to be specific for every problem. The key to improving the performance could be to include heuristic features into general purpose statistical physics methods. (author)

  7. Constrained statistical inference: sample-size tables for ANOVA and regression

    Directory of Open Access Journals (Sweden)

    Leonard eVanbrabant

    2015-01-01

    Full Text Available Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3} and this is known as an order-constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction, when an increasing number of constraints is included into the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a prespecified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30% to 50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0).
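
    As a rough illustration of how such sample-size tables can be produced, the sketch below estimates by Monte Carlo the power of the omnibus one-way ANOVA F-test and of a one-sided trend-contrast test that encodes the hypothesized ordering. This is a simplification of the constrained tests studied in the paper; the effect sizes, group counts, and known-variance assumption are all illustrative.

```python
import numpy as np
from scipy import stats

def power(n_per_group, means, sd=1.0, n_sim=2000, alpha=0.05, seed=1):
    """Monte Carlo power of (a) the omnibus one-way ANOVA F-test and
    (b) a one-sided test of the increasing-trend contrast, which encodes
    the order constraint mu1 < mu2 < mu3 (known sd assumed for simplicity)."""
    rng = np.random.default_rng(seed)
    contrast = np.array([-1.0, 0.0, 1.0])
    hit_f = hit_trend = 0
    for _ in range(n_sim):
        groups = [rng.normal(m, sd, n_per_group) for m in means]
        # omnibus F-test
        if stats.f_oneway(*groups).pvalue < alpha:
            hit_f += 1
        # one-sided contrast test in the hypothesized direction
        est = sum(c * g.mean() for c, g in zip(contrast, groups))
        se = sd * np.sqrt(np.sum(contrast**2) / n_per_group)
        if 1 - stats.norm.cdf(est / se) < alpha:
            hit_trend += 1
    return hit_f / n_sim, hit_trend / n_sim

print(power(20, means=[0.0, 0.3, 0.6]))  # (power of F-test, power of trend test)
```

    Repeating such a simulation over a grid of sample sizes and picking the smallest n that reaches the target power is the basic mechanism behind a sample-size table.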

  8. Metropolis-Hastings Algorithms in Function Space for Bayesian Inverse Problems

    KAUST Repository

    Ernst, Oliver

    2015-01-07

    We consider Markov Chain Monte Carlo methods adapted to a Hilbert space setting. Such algorithms occur in Bayesian inverse problems where the solution is a probability measure on a function space according to which one would like to integrate or sample. We focus on Metropolis-Hastings algorithms and, in particular, we introduce and analyze a generalization of the existing pCN-proposal. This new proposal makes it possible to exploit the geometry or anisotropy of the target measure, which in turn might improve the statistical efficiency of the corresponding MCMC method. Numerical experiments for a real-world problem confirm the improvement.

  9. Metropolis-Hastings Algorithms in Function Space for Bayesian Inverse Problems

    KAUST Repository

    Ernst, Oliver

    2015-01-01

    We consider Markov Chain Monte Carlo methods adapted to a Hilbert space setting. Such algorithms occur in Bayesian inverse problems where the solution is a probability measure on a function space according to which one would like to integrate or sample. We focus on Metropolis-Hastings algorithms and, in particular, we introduce and analyze a generalization of the existing pCN-proposal. This new proposal makes it possible to exploit the geometry or anisotropy of the target measure, which in turn might improve the statistical efficiency of the corresponding MCMC method. Numerical experiments for a real-world problem confirm the improvement.
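
    For reference, a minimal sketch of the standard pCN Metropolis-Hastings step that the generalized proposal builds on (the geometry-exploiting generalization itself is not reproduced here). The target is proportional to exp(-Phi(x)) times a Gaussian prior N(0, C), and the acceptance ratio involves only the likelihood term Phi. The toy likelihood is illustrative.

```python
import numpy as np

def pcn_mh(neg_log_like, x0, C_sqrt, beta=0.2, n_steps=5000, seed=0):
    """Preconditioned Crank-Nicolson Metropolis-Hastings.  C_sqrt is a
    matrix square root of the prior covariance C; neg_log_like is Phi."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    phi_x = neg_log_like(x)
    samples = []
    for _ in range(n_steps):
        xi = C_sqrt @ rng.standard_normal(x.size)        # prior-distributed noise
        x_prop = np.sqrt(1.0 - beta**2) * x + beta * xi  # pCN proposal
        phi_prop = neg_log_like(x_prop)
        # accept with probability min(1, exp(Phi(x) - Phi(x')))
        if np.log(rng.random()) < phi_x - phi_prop:
            x, phi_x = x_prop, phi_prop
        samples.append(x.copy())
    return np.array(samples)

# Toy example: Gaussian likelihood centred at 1 with unit prior covariance.
chain = pcn_mh(lambda x: 0.5 * np.sum((x - 1.0) ** 2),
               x0=np.zeros(3), C_sqrt=np.eye(3))
print(chain.mean(axis=0))
```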

  10. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    Science.gov (United States)

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.

  11. Accelerating statistical image reconstruction algorithms for fan-beam x-ray CT using cloud computing

    Science.gov (United States)

    Srivastava, Somesh; Rao, A. Ravishankar; Sheinin, Vadim

    2011-03-01

    Statistical image reconstruction algorithms potentially offer many advantages to x-ray computed tomography (CT), e.g. lower radiation dose. But, their adoption in practical CT scanners requires extra computation power, which is traditionally provided by incorporating additional computing hardware (e.g. CPU-clusters, GPUs, FPGAs etc.) into a scanner. An alternative solution is to access the required computation power over the internet from a cloud computing service, which is orders-of-magnitude more cost-effective. This is because users only pay a small pay-as-you-go fee for the computation resources used (i.e. CPU time, storage etc.), and completely avoid purchase, maintenance and upgrade costs. In this paper, we investigate the benefits and shortcomings of using cloud computing for statistical image reconstruction. We parallelized the most time-consuming parts of our application, the forward and back projectors, using MapReduce, the standard parallelization library on clouds. From preliminary investigations, we found that a large speedup is possible at a very low cost. But, communication overheads inside MapReduce can limit the maximum speedup, and a better MapReduce implementation might become necessary in the future. All the experiments for this paper, including development and testing, were completed on the Amazon Elastic Compute Cloud (EC2) for less than $20.

  12. An Energy Aware Adaptive Sampling Algorithm for Energy Harvesting WSN with Energy Hungry Sensors

    Science.gov (United States)

    Srbinovski, Bruno; Magno, Michele; Edwards-Murphy, Fiona; Pakrashi, Vikram; Popovici, Emanuel

    2016-01-01

    Wireless sensor nodes have a limited power budget, though they are often expected to be functional in the field once deployed for extended periods of time. Therefore, minimization of energy consumption and energy harvesting technology in Wireless Sensor Networks (WSN) are key tools for maximizing network lifetime, and achieving self-sustainability. This paper proposes an energy aware Adaptive Sampling Algorithm (ASA) for WSN with power hungry sensors and harvesting capabilities, an energy management technique that can be implemented on any WSN platform with enough processing power to execute the proposed algorithm. An existing state-of-the-art ASA developed for wireless sensor networks with power hungry sensors is optimized and enhanced to adapt the sampling frequency according to the available energy of the node. The proposed algorithm is evaluated using two in-field testbeds that are supplied by two different energy harvesting sources (solar and wind). Simulation and comparison between the state-of-the-art ASA and the proposed energy aware ASA (EASA) in terms of energy durability are carried out using in-field measured harvested energy (using both wind and solar sources) and power hungry sensors (ultrasonic wind sensor and gas sensors). The simulation results demonstrate that using ASA in combination with an energy aware function on the nodes can drastically increase the lifetime of a WSN node and enable self-sustainability. In fact, the proposed EASA in conjunction with energy harvesting capability can lead towards perpetual WSN operation and significantly outperform the state-of-the-art ASA. PMID:27043559

  13. An Energy Aware Adaptive Sampling Algorithm for Energy Harvesting WSN with Energy Hungry Sensors

    Directory of Open Access Journals (Sweden)

    Bruno Srbinovski

    2016-03-01

    Full Text Available Wireless sensor nodes have a limited power budget, though they are often expected to be functional in the field once deployed for extended periods of time. Therefore, minimization of energy consumption and energy harvesting technology in Wireless Sensor Networks (WSN) are key tools for maximizing network lifetime, and achieving self-sustainability. This paper proposes an energy aware Adaptive Sampling Algorithm (ASA) for WSN with power hungry sensors and harvesting capabilities, an energy management technique that can be implemented on any WSN platform with enough processing power to execute the proposed algorithm. An existing state-of-the-art ASA developed for wireless sensor networks with power hungry sensors is optimized and enhanced to adapt the sampling frequency according to the available energy of the node. The proposed algorithm is evaluated using two in-field testbeds that are supplied by two different energy harvesting sources (solar and wind). Simulation and comparison between the state-of-the-art ASA and the proposed energy aware ASA (EASA) in terms of energy durability are carried out using in-field measured harvested energy (using both wind and solar sources) and power hungry sensors (ultrasonic wind sensor and gas sensors). The simulation results demonstrate that using ASA in combination with an energy aware function on the nodes can drastically increase the lifetime of a WSN node and enable self-sustainability. In fact, the proposed EASA in conjunction with energy harvesting capability can lead towards perpetual WSN operation and significantly outperform the state-of-the-art ASA.
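
    The abstract does not give the EASA adaptation law, so the following is only a generic illustration of the underlying idea: scaling the sensing interval with the energy currently available on the node. The function name, bounds, units, and linear rule are assumptions, not the published algorithm.

```python
def next_sampling_interval(energy_stored_j, energy_budget_j,
                           min_interval_s=10.0, max_interval_s=600.0):
    """Sample faster when energy is plentiful and slower when the node is
    running low, by interpolating between the fastest and slowest allowed
    sensing intervals according to the stored-energy fraction."""
    fraction = max(0.0, min(1.0, energy_stored_j / energy_budget_j))
    interval = max_interval_s - fraction * (max_interval_s - min_interval_s)
    return max(min_interval_s, min(max_interval_s, interval))

print(next_sampling_interval(energy_stored_j=80.0, energy_budget_j=100.0))
print(next_sampling_interval(energy_stored_j=5.0, energy_budget_j=100.0))
```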

  14. Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Application

    KAUST Repository

    Chambolle, Antonin; Ehrhardt, Matthias J.; Richtarik, Peter; Schö nlieb, Carola-Bibiane

    2017-01-01

    We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The analysis is carried out for general convex-concave saddle point problems and problems that are either partially smooth / strongly convex or fully smooth / strongly convex. We perform the analysis for arbitrary samplings of dual variables, and obtain known deterministic results as a special case. Several variants of our stochastic method significantly outperform the deterministic variant on a variety of imaging tasks.

  15. Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Application

    KAUST Repository

    Chambolle, Antonin

    2017-06-15

    We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The analysis is carried out for general convex-concave saddle point problems and problems that are either partially smooth / strongly convex or fully smooth / strongly convex. We perform the analysis for arbitrary samplings of dual variables, and obtain known deterministic results as a special case. Several variants of our stochastic method significantly outperform the deterministic variant on a variety of imaging tasks.

  16. A continuation multilevel Monte Carlo algorithm

    KAUST Repository

    Collier, Nathan

    2014-09-05

    We propose a novel Continuation Multi Level Monte Carlo (CMLMC) algorithm for weak approximation of stochastic models. The CMLMC algorithm solves the given approximation problem for a sequence of decreasing tolerances, ending when the required error tolerance is satisfied. CMLMC assumes discretization hierarchies that are defined a priori for each level and are geometrically refined across levels. The actual choice of computational work across levels is based on parametric models for the average cost per sample and the corresponding variance and weak error. These parameters are calibrated using Bayesian estimation, taking particular notice of the deepest levels of the discretization hierarchy, where only few realizations are available to produce the estimates. The resulting CMLMC estimator exhibits a non-trivial splitting between bias and statistical contributions. We also show the asymptotic normality of the statistical error in the MLMC estimator and justify in this way our error estimate that allows prescribing both required accuracy and confidence in the final result. Numerical results substantiate the above results and illustrate the corresponding computational savings in examples that are described in terms of differential equations either driven by random measures or with random coefficients. © 2014, Springer Science+Business Media Dordrecht.
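
    For orientation, a plain (non-continuation) multilevel Monte Carlo estimator on a toy problem; the CMLMC machinery for tolerance continuation and Bayesian calibration of the cost, variance, and weak-error models is not reproduced here, and the toy "discretisation" is purely illustrative.

```python
import numpy as np

def mlmc(sampler, n_levels, n_samples_per_level, seed=0):
    """Plain multilevel Monte Carlo estimator: sum over levels of the mean of
    Y_l = P_l - P_{l-1}, where sampler(level, rng) returns one coupled
    difference sample (and P_{-1} := 0 on the coarsest level)."""
    rng = np.random.default_rng(seed)
    estimate = 0.0
    for level, n in zip(range(n_levels), n_samples_per_level):
        ys = np.array([sampler(level, rng) for _ in range(n)])
        estimate += ys.mean()
    return estimate

# Toy problem: estimate E[X^2] for X ~ N(0,1); level l "discretises" by
# averaging 2**(l+1) draws, and coarse/fine levels are coupled by sharing noise.
def sampler(level, rng):
    z = rng.standard_normal(2 ** (level + 1))
    fine = np.mean(z ** 2)
    coarse = np.mean(z[: 2 ** level] ** 2) if level > 0 else 0.0
    return fine - coarse

print(mlmc(sampler, n_levels=5, n_samples_per_level=[4000, 2000, 1000, 500, 250]))
```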

  17. Supporting Students to Develop Concepts Underlying Sampling and to Shuttle Between Contextual and Statistical Spheres

    NARCIS (Netherlands)

    Bakker, A.; Dierdorp, A.; Maanen, J.A. van; Eijkelhof, H.M.C.

    2012-01-01

    To stimulate students’ shuttling between contextual and statistical spheres, we based tasks on professional practices. This article focuses on two tasks to support reasoning about sampling by students aged 16-17. The purpose of the tasks was to find out which smaller sample size would have been

  18. Statistical reconstruction for cosmic ray muon tomography.

    Science.gov (United States)

    Schultz, Larry J; Blanpied, Gary S; Borozdin, Konstantin N; Fraser, Andrew M; Hengartner, Nicolas W; Klimenko, Alexei V; Morris, Christopher L; Orum, Chris; Sossong, Michael J

    2007-08-01

    Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm2 per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work [1]-[3], we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.
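
    The abstract notes that the reconstruction borrows from emission tomography. As a rough illustration of that family of methods, here is a minimal sketch of the classic MLEM update for a Poisson linear model, not the muon-scattering likelihood actually used in the paper; the system matrix and data are synthetic.

```python
import numpy as np

def mlem(A, y, n_iter=50, eps=1e-12):
    """Maximum-likelihood / expectation-maximization iteration for a Poisson
    linear model y ~ Poisson(A @ x), as used in emission tomography:
    x <- x * A^T(y / (A x)) / (A^T 1)."""
    m, n = A.shape
    x = np.ones(n)
    norm = A.T @ np.ones(m)                # sensitivity (column sums of A)
    for _ in range(n_iter):
        proj = A @ x
        ratio = y / np.maximum(proj, eps)  # measured / predicted counts
        x *= (A.T @ ratio) / np.maximum(norm, eps)
    return x

# Tiny toy system: 4 "detector" rows observing 3 "voxels".
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(4, 3))
x_true = np.array([2.0, 0.5, 1.5])
y = rng.poisson(A @ x_true)
print(mlem(A, y))
```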

  19. Statistical issues in reporting quality data: small samples and casemix variation.

    Science.gov (United States)

    Zaslavsky, A M

    2001-12-01

    To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F ≥ 10 or reliability ≥ 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.

  20. Statistical evaluation of the data obtained from the K East Basin Sandfilter Backwash Pit samples

    International Nuclear Information System (INIS)

    Welsh, T.L.

    1994-01-01

    Samples were obtained from different locations in the K East Basin Sandfilter Backwash Pit to characterize the sludge material. These samples were analyzed chemically for elements, radionuclides, and residual compounds. The analytical results were statistically analyzed to determine the mean analyte content and the associated variability for each mean value.

  1. Compressing an Ensemble with Statistical Models: An Algorithm for Global 3D Spatio-Temporal Temperature

    KAUST Repository

    Castruccio, Stefano; Genton, Marc G.

    2015-01-01

    One of the main challenges when working with modern climate model ensembles is the increasingly larger size of the data produced, and the consequent difficulty in storing large amounts of spatio-temporally resolved information. Many compression algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific data sets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a different, statistics-based approach that explicitly accounts for the space-time dependence of the data for annual global three-dimensional temperature fields in an initial condition ensemble. The set of estimated parameters is small (compared to the data size) and can be regarded as a summary of the essential structure of the ensemble output; therefore, it can be used to instantaneously reproduce the temperature fields in an ensemble with a substantial saving in storage and time. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and makes it possible to fit a non-trivial model to a data set of one billion data points with a covariance matrix comprising 10^18 entries.

  2. Compressing an Ensemble with Statistical Models: An Algorithm for Global 3D Spatio-Temporal Temperature

    KAUST Repository

    Castruccio, Stefano

    2015-04-02

    One of the main challenges when working with modern climate model ensembles is the increasingly larger size of the data produced, and the consequent difficulty in storing large amounts of spatio-temporally resolved information. Many compression algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific data sets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a different, statistics-based approach that explicitly accounts for the space-time dependence of the data for annual global three-dimensional temperature fields in an initial condition ensemble. The set of estimated parameters is small (compared to the data size) and can be regarded as a summary of the essential structure of the ensemble output; therefore, it can be used to instantaneously reproduce the temperature fields in an ensemble with a substantial saving in storage and time. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and makes it possible to fit a non-trivial model to a data set of one billion data points with a covariance matrix comprising 10^18 entries.

  3. Discrimination of handlebar grip samples by fourier transform infrared microspectroscopy analysis and statistics

    Directory of Open Access Journals (Sweden)

    Zeyu Lin

    2017-01-01

    Full Text Available In this paper, the authors presented a study on the discrimination of handlebar grip samples, to provide effective forensic science service for hit and run traffic cases. 50 bicycle handlebar grip samples, 49 electric bike handlebar grip samples, and 96 motorcycle handlebar grip samples have been randomly collected by the local police in Beijing (China). Fourier transform infrared microspectroscopy (FTIR) was utilized as analytical technology. Then, target absorption selection, data pretreatment, and discrimination of linked samples and unlinked samples were chosen as three steps to improve the discrimination of FTIR spectra collected from different handlebar grip samples. Principal component analysis and receiver operating characteristic curve were utilized to evaluate different data selection methods and different data pretreatment methods, respectively. It is possible to explore the evidential value of handlebar grip residue evidence through instrumental analysis and statistical treatments. It will provide a universal discrimination method for other forensic science samples as well.

  4. Criteria and algorithms for constructing reliable databases for statistical analysis of disruptions at ASDEX Upgrade

    International Nuclear Information System (INIS)

    Cannas, B.; Fanni, A.; Pautasso, G.; Sias, G.; Sonato, P.

    2009-01-01

    The present understanding of disruption physics has not gone so far as to provide a mathematical model describing the onset of this instability. A disruption prediction system, based on a statistical analysis of the diagnostic signals recorded during the experiments, would allow estimating the probability that a disruption will take place. A crucial point for a good design of such a prediction system is the appropriateness of the data set. This paper reports the details of the database built to train a disruption predictor based on neural networks for ASDEX Upgrade. The criteria for pulse selection, the analyses performed on plasma parameters and the implemented pre-processing algorithms are described. As an example of application, a short description of the disruption predictor is reported.

  5. Principal component analysis networks and algorithms

    CERN Document Server

    Kong, Xiangyu; Duan, Zhansheng

    2017-01-01

    This book not only provides a comprehensive introduction to neural-based PCA methods in control science, but also presents many novel PCA algorithms and their extensions and generalizations, e.g., dual purpose, coupled PCA, GED, neural based SVD algorithms, etc. It also discusses in detail various analysis methods for the convergence, stabilizing, self-stabilizing property of algorithms, and introduces the deterministic discrete-time systems method to analyze the convergence of PCA/MCA algorithms. Readers should be familiar with numerical analysis and the fundamentals of statistics, such as the basics of least squares and stochastic algorithms. Although it focuses on neural networks, the book only presents their learning law, which is simply an iterative algorithm. Therefore, no a priori knowledge of neural networks is required. This book will be of interest and serve as a reference source to researchers and students in applied mathematics, statistics, engineering, and other related fields.

  6. Algorithms for detecting and analysing autocatalytic sets.

    Science.gov (United States)

    Hordijk, Wim; Smith, Joshua I; Steel, Mike

    2015-01-01

    Autocatalytic sets are considered to be fundamental to the origin of life. Prior theoretical and computational work on the existence and properties of these sets has relied on a fast algorithm for detecting self-sustaining autocatalytic sets in chemical reaction systems. Here, we introduce and apply a modified version and several extensions of the basic algorithm: (i) a modification aimed at reducing the number of calls to the computationally most expensive part of the algorithm, (ii) the application of a previously introduced extension of the basic algorithm to sample the smallest possible autocatalytic sets within a reaction network, and the application of a statistical test which provides a probable lower bound on the number of such smallest sets, (iii) the introduction and application of another extension of the basic algorithm to detect autocatalytic sets in a reaction system where molecules can also inhibit (as well as catalyse) reactions, (iv) a further, more abstract, extension of the theory behind searching for autocatalytic sets. (i) The modified algorithm outperforms the original one in the number of calls to the computationally most expensive procedure, which, in some cases, also leads to a significant improvement in overall running time, (ii) our statistical test provides strong support for the existence of very large numbers (even millions) of minimal autocatalytic sets in a well-studied polymer model, where these minimal sets share about half of their reactions on average, (iii) "uninhibited" autocatalytic sets can be found in reaction systems that allow inhibition, but their number and sizes depend on the level of inhibition relative to the level of catalysis. (i) Improvements in the overall running time when searching for autocatalytic sets can potentially be obtained by using a modified version of the algorithm, (ii) the existence of large numbers of minimal autocatalytic sets can have important consequences for the possible evolvability of

  7. New Optimization Algorithms in Physics

    CERN Document Server

    Hartmann, Alexander K

    2004-01-01

    Many physicists are not aware of the fact that they can solve their problems by applying optimization algorithms. Since the number of such algorithms is steadily increasing, many new algorithms have not been presented comprehensively until now. This presentation of recently developed algorithms applied in physics, including demonstrations of how they work and related results, aims to encourage their application, and as such the algorithms selected cover concepts and methods from statistical physics to optimization problems emerging in theoretical computer science.

  8. Effects of deformable registration algorithms on the creation of statistical maps for preoperative targeting in deep brain stimulation procedures

    Science.gov (United States)

    Liu, Yuan; D'Haese, Pierre-Francois; Dawant, Benoit M.

    2014-03-01

    Deep brain stimulation, which is used to treat various neurological disorders, involves implanting a permanent electrode into precise targets deep in the brain. Accurate pre-operative localization of the targets on the pre-operative MRI sequence is challenging as these are typically located in homogenous regions with poor contrast. Population-based statistical atlases can assist with this process. Such atlases are created by acquiring the location of efficacious regions from numerous subjects and projecting them onto a common reference image volume using some normalization method. In previous work, we presented results concluding that non-rigid registration provided the best result for such normalization. However, this process could be biased by the choice of the reference image and/or registration approach. In this paper, we have qualitatively and quantitatively compared the performance of six recognized deformable registration methods at normalizing such data in poorly contrasted regions onto three different reference volumes using a unique set of data from 100 patients. We study various metrics designed to measure the centroid, spread, and shape of the normalized data. This study leads to a total of 1800 deformable registrations and results show that statistical atlases constructed using different deformable registration methods share comparable centroids and spreads with marginal differences in their shape. Among the six methods being studied, Diffeomorphic Demons produces the largest spreads and centroids that are the furthest apart from the others in general. Among the three atlases, one atlas consistently outperforms the other two with smaller spreads for each algorithm. However, none of the differences in the spreads were found to be statistically significant, across different algorithms or across different atlases.

  9. Enhanced Map-Matching Algorithm with a Hidden Markov Model for Mobile Phone Positioning

    Directory of Open Access Journals (Sweden)

    An Luo

    2017-10-01

    Full Text Available Numerous map-matching techniques have been developed to improve positioning, using Global Positioning System (GPS) data and other sensors. However, most existing map-matching algorithms process GPS data with high sampling rates, to achieve a higher correct rate and strong universality. This paper introduces a novel map-matching algorithm based on a hidden Markov model (HMM) for GPS positioning and mobile phone positioning with a low sampling rate. The HMM is a statistical model well known for providing solutions to temporal recognition applications such as text and speech recognition. In this work, the hidden Markov chain model was built to establish a map-matching process, using geometric data, the topology matrix of road links in the road network, and a refined quad-tree data structure. HMM-based map-matching exploits the Viterbi algorithm to find the optimized road link sequence; the sequence consists of hidden states in the HMM model. The HMM-based map-matching algorithm is validated on a vehicle trajectory using GPS and mobile phone data. The results show a significant improvement in mobile phone positioning and for GPS data at both high and low sampling rates.
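
    For reference, a compact Viterbi decoder of the kind such an HMM map-matcher relies on. The emission and transition scores below are toy numbers standing in for the distance-based and road-topology terms described above; only the decoding step itself is shown.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_prior):
    """Standard Viterbi decoding.  log_emission: (T, S) log p(obs_t | state);
    log_transition: (S, S) log p(s' | s); log_prior: (S,) log p(s_0).
    Returns the most probable state sequence (here: road-link indices)."""
    T, S = log_emission.shape
    delta = log_prior + log_emission[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transition   # (from, to)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emission[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example with 3 candidate road links and 4 position fixes.
log_e = np.log(np.array([[.7, .2, .1], [.6, .3, .1], [.2, .6, .2], [.1, .2, .7]]))
log_t = np.log(np.array([[.8, .2, .0], [.1, .8, .1], [.0, .2, .8]]) + 1e-12)
print(viterbi(log_e, log_t, np.log(np.full(3, 1 / 3))))
```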

  10. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    International Nuclear Information System (INIS)

    Harris, S.P.

    1997-01-01

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351, which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. Samples in small 3 ml containers (inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and because no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock-up Facility using 3 ml inserts and 15 ml peanut vials. A number of the insert samples were analyzed by Cold Chem and compared with full peanut vial samples analyzed by the current methods. The remaining inserts were analyzed by

  11. Stochastic simulation algorithms and analysis

    CERN Document Server

    Asmussen, Soren

    2007-01-01

    Sampling-based computational methods have become a fundamental part of the numerical toolset of practitioners and researchers across an enormous number of different applied domains and academic disciplines. This book provides a broad treatment of such sampling-based methods, as well as accompanying mathematical analysis of the convergence properties of the methods discussed. The reach of the ideas is illustrated by discussing a wide range of applications and the models that have found wide usage. The first half of the book focuses on general methods, whereas the second half discusses model-specific algorithms. Given the wide range of examples, exercises and applications, students, practitioners and researchers in probability, statistics, operations research, economics, finance, and engineering, as well as biology, chemistry and physics, will find the book of value.

  12. The effect of Scratch environment on student’s achievement in teaching algorithm

    Directory of Open Access Journals (Sweden)

    Mehmet Tekerek

    2014-08-01

    Full Text Available In this study, the effect of the Scratch environment on teaching algorithms in the elementary school 6th grade Information and Communication Technologies course was examined. The research method was an experimental method with a control group and a pretest-posttest design, using a convenience sample consisting of 60 6th grade students. The research instrument was an achievement test designed to determine the effect of Scratch on learning algorithms. During the implementation process the experiment group studied using Scratch and the control group studied with traditional methods. The data were analyzed using independent-samples t-test, paired-samples t-test and ANCOVA statistics. According to the findings, there is no statistically significant difference between the posttest achievement scores of the experiment and control groups. Similarly, in terms of gender there is no statistically significant difference between the posttest scores of the experiment and control groups.

  13. A Monte Carlo algorithm for sampling rare events: application to a search for the Griffiths singularity

    International Nuclear Information System (INIS)

    Hukushima, K; Iba, Y

    2008-01-01

    We develop a recently proposed importance-sampling Monte Carlo algorithm for sampling rare events and quenched variables in random disordered systems. We apply it to a two dimensional bond-diluted Ising model and study the Griffiths singularity which is considered to be due to the existence of rare large clusters. It is found that the distribution of the inverse susceptibility has an exponential tail down to the origin which is considered the consequence of the Griffiths singularity

  14. Finite sample performance of the E-M algorithm for ranks data modelling

    Directory of Open Access Journals (Sweden)

    Angela D'Elia

    2007-10-01

    Full Text Available We check the finite sample performance of the maximum likelihood estimators of the parameters of a mixture distribution recently introduced for modelling ranks/preference data. The estimates are derived by the E-M algorithm and the performance is evaluated from both univariate and bivariate points of view. While the results are generally acceptable as far as bias is concerned, the Monte Carlo experiment shows a different behaviour of the estimators' efficiency for the two parameters of the mixture, mainly depending upon their location in the admissible parametric space. Some operative suggestions conclude the paper.

  15. Statistical sampling plan for the TRU waste assay facility

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Wright, T.; Schultz, F.J.; Haff, K.; Monroe, R.J.

    1983-08-01

    Due to limited space, there is a need to dispose appropriately of the Oak Ridge National Laboratory transuranic waste which is presently stored below ground in 55-gal (208-l) drums within weather-resistant structures. Waste containing less than 100 nCi/g transuranics can be removed from the present storage and be buried, while waste containing greater than 100 nCi/g transuranics must continue to be retrievably stored. To make the necessary measurements needed to determine the drums that can be buried, a transuranic Neutron Interrogation Assay System (NIAS) has been developed at Los Alamos National Laboratory and can make the needed measurements much faster than previous techniques which involved γ-ray spectroscopy. The previous techniques are reliable but time consuming. Therefore, a validation study has been planned to determine the ability of the NIAS to make adequate measurements. The validation of the NIAS will be based on a paired comparison of a sample of measurements made by the previous techniques and the NIAS. The purpose of this report is to describe the proposed sampling plan and the statistical analyses needed to validate the NIAS. 5 references, 4 figures, 5 tables

  16. Statistics and sampling in transuranic studies

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Gilbert, R.O.

    1980-01-01

    The existing data on transuranics in the environment exhibit a remarkably high variability from sample to sample (coefficients of variation of 100% or greater). This chapter stresses the necessity of adequate sample size and suggests various ways to increase sampling efficiency. Objectives in sampling are regarded as being of great importance in making decisions as to sampling methodology. Four different classes of sampling methods are described: (1) descriptive sampling, (2) sampling for spatial pattern, (3) analytical sampling, and (4) sampling for modeling. A number of research needs are identified in the various sampling categories along with several problems that appear to be common to two or more such areas

  17. Statistical methods for detecting differentially abundant features in clinical metagenomic samples.

    Directory of Open Access Journals (Sweden)

    James Robert White

    2009-04-01

    Full Text Available Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is the availability of computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g., as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software
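
    A minimal sketch of the two ingredients named above, Fisher's exact testing of pooled counts for sparse features and false-discovery-rate control via the Benjamini-Hochberg step-up rule, using SciPy. Metastats' full procedure is more involved, and all counts below are made up.

```python
import numpy as np
from scipy.stats import fisher_exact

def sparse_feature_test(counts_a, counts_b, total_a, total_b):
    """Pool a sparse feature's counts within each treatment group and run
    Fisher's exact test on the resulting 2x2 table; returns the p-value."""
    a, b = int(counts_a.sum()), int(counts_b.sum())
    table = [[a, total_a - a], [b, total_b - b]]
    return fisher_exact(table)[1]

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of features declared significant at FDR level q."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    passed = ranked <= q
    keep = np.zeros(len(p), dtype=bool)
    if passed.any():
        keep[order[: passed.nonzero()[0].max() + 1]] = True
    return keep

p = sparse_feature_test(np.array([3, 0, 1]), np.array([0, 0, 0]), 3000, 2800)
print(p, benjamini_hochberg([p, 0.2, 0.001]))
```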

  18. Results of Evolution Supervised by Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Lorentz JÄNTSCHI

    2010-09-01

    Full Text Available The efficiency of a genetic algorithm is frequently assessed using a series of operators of evolution such as crossover operators, mutation operators or other dynamic parameters. The present paper aims to review the main results of evolution supervised by genetic algorithms used to identify solutions to hard agricultural and horticultural problems, and to discuss the results of using a genetic algorithm on structure-activity relationships in terms of the behavior of evolution supervised by genetic algorithms. A genetic algorithm was developed and implemented in order to identify the optimal solution in terms of the estimation power of a multiple linear regression approach for structure-activity relationships. Three survival and three selection strategies (proportional, deterministic and tournament) were investigated in order to identify the best survival-selection strategy able to lead to the model with the highest estimation power. The Molecular Descriptors Family for structure characterization of a sample of 206 polychlorinated biphenyls with measured octanol-water partition coefficients was used as a case study. Evolution using different selection and survival strategies proved to create populations of genotypes living in the evolution space with different diversity and variability. Under a series of comparison criteria these populations proved to be grouped, and the groups were shown to be statistically different from one another. The conclusions about genetic algorithm evolution according to a number of criteria were also highlighted.

  19. Statistical data processing with automatic system for environmental radiation monitoring

    International Nuclear Information System (INIS)

    Zarkh, V.G.; Ostroglyadov, S.V.

    1986-01-01

    The practice of statistical data processing for radiation monitoring is exemplified, and some results obtained are presented. Experience in the practical application of mathematical statistics methods for radiation monitoring data processing allowed the development of a concrete algorithm of statistical processing realized on an M-6000 minicomputer. The suggested algorithm is divided by content into three parts: parametric data processing and hypothesis testing, pair correlation analysis, and multiple correlation analysis. The statistical processing programs operate in a dialogue mode. The above algorithm was used to process data observed over a radioactive waste disposal control region. Results of surface water monitoring processing are presented.

  20. Determination of Optimal Initial Weights of an Artificial Neural Network by Using the Harmony Search Algorithm: Application to Breakwater Armor Stones

    Directory of Open Access Journals (Sweden)

    Anzy Lee

    2016-05-01

    Full Text Available In this study, an artificial neural network (ANN) model is developed to predict the stability number of breakwater armor stones based on the experimental data reported by Van der Meer in 1988. The harmony search (HS) algorithm is used to determine the near-global optimal initial weights in the training of the model. The stratified sampling is used to sample the training data. A total of 25 HS-ANN hybrid models are tested with different combinations of HS algorithm parameters. The HS-ANN models are compared with the conventional ANN model, which uses a Monte Carlo simulation to determine the initial weights. Each model is run 50 times and the statistical analyses are conducted for the model results. The present models using stratified sampling are shown to be more accurate than those of previous studies. The statistical analyses for the model results show that the HS-ANN model with proper values of HS algorithm parameters can give much better and more stable prediction than the conventional ANN model.
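
    A sketch of plain harmony search used to pick initial weights by minimizing a loss function. The HS parameters (harmony memory size, HMCR, PAR, bandwidth) mirror the kind of parameters tuned in the study, but the values, the toy loss, and the interface are illustrative rather than those of the paper.

```python
import numpy as np

def harmony_search(loss, dim, bounds=(-1.0, 1.0), hms=20, hmcr=0.9,
                   par=0.3, bandwidth=0.05, n_iter=2000, seed=0):
    """Plain harmony search: evolve a memory of candidate weight vectors and
    return the best one, to be used as the ANN's initial weights."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    memory = rng.uniform(lo, hi, size=(hms, dim))
    scores = np.array([loss(h) for h in memory])
    for _ in range(n_iter):
        new = np.empty(dim)
        for j in range(dim):
            if rng.random() < hmcr:                  # memory consideration
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:               # pitch adjustment
                    new[j] += bandwidth * rng.uniform(-1, 1)
            else:                                    # random selection
                new[j] = rng.uniform(lo, hi)
        new = np.clip(new, lo, hi)
        worst = scores.argmax()
        s = loss(new)
        if s < scores[worst]:                        # replace the worst harmony
            memory[worst], scores[worst] = new, s
    return memory[scores.argmin()]

# Toy "loss": training error of a network reduced here to a quadratic bowl.
w0 = harmony_search(lambda w: np.sum((w - 0.5) ** 2), dim=10)
print(w0.round(2))
```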

  1. Statistical assessment of fish behavior from split-beam hydro-acoustic sampling

    International Nuclear Information System (INIS)

    McKinstry, Craig A.; Simmons, Mary Ann; Simmons, Carver S.; Johnson, Robert L.

    2005-01-01

    Statistical methods are presented for using echo-traces from split-beam hydro-acoustic sampling to assess fish behavior in response to a stimulus. The data presented are from a study designed to assess the response of free-ranging, lake-resident fish, primarily kokanee (Oncorhynchus nerka) and rainbow trout (Oncorhynchus mykiss), to high intensity strobe lights, and was conducted at Grand Coulee Dam on the Columbia River in Northern Washington State. The lights were deployed immediately upstream from the turbine intakes, in a region exposed to daily alternating periods of high and low flows. The study design included five down-looking split-beam transducers positioned in a line at incremental distances upstream from the strobe lights, and treatments applied in randomized pseudo-replicate blocks. Statistical methods included the use of odds-ratios from fitted loglinear models. Fish-track velocity vectors were modeled using circular probability distributions. Both analyses are depicted graphically. Study results suggest large increases of fish activity in the presence of the strobe lights, most notably at night and during periods of low flow. The lights also induced notable bimodality in the angular distributions of the fish track velocity vectors. Statistical summaries are presented along with interpretations of fish behavior.

  2. Effects of Sample Size and Dimensionality on the Performance of Four Algorithms for Inference of Association Networks in Metabonomics

    NARCIS (Netherlands)

    Suarez Diez, M.; Saccenti, E.

    2015-01-01

    We investigated the effect of sample size and dimensionality on the performance of four algorithms (ARACNE, CLR, CORR, and PCLRC) when they are used for the inference of metabolite association networks. We report that as many as 100-400 samples may be necessary to obtain stable network estimations,

  3. Statistical Engine Knock Control

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2008-01-01

    A new statistical concept of the knock control of a spark ignition automotive engine is proposed. The control aim is associated with the statistical hypothesis test which compares the threshold value to the average value of the maximal amplitude of the knock sensor signal at a given frequency. ... Control algorithm which is used for minimization of the regulation error realizes a simple count-up-count-down logic. A new adaptation algorithm for the knock detection threshold is also developed. Confidence interval method is used as the basis for adaptation. A simple statistical model ... which includes generation of the amplitude signals, a threshold value determination and a knock sound model is developed for evaluation of the control concept. ...

  4. THE APPROACHING TRAIN DETECTION ALGORITHM

    OpenAIRE

    S. V. Bibikov

    2015-01-01

    The paper deals with a detection algorithm for rail vibroacoustic waves caused by an approaching train against a background of increased noise. The urgency of developing a train detection algorithm for conditions of increased rail noise, when railway lines are close to roads or road intersections, is justified. The algorithm is based on a method for detecting weak signals in a noisy environment. The final expression for the information statistic is adjusted. We present the results of algorithm research and t...

  5. A Probabilistic Mass Estimation Algorithm for a Novel 7- Channel Capacitive Sample Verification Sensor

    Science.gov (United States)

    Wolf, Michael

    2012-01-01

    A document describes an algorithm created to estimate the mass placed on a sample verification sensor (SVS) designed for lunar or planetary robotic sample return missions. A novel SVS measures the capacitance between a rigid bottom plate and an elastic top membrane in seven locations. As additional sample material (soil and/or small rocks) is placed on the top membrane, the deformation of the membrane increases the capacitance. The mass estimation algorithm addresses both the calibration of each SVS channel, and also addresses how to combine the capacitances read from each of the seven channels into a single mass estimate. The probabilistic approach combines the channels according to the variance observed during the training phase, and provides not only the mass estimate, but also a value for the certainty of the estimate. SVS capacitance data is collected for known masses under a wide variety of possible loading scenarios, though in all cases, the distribution of sample within the canister is expected to be approximately uniform. A capacitance-vs-mass curve is fitted to this data, and is subsequently used to determine the mass estimate for the single channel's capacitance reading during the measurement phase. This results in seven different mass estimates, one for each SVS channel. Moreover, the variance of the calibration data is used to place a Gaussian probability distribution function (pdf) around this mass estimate. To blend these seven estimates, the seven pdfs are combined into a single Gaussian distribution function, providing the final mean and variance of the estimate. This blending technique essentially takes the final estimate as an average of the estimates of the seven channels, weighted by the inverse of the channel's variance.
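
    The blending step admits a compact closed form: with a Gaussian estimate per channel, the fused mean is the inverse-variance weighted average and the fused variance is the reciprocal of the summed weights. A minimal sketch with invented per-channel numbers:

```python
import numpy as np

# Hypothetical per-channel mass estimates (grams) and calibration variances.
m = np.array([101.0, 98.5, 103.2, 99.1, 100.4, 97.8, 102.6])
var = np.array([4.0, 2.5, 6.0, 3.0, 2.0, 5.5, 3.5])

w = 1.0 / var                       # inverse-variance weights
mass = np.sum(w * m) / np.sum(w)    # fused mean (weighted average of channels)
fused_var = 1.0 / np.sum(w)         # variance of the fused Gaussian estimate

print(f"mass = {mass:.2f} g, 1-sigma = {np.sqrt(fused_var):.2f} g")
```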

  6. Visual Sample Plan Version 7.0 User's Guide

    Energy Technology Data Exchange (ETDEWEB)

    Matzke, Brett D. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Newburn, Lisa LN [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hathaway, John E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bramer, Lisa M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Wilson, John E. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Dowson, Scott T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Sego, Landon H. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Pulsipher, Brent A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2014-03-01

    User's guide for VSP 7.0 This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination. The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.

  7. Evolutionary Statistical Procedures

    CERN Document Server

    Baragona, Roberto; Poli, Irene

    2011-01-01

    This proposed text appears to be a good introduction to evolutionary computation for use in applied statistics research. The authors draw from a vast base of knowledge about the current literature in both the design of evolutionary algorithms and statistical techniques. Modern statistical research is on the threshold of solving increasingly complex problems in high dimensions, and the generalization of its methodology to parameters whose estimators do not follow mathematically simple distributions is underway. Many of these challenges involve optimizing functions for which analytic solutions a

  8. Accelerating simulation for the multiple-point statistics algorithm using vector quantization

    Science.gov (United States)

    Zuo, Chen; Pan, Zhibin; Liang, Hao

    2018-03-01

    Multiple-point statistics (MPS) is a prominent algorithm to simulate categorical variables based on a sequential simulation procedure. Assuming training images (TIs) as prior conceptual models, MPS extracts patterns from TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerating simulation for MPS using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables applicable for vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproductions, and spatial uncertainty. Further demonstrations consist of a 2D four facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of solving multifacies, nonstationarity, and 3D simulations based on 2D TIs.
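
    The core retrieval idea can be illustrated in a few lines: the pattern database is compressed into a codebook so that finding candidate patterns reduces to a single nearest-codeword search. The sketch below uses plain k-means rather than the tree-structured VQ of the paper, and the random 3x3 binary patterns stand in for real training-image patterns.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(2)

# Hypothetical database of binary 3x3 training-image patterns, flattened to vectors.
patterns = rng.integers(0, 2, size=(5000, 9)).astype(float)

# Build a small codebook: each codeword summarises a cluster of similar patterns.
codebook, labels = kmeans2(patterns, k=64, minit='++')

# During simulation, the conditioning data event is matched against the codebook
# instead of scanning the full database: a single nearest-codeword search.
data_event = rng.integers(0, 2, size=(1, 9)).astype(float)
code, dist = vq(data_event, codebook)
compatible = patterns[labels == code[0]]     # candidate patterns for sampling
print("matched codeword", code[0], "with", len(compatible), "stored patterns")
```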

  9. Texture classification by texton: statistical versus binary.

    Directory of Open Access Journals (Sweden)

    Zhenhua Guo

    Full Text Available Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, a training stage is needed to build a texton library, so the recognition accuracy depends heavily on the training samples; second, during feature extraction, a local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library size is big and the dimension of the feature is high. To address the above two issues, in this paper, three binary texton counterpart methods were proposed: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode the local feature into a binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary textons can obtain sound results with fast feature extraction, especially when the image size is not large and the image quality is not poor.
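
    The binary-texton idea can be illustrated by thresholding each filter response at zero and packing the signs into an 8-bit code, so no texton library or nearest-neighbour search is required. The random responses below are a stand-in for real MR8 filter outputs; this is a schematic of the encoding, not the authors' exact descriptor.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stack of 8 filter responses for a 64x64 image (stand-in for MR8 outputs).
responses = rng.normal(size=(8, 64, 64))

# Encode: one bit per filter (sign of the response), packed into a code in [0, 255].
bits = (responses > 0).astype(np.uint8)
weights = (2 ** np.arange(8)).reshape(8, 1, 1)
codes = np.sum(bits * weights, axis=0)

# The texture descriptor is simply the normalised histogram of binary codes.
hist = np.bincount(codes.ravel(), minlength=256).astype(float)
hist /= hist.sum()
print("descriptor length:", hist.size, "non-zero bins:", np.count_nonzero(hist))
```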

  10. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  11. Entropic sampling of simple polymer models within Wang-Landau algorithm

    International Nuclear Information System (INIS)

    Vorontsov-Velyaminov, P N; Volkov, N A; Yurchenko, A A

    2004-01-01

    In this paper we apply a new simulation technique proposed in Wang and Landau (WL) (2001 Phys. Rev. Lett. 86 2050) to sampling of three-dimensional lattice and continuous models of polymer chains. Distributions obtained by homogeneous (unconditional) random walks are compared with results of entropic sampling (ES) within the WL algorithm. While homogeneous sampling gives reliable results typically in the range of 4-5 orders of magnitude, the WL entropic sampling yields them in the range of 20-30 orders and even larger with comparable computer effort. A combination of homogeneous and WL sampling provides reliable data for events with probabilities down to 10^-35. For the lattice model we consider both the athermal case (self-avoiding walks, SAWs) and the thermal case when an energy is attributed to each contact between nonbonded monomers in a self-avoiding walk. For short chains the simulation results are checked by comparison with the exact data. In WL calculations for chain lengths up to N = 300 scaling relations for SAWs are well reproduced. In the thermal case distribution over the number of contacts is obtained in the N-range up to N = 100 and the canonical averages - internal energy, heat capacity, excess canonical entropy, mean square end-to-end distance - are calculated as a result in a wide temperature range. The continuous model is studied in the athermal case. By sorting conformations of a continuous phantom freely joined N-bonded chain with a unit bond length over a stochastic variable, the minimum distance between nonbonded beads, we determine the probability distribution for the N-bonded chain with hard sphere monomer units over its diameter a in the complete diameter range, 0 ≤ a ≤ 2, within a single ES run. This distribution provides us with excess specific entropy for a set of diameters a in this range. Calculations were made for chain lengths up to N = 100 and results were extrapolated to N → ∞ for a in the range 0 ≤ a ≤ 1.25
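
    The flat-histogram mechanism behind WL sampling can be shown on a deliberately simple discrete system (the sum of several dice), where the exact density of states is easy to check. The flatness criterion, the modification-factor schedule and the system itself are illustrative choices, not those of the polymer study above.

```python
import numpy as np

rng = np.random.default_rng(5)

# Wang-Landau estimation of the density of states g(E) for a toy discrete system:
# E = sum of M four-sided dice (values 1..4).
M, faces = 8, 4
E_min, E_max = M, M * faces
n_levels = E_max - E_min + 1

log_g = np.zeros(n_levels)          # running estimate of ln g(E)
hist = np.zeros(n_levels)           # visit histogram for the flatness check
state = rng.integers(1, faces + 1, size=M)
E = int(state.sum())
ln_f = 1.0                          # modification factor, reduced until ~1e-8

while ln_f > 1e-8:
    for _ in range(10000):
        i = rng.integers(M)                          # propose: re-roll one die
        new_val = rng.integers(1, faces + 1)
        E_new = E + int(new_val) - int(state[i])
        # Accept with probability min(1, g(E_old)/g(E_new)).
        if rng.random() < np.exp(log_g[E - E_min] - log_g[E_new - E_min]):
            state[i], E = new_val, E_new
        log_g[E - E_min] += ln_f                     # update DOS and histogram
        hist[E - E_min] += 1
    if hist.min() > 0.8 * hist.mean():               # flatness criterion
        hist[:] = 0
        ln_f *= 0.5

# Normalise so that the g(E) sum equals the total number of states (faces**M).
log_g += np.log(faces**M) - np.log(np.exp(log_g - log_g.max()).sum()) - log_g.max()
print("estimated ln g(E):", np.round(log_g, 2))
```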

  12. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)

    1997-09-18

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351 which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing HydragardTM sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two to three fold improvement in sample analysis time. Both new methods use the existing HydragardTM sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (Inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a HydragardTM sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid due to the decreased sample size and that no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock

  13. The tradition algorithm approach underestimates the prevalence of serodiagnosis of syphilis in HIV-infected individuals.

    Science.gov (United States)

    Chen, Bin; Peng, Xiuming; Xie, Tiansheng; Jin, Changzhong; Liu, Fumin; Wu, Nanping

    2017-07-01

    Currently, there are three algorithms for screening of syphilis: the traditional algorithm, the reverse algorithm and the European Centre for Disease Prevention and Control (ECDC) algorithm. To date, there is not a generally recognized diagnostic algorithm. When syphilis meets HIV, the situation is even more complex. To evaluate their screening performance and impact on the seroprevalence of syphilis in HIV-infected individuals, we conducted a cross-sectional study including 865 serum samples from HIV-infected patients in a tertiary hospital. Every sample (one per patient) was tested with toluidine red unheated serum test (TRUST), T. pallidum particle agglutination assay (TPPA), and Treponema pallidum enzyme immunoassay (TP-EIA) according to the manufacturer's instructions. The results of syphilis serological testing were interpreted following the different algorithms respectively. We directly compared the traditional syphilis screening algorithm with the reverse syphilis screening algorithm in this unique population. The reverse algorithm achieved a remarkably higher seroprevalence of syphilis than the traditional algorithm (24.9% vs. 14.2%, p < 0.0001). Compared to the reverse algorithm, the traditional algorithm also had a missed serodiagnosis rate of 42.8%. The total percentages of agreement and corresponding kappa values of the traditional and ECDC algorithms compared with those of the reverse algorithm were as follows: 89.4%, 0.668; 99.8%, 0.994. There was a very good strength of agreement between the reverse and the ECDC algorithm. Our results supported the reverse (or ECDC) algorithm in screening of syphilis in HIV-infected populations. In addition, our study demonstrated that screening of HIV-populations using different algorithms may result in a statistically different seroprevalence of syphilis.

  14. A Note on Information-Directed Sampling and Thompson Sampling

    OpenAIRE

    Zhou, Li

    2015-01-01

    This note introduces three Bayesian-style multi-armed bandit algorithms: Information-Directed Sampling, Thompson Sampling and Generalized Thompson Sampling. The goal is to give an intuitive explanation for these three algorithms and their regret bounds, and to provide some derivations that are omitted in the original papers.
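
    For concreteness, a minimal Thompson Sampling sketch for Bernoulli bandits: each arm keeps a Beta posterior, one draw per arm is taken each round, and the arm with the largest draw is played. The arm probabilities and horizon are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Thompson Sampling for Bernoulli bandits: keep a Beta posterior per arm,
# sample from each posterior and play the arm with the largest draw.
true_p = np.array([0.45, 0.55, 0.60])     # hypothetical unknown success rates
alpha = np.ones(3)                        # Beta(1, 1) priors
beta = np.ones(3)

for t in range(5000):
    theta = rng.beta(alpha, beta)         # one posterior draw per arm
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]   # pull the chosen arm
    alpha[arm] += reward                  # Bayesian update of the Beta posterior
    beta[arm] += 1 - reward

print("posterior means:", np.round(alpha / (alpha + beta), 3))
print("pull counts:", (alpha + beta - 2).astype(int))
```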

  15. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Science.gov (United States)

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical

  16. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Directory of Open Access Journals (Sweden)

    David C Pavlacky

    Full Text Available Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous

  17. New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics

    International Nuclear Information System (INIS)

    Akhmatskaya, Elena; Reich, Sebastian

    2011-01-01

    We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size and make use of multi-scale nature of complex systems. Variants of GSHMCs were developed for atomistic simulation, particle simulation and statistics: GSHMC (thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (Metropolis corrected dissipative particle dynamics (DPD) method), and a generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)

  18. Extending statistical boosting. An overview of recent methodological developments.

    Science.gov (United States)

    Mayr, A; Binder, H; Gefeller, O; Schmid, M

    2014-01-01

    Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade. This review highlights recent methodological developments regarding boosting algorithms for statistical modelling especially focusing on topics relevant for biomedical research. We suggest a unified framework for gradient boosting and likelihood-based boosting (statistical boosting) which have been addressed separately in the literature up to now. The methodological developments on statistical boosting during the last ten years can be grouped into three different lines of research: i) efforts to ensure variable selection leading to sparser models, ii) developments regarding different types of predictor effects and how to choose them, iii) approaches to extend the statistical boosting framework to new regression settings. Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process and can nowadays be applied in almost any regression setting in combination with a large amount of different types of predictor effects.
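
    The variable-selection behaviour described in point i) comes from componentwise updates: at every boosting iteration only the single best-fitting base-learner is updated, so unselected predictors keep a zero coefficient. A minimal L2-boosting sketch with linear base-learners follows (data, step length and stopping iteration are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: only 3 of 10 predictors are truly informative.
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

nu, m_stop = 0.1, 250          # step length and number of boosting iterations
beta = np.zeros(p)
offset = y.mean()
for _ in range(m_stop):
    resid = y - offset - X @ beta            # negative gradient of squared loss
    # Fit each predictor to the residuals; keep only the best one (componentwise).
    coefs = X.T @ resid / np.sum(X**2, axis=0)
    rss = np.sum((resid[:, None] - X * coefs) ** 2, axis=0)
    j = int(np.argmin(rss))
    beta[j] += nu * coefs[j]                 # weak update of the selected effect

print("selected predictors:", np.nonzero(beta)[0])
print("estimated coefficients:", np.round(beta, 2))
```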

  19. From inverse problems to learning: a Statistical Mechanics approach

    Science.gov (United States)

    Baldassi, Carlo; Gerace, Federica; Saglietti, Luca; Zecchina, Riccardo

    2018-01-01

    We present a brief introduction to the statistical mechanics approaches for the study of inverse problems in data science. We then provide concrete new results on inferring couplings from sampled configurations in systems characterized by an extensive number of stable attractors in the low temperature regime. We also show how these results are connected to the problem of learning with realistic weak signals in computational neuroscience. Our techniques and algorithms rely on advanced mean-field methods developed in the context of disordered systems.

  20. Chinese handwriting recognition an algorithmic perspective

    CERN Document Server

    Su, Tonghua

    2013-01-01

    This book provides an algorithmic perspective on the recent development of Chinese handwriting recognition. Two technically sound strategies, the segmentation-free and integrated segmentation-recognition strategy, are investigated and algorithms that have worked well in practice are primarily focused on. Baseline systems are initially presented for these strategies and are subsequently expanded on and incrementally improved. The sophisticated algorithms covered include: 1) string sample expansion algorithms which synthesize string samples from isolated characters or distort realistic string samples; 2) enhanced feature representation algorithms, e.g. enhanced four-plane features and Delta features; 3) novel learning algorithms, such as Perceptron learning with dynamic margin, MPE training and distributed training; and lastly 4) ensemble algorithms, that is, combining the two strategies using both parallel structure and serial structure. All the while, the book moves from basic to advanced algorithms, helping ...

  1. GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms.

    Science.gov (United States)

    Kobayashi, Chigusa; Jung, Jaewoon; Matsunaga, Yasuhiro; Mori, Takaharu; Ando, Tadashi; Tamura, Koichi; Kamiya, Motoshi; Sugita, Yuji

    2017-09-30

    GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to extend limitations in system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in AMBER and GROMACS packages now become available in addition to CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc.

  2. Practical continuous-variable quantum key distribution without finite sampling bandwidth effects.

    Science.gov (United States)

    Li, Huasheng; Wang, Chao; Huang, Peng; Huang, Duan; Wang, Tao; Zeng, Guihua

    2016-09-05

    In a practical continuous-variable quantum key distribution system, the finite sampling bandwidth of the analog-to-digital converter employed at the receiver's side may lead to inaccurate results of pulse peak sampling. Errors in the parameter estimation then result. Subsequently, the system performance decreases and security loopholes are exposed to eavesdroppers. In this paper, we propose a novel data acquisition scheme which consists of two parts, i.e., a dynamic delay adjusting module and a statistical power feedback-control algorithm. The proposed scheme may dramatically improve the data acquisition precision of pulse peak sampling and remove the finite sampling bandwidth effects. Moreover, the optimal peak sampling position of a pulse signal can be dynamically calibrated through monitoring the change of the statistical power of the sampled data in the proposed scheme. This helps to resist against some practical attacks, such as the well-known local oscillator calibration attack.

  3. Sampling Transition Pathways in Highly Correlated Complex Systems

    Energy Technology Data Exchange (ETDEWEB)

    Chandler, David

    2004-10-20

    This research grant supported my group's efforts to apply and extend the method of transition path sampling that we invented during the late 1990s. This methodology is based upon a statistical mechanics of trajectory space. Traditional statistical mechanics focuses on state space, and with it, one can use Monte Carlo methods to facilitate importance sampling of states. With our formulation of a statistical mechanics of trajectory space, we have succeeded at creating algorithms by which importance sampling can be done for dynamical processes. In particular, we are able to study rare but important events without prior knowledge of transition states or mechanisms. In perhaps the most impressive application of transition path sampling, my group combined forces with Michele Parrinello and his coworkers to unravel the dynamics of autoionization of water [5]. This dynamics is the fundamental kinetic step of pH. Other applications concern the nature of dynamics far from equilibrium [1, 7], nucleation processes [2], cluster isomerization, melting and dissociation [3, 6], and molecular motors [10]. Research groups throughout the world are adopting transition path sampling. In part this has been the result of our efforts to provide pedagogical presentations of the technique [4, 8, 9], as well as providing new procedures for interpreting trajectories of complex systems [11].

  4. Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples

    Energy Technology Data Exchange (ETDEWEB)

    Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J

    2007-10-24

    Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharides, pure proteins and protein mixture samples, all of LDA, PLSDA, and SIMCA were found to produce excellent classification given a sufficient number of compound variables calculated. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples

  5. Generalized-ensemble molecular dynamics and Monte Carlo algorithms beyond the limit of the multicanonical algorithm

    International Nuclear Information System (INIS)

    Okumura, Hisashi

    2010-01-01

    I review two new generalized-ensemble algorithms for molecular dynamics and Monte Carlo simulations of biomolecules, that is, the multibaric–multithermal algorithm and the partial multicanonical algorithm. In the multibaric–multithermal algorithm, two-dimensional random walks not only in the potential-energy space but also in the volume space are realized. One can discuss the temperature dependence and pressure dependence of biomolecules with this algorithm. The partial multicanonical simulation samples a wide range of only an important part of potential energy, so that one can concentrate the effort to determine a multicanonical weight factor only on the important energy terms. This algorithm has higher sampling efficiency than the multicanonical and canonical algorithms. (review)

  6. Statistical Algorithm for the Adaptation of Detection Thresholds

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2008-01-01

    Many event detection mechanisms in spark ignition automotive engines are based on the comparison of the engine signals to the detection threshold values. Different signal qualities for new and aged engines necessitate the development of an adaptation algorithm for the detection thresholds ... remains constant regardless of engine age and changing detection threshold values. This, in turn, guarantees the same event detection performance for new and aged engines/sensors. Adaptation of the engine knock detection threshold is given as an example. Publication date: 2008
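
    One plausible reading of such threshold adaptation is sketched below: the noise level is re-estimated over a sliding window, an upper confidence-type limit is computed, and a count-up/count-down step moves the threshold toward it. The window length, exceedance quantile and noise model are assumptions for illustration, not the algorithm of the cited work.

```python
import numpy as np

rng = np.random.default_rng(8)

# Sketch: adapt a detection threshold so the no-event (noise) signal exceeds it
# with a roughly constant, small probability, even as the sensor/engine ages.
z = 2.33                      # ~1% one-sided exceedance for Gaussian noise (assumed)
threshold = 1.0
step = 0.01                   # count-up / count-down increment

def noise_sample(age):
    # Hypothetical knock-sensor background whose level grows with engine age.
    return abs(rng.normal(0.0, 0.1 * (1.0 + age)))

window = []
for k in range(20000):
    age = k / 20000.0
    window.append(noise_sample(age))
    if len(window) == 200:                       # periodic re-estimation
        m, s = np.mean(window), np.std(window)
        target = m + z * s                       # upper confidence-type limit
        # Count-up / count-down logic: move the threshold one step toward target.
        threshold += step if target > threshold else -step
        window.clear()

print("adapted threshold:", round(threshold, 3))
```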

  7. Examination of statistical noise in SPECT image and sampling pitch

    International Nuclear Information System (INIS)

    Takaki, Akihiro; Soma, Tsutomu; Murase, Kenya; Watanabe, Hiroyuki; Murakami, Tomonori; Kawakami, Kazunori; Teraoka, Satomi; Kojima, Akihiro; Matsumoto, Masanori

    2008-01-01

    Statistical noise in single photon emission computed tomography (SPECT) images was examined in relation to total count and sampling pitch, using both simulation and a phantom experiment to obtain projection data under defined conditions. The simulation assumed a virtual, homogeneous water column (20 cm diameter) as the absorbing mass. The phantom experiment used a 3D Hoffman brain phantom (Data Spectrum Corp.) filled with 370 MBq of 99mTc-pertechnetate solution and a facing two-detector SPECT machine, E-CAM (Siemens), with a low-energy/high-resolution collimator. Projection data from both methods were reconstructed by filtered back projection to produce transaxial images. Noise was evaluated visually, by the root-mean-square uncertainty calculated from the average count and standard deviation (SD) in a region of interest (ROI) defined in the reconstructed images, and by the normalized mean square difference between each obtained slice and a reference image acquired with the common sampling pitch, for both the simulation and the phantom. It is concluded that the sampling pitch should be set in the machine close to the value given by the sampling theorem, even though the projection counts per angular direction are then smaller for the same total data acquisition time. (R.T.)

  8. Efficient Parallel Statistical Model Checking of Biochemical Networks

    Directory of Open Access Journals (Sweden)

    Paolo Ballarini

    2009-12-01

    Full Text Available We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, there are three key aspects that improve the efficiency: first, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method which ensures a faster convergence. Third, the whole methodology is designed in a parallel fashion and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
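
    The second ingredient, the confidence interval for the estimated satisfaction probability, is easy to state; the sketch below gives the textbook Wilson score interval (the paper uses an efficient variant of it), with invented sample counts.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    if n == 0:
        return 0.0, 1.0
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical example: 870 of 1000 sampled trajectories satisfied the LTL property.
lo, hi = wilson_interval(870, 1000)
print(f"P(property) in [{lo:.3f}, {hi:.3f}] with ~95% confidence")
```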

  9. An Adaptive Filtering Algorithm Based on Genetic Algorithm-Backpropagation Network

    Directory of Open Access Journals (Sweden)

    Kai Hu

    2013-01-01

    Full Text Available A new image filtering algorithm is proposed. The GA-BPN algorithm uses a genetic algorithm (GA) to decide the weights in a back-propagation neural network (BPN). It has better global optimization characteristics than traditional optimization algorithms. In this paper, we used GA-BPN for image noise filtering. Firstly, training samples are used to train the GA-BPN as the noise detector. Then, the well-trained GA-BPN is utilized to recognize noise pixels in the target image. Finally, an adaptive weighted average algorithm is used to recover the noise pixels recognized by the GA-BPN. Experimental data show that this algorithm has better performance than other filters.

  10. Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding

    International Nuclear Information System (INIS)

    Aslan, B.; Zech, G.

    2005-01-01

    We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
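
    A closely related, binning-free two-sample statistic is the Euclidean-distance energy statistic shown below (the cited work uses a different distance weighting, e.g. a logarithmic one); a permutation test turns it into a test of whether both samples share a parent distribution. The data are synthetic.

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_statistic(x, y):
    """Two-sample energy statistic (Euclidean kernel); small values indicate
    the samples are compatible with a common parent distribution."""
    dxy = cdist(x, y).mean()
    dxx = cdist(x, x).mean()
    dyy = cdist(y, y).mean()
    return 2.0 * dxy - dxx - dyy

rng = np.random.default_rng(9)
a = rng.normal(0.0, 1.0, size=(300, 2))
b = rng.normal(0.2, 1.0, size=(300, 2))          # slightly shifted sample

obs = energy_statistic(a, b)

# Permutation test: pool the samples and recompute the statistic under shuffling.
pooled = np.vstack([a, b])
perm = [energy_statistic(*np.split(rng.permutation(pooled), 2)) for _ in range(200)]
p_value = np.mean([s >= obs for s in perm])
print(f"energy statistic = {obs:.4f}, permutation p-value = {p_value:.3f}")
```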

  11. The tradition algorithm approach underestimates the prevalence of serodiagnosis of syphilis in HIV-infected individuals.

    Directory of Open Access Journals (Sweden)

    Bin Chen

    2017-07-01

    Full Text Available Currently, there are three algorithms for screening of syphilis: traditional algorithm, reverse algorithm and European Centre for Disease Prevention and Control (ECDC) algorithm. To date, there is not a generally recognized diagnostic algorithm. When syphilis meets HIV, the situation is even more complex. To evaluate their screening performance and impact on the seroprevalence of syphilis in HIV-infected individuals, we conducted a cross-sectional study included 865 serum samples from HIV-infected patients in a tertiary hospital. Every sample (one per patient) was tested with toluidine red unheated serum test (TRUST), T. pallidum particle agglutination assay (TPPA), and Treponema pallidum enzyme immunoassay (TP-EIA) according to the manufacturer's instructions. The results of syphilis serological testing were interpreted following different algorithms respectively. We directly compared the traditional syphilis screening algorithm with the reverse syphilis screening algorithm in this unique population. The reverse algorithm achieved remarkable higher seroprevalence of syphilis than the traditional algorithm (24.9% vs. 14.2%, p < 0.0001). Compared to the reverse algorithm, the traditional algorithm also had a missed serodiagnosis rate of 42.8%. The total percentages of agreement and corresponding kappa values of tradition and ECDC algorithm compared with those of reverse algorithm were as follows: 89.4%, 0.668; 99.8%, 0.994. There was a very good strength of agreement between the reverse and the ECDC algorithm. Our results supported the reverse (or ECDC) algorithm in screening of syphilis in HIV-infected populations. In addition, our study demonstrated that screening of HIV-populations using different algorithms may result in a statistically different seroprevalence of syphilis.

  12. A quick method based on SIMPLISMA-KPLS for simultaneously selecting outlier samples and informative samples for model standardization in near infrared spectroscopy

    Science.gov (United States)

    Li, Li-Na; Ma, Chang-Ming; Chang, Ming; Zhang, Ren-Cheng

    2017-12-01

    A novel method based on SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) and Kernel Partial Least Squares (KPLS), named SIMPLISMA-KPLS, is proposed in this paper for the simultaneous selection of outlier samples and informative samples. It is a fast algorithm used for model standardization (also known as model transfer) in near infrared (NIR) spectroscopy. NIR data of corn samples, measured for protein content analysis, are used to evaluate the proposed method. Piecewise direct standardization (PDS) is employed in the model transfer. A comparison of SIMPLISMA-PDS-KPLS and KS-PDS-KPLS is given by discussing the prediction accuracy for protein content and the calculation speed of each algorithm. The conclusions include that SIMPLISMA-KPLS can be utilized as an alternative sample selection method for model transfer. Although it has similar accuracy to Kennard-Stone (KS), it differs from KS in that it employs concentration information in the selection program. This ensures that analyte information is involved in the analysis and that the spectra (X) of the selected samples are interrelated with the concentration (y). It can also be used for outlier sample elimination simultaneously, by validation of the calibration. According to the statistics on running time, the sample selection process is more rapid when using KPLS. The fast SIMPLISMA-KPLS algorithm is beneficial for improving the speed of online measurement using NIR spectroscopy.
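
    For comparison, the Kennard-Stone procedure named above can be sketched in a few lines: start from the two most distant spectra and repeatedly add the sample farthest (in the maximin sense) from the already selected set. Unlike SIMPLISMA-KPLS it uses only the spectra X, not the concentrations y. The spectra below are random stand-ins for real NIR measurements.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone algorithm."""
    d = cdist(X, X)
    # Start with the two mutually most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # For each candidate, distance to its nearest already-selected sample ...
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        # ... then pick the candidate farthest from the selected set (maximin).
        best = remaining[int(np.argmax(min_d))]
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(10)
spectra = rng.normal(size=(80, 700))     # hypothetical NIR spectra (80 samples)
calib_idx = kennard_stone(spectra, 20)   # 20 standardization/transfer samples
print(calib_idx)
```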

  13. Topology for Statistical Modeling of Petascale Data

    Energy Technology Data Exchange (ETDEWEB)

    Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)

    2013-10-31

    Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project, the Texas A&M Group develops new algebraic geometry algorithms, in particular with fewnomial theory, and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videconference every 2 weeks, so there is much synergy of ideas between the groups. The following of this document is focused on the contributions that had grater direct involvement from the team at the University of Utah in Salt Lake City.

  14. A genetic algorithm-based framework for wavelength selection on sample categorization.

    Science.gov (United States)

    Anzanello, Michel J; Yamashita, Gabrielli; Marcelo, Marcelo; Fogliatto, Flávio S; Ortiz, Rafael S; Mariotti, Kristiane; Ferrão, Marco F

    2017-08-01

    In forensic and pharmaceutical scenarios, the application of chemometrics and optimization techniques has unveiled common and peculiar features of seized medicine and drug samples, helping investigative forces to track illegal operations. This paper proposes a novel framework aimed at identifying relevant subsets of attenuated total reflectance Fourier transform infrared (ATR-FTIR) wavelengths for classifying samples into two classes, for example authentic or forged categories in case of medicines, or salt or base form in cocaine analysis. In the first step of the framework, the ATR-FTIR spectra were partitioned into equidistant intervals and the k-nearest neighbour (KNN) classification technique was applied to each interval to insert samples into proper classes. In the next step, selected intervals were refined through the genetic algorithm (GA) by identifying a limited number of wavelengths from the intervals previously selected aimed at maximizing classification accuracy. When applied to Cialis®, Viagra®, and cocaine ATR-FTIR datasets, the proposed method substantially decreased the number of wavelengths needed to categorize, and increased the classification accuracy. From a practical perspective, the proposed method provides investigative forces with valuable information towards monitoring illegal production of drugs and medicines. In addition, focusing on a reduced subset of wavelengths allows the development of portable devices capable of testing the authenticity of samples during police checking events, avoiding the need for later laboratorial analyses and reducing equipment expenses. Theoretically, the proposed GA-based approach yields more refined solutions than the current methods relying on interval approaches, which tend to insert irrelevant wavelengths in the retained intervals. Copyright © 2016 John Wiley & Sons, Ltd.
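
    A compact sketch of the second stage is given below: a GA searches for a small wavelength subset that maximizes cross-validated KNN accuracy. The GA operators, population sizes and the synthetic "spectra" are simplifications for illustration and do not reproduce the paper's exact framework.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)

# Synthetic stand-in for ATR-FTIR spectra: 120 samples x 200 wavelengths, 2 classes.
n, p, informative = 120, 200, [10, 25, 90]
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, informative] += y[:, None] * 1.5          # class signal on a few wavelengths

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(knn, X[:, mask], y, cv=5).mean()

pop_size, n_gen, mut_rate = 30, 40, 0.02
pop = rng.random((pop_size, p)) < 0.05         # sparse random wavelength subsets
for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:pop_size // 2]]       # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, p)               # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(p) < mut_rate      # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected wavelengths:", np.flatnonzero(best))
```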

  15. GTI: a novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets.

    Directory of Open Access Journals (Sweden)

    John Patrick Mpindi

    Full Text Available BACKGROUND: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type ('outlier genes'), a hallmark of potential oncogenes. METHODOLOGY: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target. CONCLUSIONS/SIGNIFICANCE: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is

  16. A histogram-free multicanonical Monte Carlo algorithm for the construction of analytical density of states

    Energy Technology Data Exchange (ETDEWEB)

    Eisenbach, Markus [ORNL; Li, Ying Wai [ORNL

    2017-06-01

    We report a new multicanonical Monte Carlo (MC) algorithm to obtain the density of states (DOS) for physical systems with continuous state variables in statistical mechanics. Our algorithm is able to obtain an analytical form for the DOS expressed in a chosen basis set, instead of a numerical array of finite resolution as in previous variants of this class of MC methods such as the multicanonical (MUCA) sampling and Wang-Landau (WL) sampling. This is enabled by storing the visited states directly in a data set and avoiding the explicit collection of a histogram. This practice also has the advantage of avoiding undesirable artificial errors caused by the discretization and binning of continuous state variables. Our results show that this scheme is capable of obtaining converged results with a much reduced number of Monte Carlo steps, leading to a significant speedup over existing algorithms.

  17. Sample Reuse in Statistical Remodeling.

    Science.gov (United States)

    1987-08-01

    as the jackknife and bootstrap, is an expansion of the functional, T(Fn), or of its distribution function or both. Frangos and Schucany (1987a) used...accelerated bootstrap. In the same report Frangos and Schucany demonstrated the small sample superiority of that approach over the proposals that take...higher order terms of an Edgeworth expansion into account. In a second report Frangos and Schucany (1987b) examined the small sample performance of

  18. Optimal design of sampling and mapping schemes in the radiometric exploration of Chipilapa, El Salvador (Geo-statistics)

    International Nuclear Information System (INIS)

    Balcazar G, M.; Flores R, J.H.

    1992-01-01

    As part of the surface radiometric exploration carried out in the Chipilapa geothermal field, El Salvador, the geo-statistical parameters were derived from the variogram calculated from the field data. The maximum correlation distance of the radon samples in the different observation directions (N-S, E-W, NW-SE, NE-SW) was 121 m, which defines the monitoring grid for future prospecting in the same area. From this, an optimization (minimum cost) of the field sample spacing was derived by means of geo-statistical techniques, without losing the ability to detect the anomaly. (Author)
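
    The variogram calculation underlying this optimization can be sketched as follows: the empirical semivariogram gamma(h) is half the mean squared difference of values for sample pairs separated by roughly h, and the lag at which it levels off gives the maximum correlation distance. The coordinates, values and binning below are synthetic and simplified.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(12)

# Hypothetical radon survey: sample coordinates (m) and measured values.
coords = rng.uniform(0, 1000, size=(150, 2))
values = np.sin(coords[:, 0] / 200.0) + rng.normal(scale=0.3, size=150)

# Empirical semivariogram: gamma(h) = 0.5 * mean squared value difference
# over all sample pairs whose separation falls in the distance bin around h.
h = pdist(coords)                                   # pairwise separations
dv2 = pdist(values[:, None], metric='sqeuclidean')  # squared value differences
bins = np.arange(0, 600, 50)
lags = 0.5 * (bins[:-1] + bins[1:])
gamma = [0.5 * dv2[(h >= lo) & (h < hi)].mean() for lo, hi in zip(bins[:-1], bins[1:])]

for lag, g in zip(lags, gamma):
    print(f"lag {lag:5.0f} m  gamma {g:6.3f}")
# The lag at which gamma levels off (the range) is the maximum correlation
# distance, which in turn sets the widest sample spacing that still resolves
# the anomaly (reported as 121 m in the study above).
```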

  19. Quantitative analysis of emphysema and airway measurements according to iterative reconstruction algorithms: comparison of filtered back projection, adaptive statistical iterative reconstruction and model-based iterative reconstruction

    International Nuclear Information System (INIS)

    Choo, Ji Yung; Goo, Jin Mo; Park, Chang Min; Park, Sang Joon; Lee, Chang Hyun; Shim, Mi-Suk

    2014-01-01

    To evaluate filtered back projection (FBP) and two iterative reconstruction (IR) algorithms and their effects on the quantitative analysis of lung parenchyma and airway measurements on computed tomography (CT) images. Low-dose chest CT scans obtained in 281 adult patients were reconstructed using three algorithms: FBP, adaptive statistical IR (ASIR) and model-based IR (MBIR). Measurements of each dataset were compared: total lung volume, emphysema index (EI), airway measurements of the lumen and wall area as well as average wall thickness. Accuracy of airway measurements of each algorithm was also evaluated using an airway phantom. EI using a threshold of -950 HU was significantly different among the three algorithms in decreasing order of FBP (2.30 %), ASIR (1.49 %) and MBIR (1.20 %) (P < 0.01). Wall thickness was also significantly different among the three algorithms with FBP (2.09 mm) demonstrating thicker walls than ASIR (2.00 mm) and MBIR (1.88 mm) (P < 0.01). Airway phantom analysis revealed that MBIR showed the most accurate value for airway measurements. The three algorithms presented different EIs and wall thicknesses, decreasing in the order of FBP, ASIR and MBIR. Thus, care should be taken in selecting the appropriate IR algorithm on quantitative analysis of the lung. (orig.)
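
    The emphysema index used above has a direct definition, namely the percentage of segmented lung voxels with attenuation below -950 HU, which can be computed as follows; the voxel values are a synthetic stand-in for a real lung segmentation.

```python
import numpy as np

rng = np.random.default_rng(13)

# Synthetic stand-in for HU values of segmented lung voxels from one reconstruction.
lung_hu = rng.normal(loc=-860, scale=60, size=500_000)

def emphysema_index(hu, threshold=-950.0):
    """Percentage of lung voxels with attenuation below the threshold (in HU)."""
    return 100.0 * np.mean(hu < threshold)

print(f"EI = {emphysema_index(lung_hu):.2f} %")
# Because iterative reconstruction smooths noise, fewer voxels fall below -950 HU,
# which is consistent with ASIR and MBIR giving lower EIs than FBP for the same scan.
```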

  20. Quantitative analysis of emphysema and airway measurements according to iterative reconstruction algorithms: comparison of filtered back projection, adaptive statistical iterative reconstruction and model-based iterative reconstruction

    Energy Technology Data Exchange (ETDEWEB)

    Choo, Ji Yung [Seoul National University Medical Research Center, Department of Radiology, Seoul National University College of Medicine, and Institute of Radiation Medicine, Seoul (Korea, Republic of); Korea University Ansan Hospital, Ansan-si, Department of Radiology, Gyeonggi-do (Korea, Republic of); Goo, Jin Mo; Park, Chang Min; Park, Sang Joon [Seoul National University Medical Research Center, Department of Radiology, Seoul National University College of Medicine, and Institute of Radiation Medicine, Seoul (Korea, Republic of); Seoul National University, Cancer Research Institute, Seoul (Korea, Republic of); Lee, Chang Hyun; Shim, Mi-Suk [Seoul National University Medical Research Center, Department of Radiology, Seoul National University College of Medicine, and Institute of Radiation Medicine, Seoul (Korea, Republic of)

    2014-04-15

    To evaluate filtered back projection (FBP) and two iterative reconstruction (IR) algorithms and their effects on the quantitative analysis of lung parenchyma and airway measurements on computed tomography (CT) images. Low-dose chest CT scans obtained in 281 adult patients were reconstructed using three algorithms: FBP, adaptive statistical IR (ASIR) and model-based IR (MBIR). Measurements of each dataset were compared: total lung volume, emphysema index (EI), airway measurements of the lumen and wall area as well as average wall thickness. Accuracy of airway measurements of each algorithm was also evaluated using an airway phantom. EI using a threshold of -950 HU was significantly different among the three algorithms in decreasing order of FBP (2.30 %), ASIR (1.49 %) and MBIR (1.20 %) (P < 0.01). Wall thickness was also significantly different among the three algorithms with FBP (2.09 mm) demonstrating thicker walls than ASIR (2.00 mm) and MBIR (1.88 mm) (P < 0.01). Airway phantom analysis revealed that MBIR showed the most accurate value for airway measurements. The three algorithms presented different EIs and wall thicknesses, decreasing in the order of FBP, ASIR and MBIR. Thus, care should be taken in selecting the appropriate IR algorithm on quantitative analysis of the lung. (orig.)

  1. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Cabalin, L.M.; Gonzalez, A. [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain); Ruiz, J. [Department of Applied Physics I, University of Malaga, E-29071 Malaga (Spain); Laserna, J.J., E-mail: laserna@uma.e [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain)

    2010-08-15

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s{sup -1}. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  2. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Science.gov (United States)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  3. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Cabalin, L.M.; Gonzalez, A.; Ruiz, J.; Laserna, J.J.

    2010-01-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  4. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    Energy Technology Data Exchange (ETDEWEB)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung [Hanyang Univ., Seoul (Korea, Republic of); Noh, Jae Man [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2013-10-15

    The uncertainty evaluation with the statistical method is performed by repeating the transport calculation while sampling the directly perturbed nuclear data; a reliable uncertainty result can then be obtained by analyzing the results of the numerous transport calculations. One known problem of the statistical approach is that sampling the cross sections from a normal (Gaussian) distribution with a relatively large standard deviation leads to sampling errors, such as the sampling of negative cross sections. Some correction methods have been noted; however, they can distort the distribution of the sampled cross sections. In this study, a method for sampling the nuclear data from a lognormal distribution is proposed to increase the sampling accuracy without negative sampling errors, and a stochastic cross-section sampling and writing program was developed. Criticality calculations with the sampled nuclear data were then performed and the results compared with those from the normal distribution conventionally used in previous studies. For the sensitivity and uncertainty analysis, the cross sections were sampled from both the normal and the lognormal distribution, and the uncertainties caused by the covariance of the (n,.) cross sections were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution efficiently solves the negative sampling problem reported in the previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.
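
    The negative-sampling issue described above can be illustrated with a short sketch (a hypothetical numpy example, not the program developed in the study): relative cross-section factors are drawn from a normal and from a lognormal distribution with matched mean and relative standard deviation, and only the normal draws produce unphysical negative values.

        import numpy as np

        rng = np.random.default_rng(42)

        mean_xs = 1.0      # nominal (relative) cross section
        rel_std = 0.30     # assumed, relatively large relative standard deviation
        n_samples = 100_000

        # Normal (Gaussian) sampling: can produce unphysical negative cross sections.
        normal_draws = rng.normal(loc=mean_xs, scale=rel_std * mean_xs, size=n_samples)

        # Lognormal sampling with the same mean and variance: strictly positive.
        sigma2 = np.log(1.0 + rel_std**2)        # lognormal shape parameter
        mu = np.log(mean_xs) - 0.5 * sigma2      # lognormal scale parameter
        lognormal_draws = rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=n_samples)

        print("negative fraction (normal)   :", np.mean(normal_draws < 0.0))
        print("negative fraction (lognormal):", np.mean(lognormal_draws < 0.0))
        print("sample means:", normal_draws.mean(), lognormal_draws.mean())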

  5. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    International Nuclear Information System (INIS)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung; Noh, Jae Man

    2013-01-01

    The uncertainty evaluation with the statistical method is performed by repeating the transport calculation while sampling the directly perturbed nuclear data; a reliable uncertainty result can then be obtained by analyzing the results of the numerous transport calculations. One known problem of the statistical approach is that sampling the cross sections from a normal (Gaussian) distribution with a relatively large standard deviation leads to sampling errors, such as the sampling of negative cross sections. Some correction methods have been noted; however, they can distort the distribution of the sampled cross sections. In this study, a method for sampling the nuclear data from a lognormal distribution is proposed to increase the sampling accuracy without negative sampling errors, and a stochastic cross-section sampling and writing program was developed. Criticality calculations with the sampled nuclear data were then performed and the results compared with those from the normal distribution conventionally used in previous studies. For the sensitivity and uncertainty analysis, the cross sections were sampled from both the normal and the lognormal distribution, and the uncertainties caused by the covariance of the (n,.) cross sections were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution efficiently solves the negative sampling problem reported in the previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.

  6. Trajectory averaging for stochastic approximation MCMC algorithms

    KAUST Repository

    Liang, Faming

    2010-01-01

    to the stochastic approximation Monte Carlo algorithm [Liang, Liu and Carroll J. Amer. Statist. Assoc. 102 (2007) 305-320]. The application of the trajectory averaging estimator to other stochastic approximationMCMC algorithms, for example, a stochastic

  7. Autism Diagnostic Interview-Revised (ADI-R) Algorithms for Toddlers and Young Preschoolers: Application in a Non-US Sample of 1,104 Children

    Science.gov (United States)

    de Bildt, Annelies; Sytema, Sjoerd; Zander, Eric; Bölte, Sven; Sturm, Harald; Yirmiya, Nurit; Yaari, Maya; Charman, Tony; Salomone, Erica; LeCouteur, Ann; Green, Jonathan; Bedia, Ricardo Canal; Primo, Patricia García; van Daalen, Emma; de Jonge, Maretha V.; Guðmundsdóttir, Emilía; Jóhannsdóttir, Sigurrós; Raleva, Marija; Boskovska, Meri; Rogé, Bernadette; Baduel, Sophie; Moilanen, Irma; Yliherva, Anneli; Buitelaar, Jan; Oosterling, Iris J.

    2015-01-01

    The current study aimed to investigate the Autism Diagnostic Interview-Revised (ADI-R) algorithms for toddlers and young preschoolers (Kim and Lord, "J Autism Dev Disord" 42(1):82-93, 2012) in a non-US sample from ten sites in nine countries (n = 1,104). The construct validity indicated a good fit of the algorithms. The diagnostic…

  8. Parameter sampling capabilities of sequential and simultaneous data assimilation: II. Statistical analysis of numerical results

    International Nuclear Information System (INIS)

    Fossum, Kristian; Mannseth, Trond

    2014-01-01

    We assess and compare parameter sampling capabilities of one sequential and one simultaneous Bayesian, ensemble-based, joint state-parameter (JS) estimation method. In the companion paper, part I (Fossum and Mannseth 2014 Inverse Problems 30 114002), analytical investigations lead us to propose three claims, essentially stating that the sequential method can be expected to outperform the simultaneous method for weakly nonlinear forward models. Here, we assess the reliability and robustness of these claims through statistical analysis of results from a range of numerical experiments. Samples generated by the two approximate JS methods are compared to samples from the posterior distribution generated by a Markov chain Monte Carlo method, using four approximate measures of distance between probability distributions. Forward-model nonlinearity is assessed from a stochastic nonlinearity measure allowing for sufficiently large model dimensions. Both toy models (with low computational complexity, and where the nonlinearity is fairly easy to control) and two-phase porous-media flow models (corresponding to down-scaled versions of problems to which the JS methods have been frequently applied recently) are considered in the numerical experiments. Results from the statistical analysis show strong support of all three claims stated in part I. (paper)

  9. Boolean Queries Optimization by Genetic Algorithms

    Czech Academy of Sciences Publication Activity Database

    Húsek, Dušan; Owais, S.S.J.; Krömer, P.; Snášel, Václav

    2005-01-01

    Roč. 15, - (2005), s. 395-409 ISSN 1210-0552 R&D Projects: GA AV ČR 1ET100300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : evolutionary algorithms * genetic algorithms * genetic programming * information retrieval * Boolean query Subject RIV: BB - Applied Statistics, Operational Research

  10. The outlier sample effects on multivariate statistical data processing geochemical stream sediment survey (Moghangegh region, North West of Iran)

    International Nuclear Information System (INIS)

    Ghanbari, Y.; Habibnia, A.; Memar, A.

    2009-01-01

    In geochemical stream sediment surveys of the Moghangegh Region in the north west of Iran (sheet 1:50,000), 152 samples were collected, and after the analysis and processing of the data it was revealed that the Yb, Sc, Ni, Li, Eu, Cd, Co and As contents in one sample are far higher than in the other samples. After detecting this sample as an outlier, the effect of such an outlier sample on multivariate statistical data processing in geochemical exploration was investigated. Pearson and Spearman correlation coefficient methods and cluster analysis were used for the multivariate studies, and scatter plots of some elements together with the regression profiles are given for the cases of 152 and 151 samples, and the results are compared. After investigating the multivariate statistical data processing results, it was realized that the existence of an outlier sample may produce the following relations between elements: a true relation between two elements, neither of which has an outlier value in the outlier sample; a false relation between two elements, one of which has an outlier value in the outlier sample; and a completely false relation between two elements, both of which have outlier values in the outlier sample

  11. A system for learning statistical motion patterns.

    Science.gov (United States)

    Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

    2006-09-01

    Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.
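
    As a rough illustration of the foreground-clustering step described above, the following sketch implements a plain (textbook) fuzzy c-means update on 2-D pixel coordinates; it is a generic stand-in with invented toy data, not the fast accurate variant used by the authors.

        import numpy as np

        def fuzzy_c_means(points, n_clusters=3, m=2.0, n_iter=50, seed=0):
            """Plain fuzzy c-means on an (N, 2) array of foreground pixel coordinates."""
            rng = np.random.default_rng(seed)
            n = len(points)
            # Random initial membership matrix, rows sum to 1.
            u = rng.random((n, n_clusters))
            u /= u.sum(axis=1, keepdims=True)
            for _ in range(n_iter):
                w = u ** m
                centers = (w.T @ points) / w.sum(axis=0)[:, None]      # weighted centroids
                d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
                d = np.maximum(d, 1e-12)                                # avoid division by zero
                inv = d ** (-2.0 / (m - 1.0))
                u = inv / inv.sum(axis=1, keepdims=True)                # membership update
            return centers, u

        # Toy foreground pixels from three blobs (stand-ins for moving objects).
        rng = np.random.default_rng(1)
        pixels = np.vstack([rng.normal(c, 2.0, size=(200, 2))
                            for c in ([10, 10], [40, 15], [25, 40])])
        centers, memberships = fuzzy_c_means(pixels, n_clusters=3)
        print(np.round(centers, 1))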

  12. Exact distributions of two-sample rank statistics and block rank statistics using computer algebra

    NARCIS (Netherlands)

    Wiel, van de M.A.

    1998-01-01

    We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for
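
    As an illustration of the generating-function approach in code (a generic Python sketch for the classical Wilcoxon rank-sum statistic without ties, not the Mathematica notebooks referred to above), the exact null distribution can be built by dynamic programming over polynomial coefficients, where the coefficient of q^s counts the m-subsets of ranks summing to s.

        from math import comb

        def wilcoxon_rank_sum_null(m, n):
            """Exact null distribution of the rank-sum W of the first sample (size m),
            where ranks 1..m+n are assigned without ties.  Returns {w: probability}."""
            N = m + n
            max_sum = sum(range(N - m + 1, N + 1))
            # counts[j][s] = number of ways to choose j ranks summing to s
            counts = [[0] * (max_sum + 1) for _ in range(m + 1)]
            counts[0][0] = 1
            for rank in range(1, N + 1):
                for j in range(min(rank, m), 0, -1):      # descending j: each rank used at most once
                    row_j, row_jm1 = counts[j], counts[j - 1]
                    for s in range(max_sum, rank - 1, -1):
                        row_j[s] += row_jm1[s - rank]
            total = comb(N, m)
            return {w: c / total for w, c in enumerate(counts[m]) if c}

        dist = wilcoxon_rank_sum_null(4, 5)
        print(sum(dist.values()))        # 1.0
        print(min(dist), max(dist))      # 10 and 30 for m = 4, n = 5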

  13. Statistically significant relational data mining :

    Energy Technology Data Exchange (ETDEWEB)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

    2014-02-01

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  14. TRAN-STAT: statistics for environmental studies, Number 22. Comparison of soil-sampling techniques for plutonium at Rocky Flats

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Bernhardt, D.E.; Hahn, P.B.

    1983-01-01

    A summary of a field soil sampling study conducted around the Rocky Flats Colorado plant in May 1977 is presented. Several different soil sampling techniques that had been used in the area were applied at four different sites. One objective was to compare the average 239-240Pu concentration values obtained by the various soil sampling techniques used. There was also interest in determining whether there are differences in the reproducibility of the various techniques and how the techniques compared with the proposed EPA technique of sampling to 1 cm depth. Statistically significant differences in average concentrations between the techniques were found. The differences could be largely related to the differences in sampling depth, the primary physical variable between the techniques. The reproducibility of the techniques was evaluated by comparing coefficients of variation. Differences between coefficients of variation were not statistically significant. Average (median) coefficients ranged from 21 to 42 percent for the five sampling techniques. A laboratory study indicated that various sample treatment and particle sizing techniques could increase the concentration of plutonium in the less than 10 micrometer size fraction by up to a factor of about 4 compared to the 2 mm size fraction.

  15. A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size.

    Directory of Open Access Journals (Sweden)

    Daniel Vasiliu

    Full Text Available Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED. Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.

  16. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of poisson statistics.

    Science.gov (United States)

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
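
    A minimal sketch of the Poisson reasoning (illustrative only, with invented replicate counts): if each replicate reaction receives a number of integration events that is Poisson distributed with mean lambda, the probability of a negative reaction is exp(-lambda), so lambda can be estimated from the observed fraction of negative reactions, and a binomial (Clopper-Pearson) confidence interval can be mapped through the same transformation.

        import math
        from scipy import stats

        def poisson_estimate(n_negative, n_replicates, conf=0.95):
            """Estimate the mean number of copies per reaction (lambda) from the number
            of negative reactions, assuming Poisson-distributed template counts so that
            P(negative) = exp(-lambda).  Edge cases (0 or all negatives) are not handled."""
            p_neg = n_negative / n_replicates
            lam = -math.log(p_neg)
            # Clopper-Pearson interval for the negative fraction, mapped to lambda.
            lo, hi = stats.beta.ppf([(1 - conf) / 2, (1 + conf) / 2],
                                    [n_negative, n_negative + 1],
                                    [n_replicates - n_negative + 1, n_replicates - n_negative])
            lam_hi = -math.log(lo)   # fewer negatives -> more copies
            lam_lo = -math.log(hi)
            return lam, (lam_lo, lam_hi)

        # Hypothetical 42-replicate run with 18 negative reactions.
        print(poisson_estimate(18, 42))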

  17. Comparative analysis of decision tree algorithms on quality of water contaminated with soil

    Directory of Open Access Journals (Sweden)

    Mara Andrea Dota

    2015-02-01

    Full Text Available Agriculture, roads, animal farms and other land uses may modify the water quality of rivers, dams and other surface freshwaters. For the control of ecological processes and for environmental management, it is necessary to quickly and accurately identify surface water contamination (in areas such as rivers and dams) with contaminated runoff waters coming, for example, from cultivation and urban areas. This paper presents a comparative analysis of different classification algorithms applied to data collected from a sample of soil-contaminated water, aiming to identify whether the water quality classification proposed in this research agrees with reality. The sample was part of a laboratory experiment, which began with a sample of treated water to which increasing fractions of soil were added. The results show that the proposed classification for water quality in this scenario is coherent, because different algorithms indicated a strong statistical relationship between the classes and their instances, that is, between the classes that qualify the water sample and the values which describe each class. The proposed water classification varies from excellent to very awful (12 classes

  18. A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments.

    Science.gov (United States)

    Heskes, Tom; Eisinga, Rob; Breitling, Rainer

    2014-11-21

    The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Exact calculation as well as permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy than existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issues. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .
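
    For orientation, the rank product statistic itself and a naive permutation estimate of its p-value can be sketched in a few lines (the bounds and the fast approximation derived in the paper are not reproduced here; the data are invented):

        import numpy as np

        def rank_products(data):
            """data: (n_genes, k_replicates) matrix of expression changes.
            Returns the rank product per gene: the geometric mean of the per-replicate
            ranks (rank 1 = most strongly up-regulated)."""
            ranks = np.argsort(np.argsort(-data, axis=0), axis=0) + 1
            return np.exp(np.mean(np.log(ranks), axis=1))

        def permutation_pvalues(data, n_perm=2000, seed=0):
            """Naive permutation p-values: how often random rankings yield a rank
            product at least as small as the observed one (slow but transparent)."""
            rng = np.random.default_rng(seed)
            n_genes, k = data.shape
            observed = rank_products(data)
            null = np.empty((n_perm, n_genes))
            for b in range(n_perm):
                perm_ranks = np.column_stack([rng.permutation(n_genes) + 1 for _ in range(k)])
                null[b] = np.exp(np.mean(np.log(perm_ranks), axis=1))
            pooled = np.sort(null.ravel())
            return (np.searchsorted(pooled, observed, side="right") + 1) / (pooled.size + 1)

        rng = np.random.default_rng(1)
        toy = rng.normal(size=(200, 3))
        toy[:5] += 2.0                      # five artificially up-regulated genes
        print(np.round(permutation_pvalues(toy)[:5], 4))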

  19. Recursive algorithms for phylogenetic tree counting.

    Science.gov (United States)

    Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

    2013-10-28

    In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information is available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
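
    For the simplest of these counting problems, the number of fully ranked (labelled, ranked binary) trees on n contemporaneously sampled individuals, the count is the product over coalescent steps of the number of ways to join two of the remaining lineages. The sketch below covers this special case only (not the constraint-tree or sampled-ancestor algorithms of the paper) and cross-checks it against the closed form n!(n-1)!/2^(n-1).

        from math import comb, factorial

        def n_ranked_trees(n):
            """Number of labelled, fully ranked binary trees on n contemporaneously
            sampled tips: at every coalescent step with k lineages left there are
            C(k, 2) possible joins, so the total is the product of C(k, 2) for k = n..2."""
            count = 1
            for k in range(n, 1, -1):
                count *= comb(k, 2)
            return count

        # Cross-check against the closed form n! (n-1)! / 2**(n-1).
        for n in range(2, 8):
            closed = factorial(n) * factorial(n - 1) // 2 ** (n - 1)
            assert n_ranked_trees(n) == closed
            print(n, n_ranked_trees(n))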

  20. Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic

    NARCIS (Netherlands)

    Bekker, P.; Kleibergen, F.R.

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  1. Fungi identify the geographic origin of dust samples.

    Directory of Open Access Journals (Sweden)

    Neal S Grantham

    Full Text Available There is a long history of archaeologists and forensic scientists using pollen found in a dust sample to identify its geographic origin or history. Such palynological approaches have important limitations as they require time-consuming identification of pollen grains, a priori knowledge of plant species distributions, and a sufficient diversity of pollen types to permit spatial or temporal identification. We demonstrate an alternative approach based on DNA sequencing analyses of the fungal diversity found in dust samples. Using nearly 1,000 dust samples collected from across the continental U.S., our analyses identify up to 40,000 fungal taxa from these samples, many of which exhibit a high degree of geographic endemism. We develop a statistical learning algorithm via discriminant analysis that exploits this geographic endemicity in the fungal diversity to correctly identify samples to within a few hundred kilometers of their geographic origin with high probability. In addition, our statistical approach provides a measure of certainty for each prediction, in contrast with current palynology methods that are almost always based on expert opinion and devoid of statistical inference. Fungal taxa found in dust samples can therefore be used to identify the origin of that dust and, more importantly, we can quantify our degree of certainty that a sample originated in a particular place. This work opens up a new approach to forensic biology that could be used by scientists to identify the origin of dust or soil samples found on objects, clothing, or archaeological artifacts.

  2. Sample size calculation in metabolic phenotyping studies.

    Science.gov (United States)

    Billoir, Elise; Navratil, Vincent; Blaise, Benjamin J

    2015-09-01

    The number of samples needed to identify significant effects is a key question in biomedical studies, with consequences on experimental designs, costs and potential discoveries. In metabolic phenotyping studies, sample size determination remains a complex step. This is due particularly to the multiple hypothesis-testing framework and the top-down hypothesis-free approach, with no a priori known metabolic target. Until now, there was no standard procedure available to address this purpose. In this review, we discuss sample size estimation procedures for metabolic phenotyping studies. We release an automated implementation of the Data-driven Sample size Determination (DSD) algorithm for MATLAB and GNU Octave. Original research concerning DSD was published elsewhere. DSD allows the determination of an optimized sample size in metabolic phenotyping studies. The procedure uses analytical data only from a small pilot cohort to generate an expanded data set. The statistical recoupling of variables procedure is used to identify metabolic variables, and their intensity distributions are estimated by Kernel smoothing or log-normal density fitting. Statistically significant metabolic variations are evaluated using the Benjamini-Yekutieli correction and processed for data sets of various sizes. Optimal sample size determination is achieved in a context of biomarker discovery (at least one statistically significant variation) or metabolic exploration (a maximum of statistically significant variations). DSD toolbox is encoded in MATLAB R2008A (Mathworks, Natick, MA) for Kernel and log-normal estimates, and in GNU Octave for log-normal estimates (Kernel density estimates are not robust enough in GNU octave). It is available at http://www.prabi.fr/redmine/projects/dsd/repository, with a tutorial at http://www.prabi.fr/redmine/projects/dsd/wiki. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  3. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    Science.gov (United States)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental compositions of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify the rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour; all elements were found to be in good agreement with the certified values, the precisions in terms of %RSD were lower than 7%, and the LODs were in the range of 0.01 to 29 mg kg⁻¹. The concentrations of the nine elements in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice sample. The results show that the Mg, Cl, As, Br, Mn, K, Rb and Zn concentrations in the Thai jasmine rice samples are significantly different from those in the Sung Yod rice samples, whereas there is no evidence that Al differs significantly, at the 95% confidence level. Our results may provide preliminary information for the discrimination of rice samples and may form a useful database of Thai rice.

  4. Fast optimization of statistical potentials for structurally constrained phylogenetic models

    Directory of Open Access Journals (Sweden)

    Rodrigue Nicolas

    2009-09-01

    Full Text Available Abstract Background Statistical approaches for protein design are relevant in the field of molecular evolutionary studies. In recent years, new, so-called structurally constrained (SC models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In a previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and provided what we call the joint potentials. However, the method required numerical estimations by the use of computationally heavy Markov Chain Monte Carlo sampling algorithms. Results Here, we develop an alternative optimization procedure, based on a leave-one-out argument coupled to fast gradient descent algorithms. We assess that the leave-one-out potential yields very similar results to the joint approach developed previously, both in terms of the resulting potential parameters, and by Bayes factor evaluation in a phylogenetic context. On the other hand, the leave-one-out approach results in a considerable computational benefit (up to a 1,000 fold decrease in computational time for the optimization procedure. Conclusion Due to its computational speed, the optimization method we propose offers an attractive alternative for the design and empirical evaluation of alternative forms of potentials, using large data sets and high-dimensional parameterizations.

  5. GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU.

    NARCIS (Netherlands)

    Braak, van den G.J.W.; Nugteren, C.; Mesman, B.; Corporaal, H.; Kaklamanis, C.; Papatheodorou, T.; Spirakis, P.G.

    2012-01-01

    Voting algorithms, such as histogram and Hough transforms, are frequently used algorithms in various domains, such as statistics and image processing. Algorithms in these domains may be accelerated using GPUs. Implementing voting algorithms efficiently on a GPU however is far from trivial due to

  6. Sampling from a polytope and hard-disk Monte Carlo

    International Nuclear Information System (INIS)

    Kapfer, Sebastian C; Krauth, Werner

    2013-01-01

    The hard-disk problem, the statics and the dynamics of equal two-dimensional hard spheres in a periodic box, has had a profound influence on statistical and computational physics. Markov-chain Monte Carlo and molecular dynamics were first discussed for this model. Here we reformulate hard-disk Monte Carlo algorithms in terms of another classic problem, namely the sampling from a polytope. Local Markov-chain Monte Carlo, as proposed by Metropolis et al. in 1953, appears as a sequence of random walks in high-dimensional polytopes, while the moves of the more powerful event-chain algorithm correspond to molecular dynamics evolution. We determine the convergence properties of Monte Carlo methods in a special invariant polytope associated with hard-disk configurations, and the implications for convergence of hard-disk sampling. Finally, we discuss parallelization strategies for event-chain Monte Carlo and present results for a multicore implementation
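
    To make the connection to hard-disk sampling concrete, here is a minimal local Markov-chain Monte Carlo sketch in the spirit of the 1953 Metropolis et al. scheme (not the event-chain algorithm discussed in the paper): a randomly chosen disk receives a small random displacement in a periodic box, and the move is accepted only if it creates no overlap. The box size, radius and step are invented toy values.

        import numpy as np

        def local_mc_hard_disks(pos, box, radius, n_sweeps=1000, step=0.1, seed=0):
            """Local Metropolis moves for hard disks in a periodic square box."""
            rng = np.random.default_rng(seed)
            pos = pos.copy()
            n = len(pos)
            accepted = 0
            for _ in range(n_sweeps * n):
                i = rng.integers(n)
                trial = (pos[i] + rng.uniform(-step, step, size=2)) % box
                delta = (pos - trial + box / 2) % box - box / 2     # minimum-image separation
                dist2 = np.sum(delta**2, axis=1)
                dist2[i] = np.inf                                   # ignore the moved disk itself
                if np.all(dist2 > (2 * radius) ** 2):               # no overlap -> accept
                    pos[i] = trial
                    accepted += 1
            return pos, accepted / (n_sweeps * n)

        # Dilute toy configuration: 16 disks on a grid in a periodic box.
        box, radius = 8.0, 0.3
        grid = np.array([(x, y) for x in range(4) for y in range(4)], dtype=float) * 2.0 + 1.0
        final, acc = local_mc_hard_disks(grid, box, radius)
        print("acceptance rate:", round(acc, 3))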

  7. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, Paul A.; Kleibergen, Frank

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  8. An Automated Energy Detection Algorithm Based on Morphological and Statistical Processing Techniques

    Science.gov (United States)

    2018-01-09

    [Fragmentary indexing excerpt: table entries listing frequency ranges from 100 kHz to 1 GHz; report sections "3. Statistical Processing" and "3.1 Statistical Analysis" describing statistical techniques mainly used for alarm [...] in commercial prognostics and diagnostic vibrational monitoring applications; and a citation of Balakrishnan N (ed.), Handbook of Statistics, Amsterdam (Netherlands): Elsevier Science, 1998, p. 555–602 (order statistics and their applications).]

  9. Development of a same-side kaon tagging algorithm for B^0_s decays for measuring delta m_s at CDF II

    Energy Technology Data Exchange (ETDEWEB)

    Menzemer, Stephanie; /Heidelberg U.

    2006-06-01

    The authors developed a Same-Side Kaon Tagging algorithm to determine the production flavor of B{sub s}{sup 0} mesons. Until the B{sub s}{sup 0} mixing frequency is clearly observed, the performance of the Same-Side Kaon Tagging algorithm cannot be measured on data but has to be determined from Monte Carlo simulation. Agreement between data and Monte Carlo has been evaluated for both the B{sub s}{sup 0} mode and the high-statistics B{sup +} and B{sup 0} modes. Extensive systematic studies were performed to quantify potential discrepancies between data and Monte Carlo. The final optimized tagging algorithm exploits the particle identification capability of the CDF II detector. It achieves a tagging performance of {epsilon}D{sup 2} = 4.0{sub -1.2}{sup +0.9} on the B{sub s}{sup 0} {yields} D{sub s}{sup -} {pi}{sup +} sample. The Same-Side Kaon Tagging algorithm presented here has been applied to the ongoing B{sub s}{sup 0} mixing analysis, and has provided a factor of 3-4 increase in the effective statistical size of the sample. This improvement results in the first direct measurement of the B{sub s}{sup 0} mixing frequency.

  10. Shape analysis of corpus callosum in phenylketonuria using a new 3D correspondence algorithm

    Science.gov (United States)

    He, Qing; Christ, Shawn E.; Karsch, Kevin; Peck, Dawn; Duan, Ye

    2010-03-01

    Statistical shape analysis of brain structures has gained increasing interest from neuroimaging community because it can precisely locate shape differences between healthy and pathological structures. The most difficult and crucial problem is establishing shape correspondence among individual 3D shapes. This paper proposes a new algorithm for 3D shape correspondence. A set of landmarks are sampled on a template shape, and initial correspondence is established between the template and the target shape based on the similarity of locations and normal directions. The landmarks on the target are then refined by iterative thin plate spline. The algorithm is simple and fast, and no spherical mapping is needed. We apply our method to the statistical shape analysis of the corpus callosum (CC) in phenylketonuria (PKU), and significant local shape differences between the patients and the controls are found in the most anterior and posterior aspects of the corpus callosum.

  11. CONFIDENCE LEVELS AND/VS. STATISTICAL HYPOTHESIS TESTING IN STATISTICAL ANALYSIS. CASE STUDY

    Directory of Open Access Journals (Sweden)

    ILEANA BRUDIU

    2009-05-01

    Full Text Available Estimating parameters with confidence intervals and testing statistical hypotheses are used in statistical analysis to draw conclusions about a population from a sample extracted from it. The case study presented here aims to highlight the importance of the size of the sample taken in the study and how this is reflected in the results obtained when using confidence intervals and hypothesis tests. Whereas statistical hypothesis testing only gives a "yes" or "no" answer to a question, statistical estimation using confidence intervals provides more information than a test statistic: it shows the high degree of uncertainty arising from small samples and from findings that are "marginally significant" or "almost significant" (p very close to 0.05).

  12. e-DMDAV: A new privacy preserving algorithm for wearable enterprise information systems

    Science.gov (United States)

    Zhang, Zhenjiang; Wang, Xiaoni; Uden, Lorna; Zhang, Peng; Zhao, Yingsi

    2018-04-01

    Wearable devices have been widely used in many fields to improve the quality of people's lives. More and more data on individuals and businesses are collected by statistical organizations through those devices. Almost all of this data holds confidential information. Statistical Disclosure Control (SDC) seeks to protect statistical data in such a way that it can be released without giving away confidential information that can be linked to specific individuals or entities. The MDAV (Maximum Distance to Average Vector) algorithm is an efficient micro-aggregation algorithm belonging to SDC. However, the MDAV algorithm cannot survive homogeneity and background knowledge attacks because it was designed for static numerical data. This paper proposes a systematic dynamic-updating anonymity algorithm based on MDAV called the e-DMDAV algorithm. This algorithm introduces a new parameter and a table to ensure that each cluster contains k records and that the range of distinct values in each cluster is no less than e, for both numerical and non-numerical datasets. This new algorithm has been evaluated and compared with the MDAV algorithm. The simulation results show that the new algorithm outperforms MDAV in terms of minimizing distortion and disclosure risk with a similar computational cost.
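
    As background for the comparison, the classical MDAV micro-aggregation step on numerical data can be sketched as follows (a simplified, generic rendition with toy data, not the e-DMDAV algorithm itself): clusters of at least k records are formed around the records farthest from the running centroid, and each record is replaced by its cluster centroid.

        import numpy as np

        def mdav(records, k=3):
            """Simplified MDAV micro-aggregation: group records into clusters of size >= k
            and replace each record by its cluster centroid (remainder handling simplified)."""
            records = np.asarray(records, dtype=float)
            remaining = list(range(len(records)))
            protected = records.copy()
            while len(remaining) >= 3 * k:
                pts = records[remaining]
                centroid = pts.mean(axis=0)
                r = remaining[int(np.argmax(np.linalg.norm(pts - centroid, axis=1)))]   # farthest from centroid
                s = remaining[int(np.argmax(np.linalg.norm(pts - records[r], axis=1)))] # farthest from r
                for seed in (r, s):
                    if seed not in remaining:
                        continue
                    d = np.linalg.norm(records[remaining] - records[seed], axis=1)
                    cluster = [remaining[i] for i in np.argsort(d)[:k]]
                    protected[cluster] = records[cluster].mean(axis=0)
                    remaining = [i for i in remaining if i not in cluster]
            if remaining:                                    # final (possibly larger) cluster
                protected[remaining] = records[remaining].mean(axis=0)
            return protected

        rng = np.random.default_rng(0)
        data = rng.normal(size=(20, 2))
        print(np.round(mdav(data, k=3), 2))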

  13. Convergence and Efficiency of Adaptive Importance Sampling Techniques with Partial Biasing

    Science.gov (United States)

    Fort, G.; Jourdain, B.; Lelièvre, T.; Stoltz, G.

    2018-04-01

    We propose a new Monte Carlo method to efficiently sample a multimodal distribution (known up to a normalization constant). We consider a generalization of the discrete-time Self Healing Umbrella Sampling method, which can also be seen as a generalization of well-tempered metadynamics. The dynamics is based on an adaptive importance technique. The importance function relies on the weights (namely the relative probabilities) of disjoint sets which form a partition of the space. These weights are unknown but are learnt on the fly yielding an adaptive algorithm. In the context of computational statistical physics, the logarithm of these weights is, up to an additive constant, the free-energy, and the discrete valued function defining the partition is called the collective variable. The algorithm falls into the general class of Wang-Landau type methods, and is a generalization of the original Self Healing Umbrella Sampling method in two ways: (i) the updating strategy leads to a larger penalization strength of already visited sets in order to escape more quickly from metastable states, and (ii) the target distribution is biased using only a fraction of the free-energy, in order to increase the effective sample size and reduce the variance of importance sampling estimators. We prove the convergence of the algorithm and analyze numerically its efficiency on a toy example.
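
    A stripped-down, Wang-Landau-flavoured sketch of the partial-biasing idea on a one-dimensional double well is given below; it is only loosely patterned on the algorithm analysed in the paper (the potential, bin partition, gain schedule and biasing fraction are all invented), but it shows the two ingredients discussed above: weights of the partition sets learnt on the fly, and a bias that uses only a fraction of those learnt weights.

        import numpy as np

        def potential(x):
            """Invented double-well potential with a barrier of about 6 kT."""
            return 6.0 * (x**2 - 1.0) ** 2

        def adaptive_partial_biasing(n_steps=200_000, a=0.7, step=0.25, gain=1e-3, seed=0):
            """Learn log-weights of the bins of a discrete collective variable on the fly and
            sample from exp(-V(x)) / theta[bin(x)]**a, i.e. bias with only a fraction a of the
            learnt weights.  A constant gain is used here for simplicity; in practice a
            decreasing gain is needed for convergence, as analysed in the paper."""
            rng = np.random.default_rng(seed)
            edges = np.linspace(-2.0, 2.0, 21)          # 20 bins define the collective variable
            log_theta = np.zeros(len(edges) - 1)        # flat initial guess for the weights
            x = -1.0                                    # start in the left-hand well
            visits_right = 0
            for _ in range(n_steps):
                y = x + rng.uniform(-step, step)
                if -2.0 < y < 2.0:
                    bx = np.searchsorted(edges, x) - 1
                    by = np.searchsorted(edges, y) - 1
                    # Metropolis ratio for the biased target exp(-V(x)) * theta[bin(x)]**(-a)
                    log_ratio = potential(x) - potential(y) + a * (log_theta[bx] - log_theta[by])
                    if np.log(rng.random()) < log_ratio:
                        x = y
                b = np.searchsorted(edges, x) - 1
                log_theta[b] += gain                    # penalise the currently occupied bin
                log_theta -= log_theta.max()            # only relative weights matter
                visits_right += x > 0.0
            return log_theta, visits_right / n_steps

        log_theta, frac_right = adaptive_partial_biasing()
        print("fraction of steps spent in the right-hand well:", round(frac_right, 3))
        print("learnt log-weights per bin (related to the free energy up to scaling):")
        print(np.round(log_theta, 2))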

  14. Evaluation of ultrasonic array imaging algorithms for inspection of a coarse grained material

    Science.gov (United States)

    Van Pamel, A.; Lowe, M. J. S.; Brett, C. R.

    2014-02-01

    Improving the ultrasound inspection capability for coarse grain metals remains of longstanding interest to industry and the NDE research community and is expected to become increasingly important for next generation power plants. A test sample of coarse grained Inconel 625 which is representative of future power plant components has been manufactured to test the detectability of different inspection techniques. Conventional ultrasonic A, B, and C-scans showed the sample to be extraordinarily difficult to inspect due to its scattering behaviour. However, in recent years, array probes and Full Matrix Capture (FMC) imaging algorithms, which extract the maximum amount of information possible, have unlocked exciting possibilities for improvements. This article proposes a robust methodology to evaluate the detection performance of imaging algorithms, applying this to three FMC imaging algorithms: Total Focusing Method (TFM), Phase Coherent Imaging (PCI), and Decomposition of the Time Reversal Operator with Multiple Scattering (DORT MSF). The methodology considers the statistics of detection, presenting the detection performance as Probability of Detection (POD) and Probability of False Alarm (PFA). The data is captured in pulse-echo mode using 64-element array probes at centre frequencies of 1 MHz and 5 MHz. All three algorithms are shown to perform very similarly when comparing their flaw detection capabilities on this particular case.
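
    The detection-statistics part of such a methodology can be illustrated with a short, generic sketch (invented amplitude distributions, not the article's data): given image amplitudes at known flaw locations and at flaw-free locations, sweeping a detection threshold yields a POD and a PFA value per threshold for each imaging algorithm.

        import numpy as np

        def pod_pfa_curve(flaw_scores, background_scores, n_thresholds=100):
            """Probability of detection and probability of false alarm versus threshold."""
            thresholds = np.linspace(min(background_scores.min(), flaw_scores.min()),
                                     max(background_scores.max(), flaw_scores.max()),
                                     n_thresholds)
            pod = np.array([(flaw_scores >= t).mean() for t in thresholds])
            pfa = np.array([(background_scores >= t).mean() for t in thresholds])
            return thresholds, pod, pfa

        # Invented example: flaw responses only slightly above the grain-noise background.
        rng = np.random.default_rng(0)
        background = rng.rayleigh(scale=1.0, size=5000)     # speckle-like grain noise
        flaws = rng.rayleigh(scale=1.0, size=200) + 1.5     # flaw indications
        thr, pod, pfa = pod_pfa_curve(flaws, background)
        i = np.argmin(np.abs(pfa - 0.01))                   # operating point at about 1% false alarms
        print(f"POD at PFA ~ 1%: {pod[i]:.2f} (threshold {thr[i]:.2f})")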

  15. Modular Regularization Algorithms

    DEFF Research Database (Denmark)

    Jacobsen, Michael

    2004-01-01

    The class of linear ill-posed problems is introduced along with a range of standard numerical tools and basic concepts from linear algebra, statistics and optimization. Known algorithms for solving linear inverse ill-posed problems are analyzed to determine how they can be decomposed into indepen...

  16. Determination of minimum sample size for fault diagnosis of automobile hydraulic brake system using power analysis

    Directory of Open Access Journals (Sweden)

    V. Indira

    2015-03-01

    Full Text Available The hydraulic brake is considered one of the important components in automobile engineering. Condition monitoring and fault diagnosis of such a component are essential for the safety of passengers and vehicles and to minimize unexpected maintenance time. Vibration-based machine learning approaches for condition monitoring of hydraulic brake systems are gaining momentum. Training and testing the classifier are two important activities in the process of feature classification. This study proposes a systematic statistical method called power analysis to find the minimum number of samples required to train the classifier with statistical stability so as to obtain good classification accuracy. Descriptive statistical features have been used, and the most contributing features have been selected using the C4.5 decision tree algorithm. The results of the power analysis have also been verified using a decision tree algorithm, namely C4.5.
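
    The power-analysis step can be sketched with the usual normal-approximation formula for comparing two class means, for instance a healthy versus a faulty feature distribution (a generic illustration with assumed effect sizes, not the study's vibration data):

        from math import ceil
        from scipy.stats import norm

        def min_samples_per_class(effect_size, alpha=0.05, power=0.90):
            """Normal-approximation sample size for detecting a standardized mean
            difference (Cohen's d) between two classes with a two-sided test."""
            z_alpha = norm.ppf(1.0 - alpha / 2.0)
            z_beta = norm.ppf(power)
            return ceil(2.0 * ((z_alpha + z_beta) / effect_size) ** 2)

        # Assumed effect sizes for a feature separating 'good' and 'faulty' brake conditions.
        for d in (0.5, 0.8, 1.2):
            print(f"d = {d}: at least {min_samples_per_class(d)} samples per class")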

  17. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing
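
    For readers unfamiliar with the FR statistic, its core computation can be sketched generically (this follows the usual textbook definition, not the FlowMap-FR code): build a Euclidean minimum spanning tree over the pooled events of two cell populations and count the edges that join events from different samples; markedly fewer cross-sample edges than expected indicates that the two distributions differ.

        import numpy as np
        from scipy.spatial.distance import cdist
        from scipy.sparse.csgraph import minimum_spanning_tree

        def friedman_rafsky_runs(sample_a, sample_b):
            """Number of cross-sample edges in the Euclidean MST of the pooled data
            (a small count suggests the two multivariate distributions differ)."""
            pooled = np.vstack([sample_a, sample_b])
            labels = np.r_[np.zeros(len(sample_a)), np.ones(len(sample_b))]
            mst = minimum_spanning_tree(cdist(pooled, pooled)).tocoo()
            cross = sum(labels[i] != labels[j] for i, j in zip(mst.row, mst.col))
            return int(cross)

        rng = np.random.default_rng(0)
        pop1 = rng.normal(0.0, 1.0, size=(150, 3))       # e.g. a cell population in 3 markers
        pop2_same = rng.normal(0.0, 1.0, size=(150, 3))
        pop2_shifted = rng.normal(1.5, 1.0, size=(150, 3))
        print("equivalent populations:", friedman_rafsky_runs(pop1, pop2_same))
        print("shifted population    :", friedman_rafsky_runs(pop1, pop2_shifted))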

  18. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell

  19. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Bayer, Christian

    2016-02-20

    © 2016 Taylor & Francis Group, LLC. ABSTRACT: In this work, we present an extension of the forward–reverse representation introduced by Bayer and Schoenmakers (Annals of Applied Probability, 24(5):1994–2032, 2014) to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, that is, SRNs conditional on their values in the extremes of given time intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the expectation-maximization algorithm to the phase I output. By selecting a set of overdispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.

  20. An efficient forward-reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Vilanova, Pedro

    2016-01-07

    In this work, we present an extension of the forward-reverse representation introduced in Simulation of forward-reverse stochastic representations for conditional diffusions , a 2014 paper by Bayer and Schoenmakers to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the Expectation-Maximization algorithm to the phase I output. By selecting a set of over-dispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.

  1. A statistical analysis of RNA folding algorithms through thermodynamic parameter perturbation.

    Science.gov (United States)

    Layton, D M; Bundschuh, R

    2005-01-01

    Computational RNA secondary structure prediction is rather well established. However, such prediction algorithms always depend on a large number of experimentally measured parameters. Here, we study how sensitive structure prediction algorithms are to changes in these parameters. We find that, already for changes corresponding to the actual experimental error with which these parameters have been determined, 30% of the structure is falsely predicted, whereas the ground-state structure is preserved under parameter perturbation in only 5% of all cases. We establish that base-pairing probabilities calculated in a thermal ensemble are a viable, although not perfect, measure of the reliability of the prediction of individual structure elements. A new measure of stability using parameter perturbation is proposed, and its limitations are discussed.

  2. Data Fusion for a Vision-Radiological System: a Statistical Calibration Algorithm

    International Nuclear Information System (INIS)

    Enqvist, Andreas; Koppal, Sanjeev; Riley, Phillip

    2015-01-01

    Presented here is a fusion system based on simple, low-cost computer vision and radiological sensors for tracking of multiple objects and identifying potential radiological materials being transported or shipped. The main focus of this work is the development of calibration algorithms for characterizing the fused sensor system as a single entity. There is an apparent need for correcting for a scene deviation from the basic inverse distance-squared law governing the detection rates even when evaluating system calibration algorithms. In particular, the computer vision system enables a map of distance-dependence of the sources being tracked, to which the time-dependent radiological data can be incorporated by means of data fusion of the two sensors' output data. (authors)
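
    The distance-dependence correction mentioned above can be illustrated with a simple joint-calibration sketch (synthetic numbers, not the authors' system): using per-frame source-to-detector distances supplied by the vision sensor, the radiological count rate is fitted to an inverse-square-plus-background model.

        import numpy as np
        from scipy.optimize import curve_fit

        def count_rate_model(r, activity, background):
            """Expected count rate versus source-detector distance r (inverse-square law)."""
            return activity / r**2 + background

        # Synthetic calibration data: distances from the vision tracker, counts from the detector.
        rng = np.random.default_rng(0)
        distances = rng.uniform(0.5, 5.0, size=200)                  # metres
        true_rate = count_rate_model(distances, activity=400.0, background=5.0)
        counts = rng.poisson(true_rate)                              # Poisson counting noise

        params, cov = curve_fit(count_rate_model, distances, counts, p0=(100.0, 1.0))
        activity_hat, background_hat = params
        print(f"fitted activity term: {activity_hat:.1f}, background: {background_hat:.2f}")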

  3. Data Fusion for a Vision-Radiological System: a Statistical Calibration Algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Enqvist, Andreas; Koppal, Sanjeev; Riley, Phillip [University of Florida, Gainesville, FL 32611 (United States)

    2015-07-01

    Presented here is a fusion system based on simple, low-cost computer vision and radiological sensors for tracking multiple objects and identifying potential radiological materials being transported or shipped. The main focus of this work is the development of calibration algorithms for characterizing the fused sensor system as a single entity. There is an apparent need to correct for scene deviations from the basic inverse-square law governing the detection rates, even when evaluating system calibration algorithms. In particular, the computer vision system enables a map of the distance dependence of the sources being tracked, into which the time-dependent radiological data can be incorporated by means of data fusion of the two sensors' output data. (authors)

  4. Determination of Sr-90 in milk samples from the study of statistical results

    Directory of Open Access Journals (Sweden)

    Otero-Pazos Alberto

    2017-01-01

    The determination of 90Sr in milk samples is the main objective of radiation monitoring laboratories because of its environmental importance. In this paper, the activity concentration of 39 milk samples was obtained through radiochemical separation based on selective retention of Sr in a cationic resin (Dowex 50WX8, 50-100 mesh) and subsequent determination with a low-level gas proportional counter. The results were checked by measuring the Sr concentration using the flame atomic absorption spectroscopy technique, to finally obtain the mass of 90Sr. A statistical treatment of the data was then performed using linear regressions. This yields, firstly, a reliable estimate of the mass of 90Sr based on the gravimetric technique and, secondly, an estimate of the counts per minute of the third measurement at 90Sr/90Y equilibrium, without having to perform that analysis. These estimates have been verified with 19 milk samples, obtaining overlapping results. The novelty of the manuscript is the possibility of determining the concentration of 90Sr in milk samples without the need to perform the third measurement at equilibrium.

  5. Development of morphing algorithms for HistFactory using information geometry

    Energy Technology Data Exchange (ETDEWEB)

    Bandyopadhyay, Anjishnu; Brock, Ian [University of Bonn (Germany); Cranmer, Kyle [New York University (United States)

    2016-07-01

    Many statistical analyses are based on likelihood fits. In any likelihood fit we try to incorporate all uncertainties, both systematic and statistical. We generally have distributions for the nominal and ±1σ variations of a given uncertainty. Using that information, HistFactory morphs the distributions for any arbitrary value of the given uncertainties. In this talk, a new morphing algorithm is presented, which is based on information geometry. The algorithm uses the information about the difference between various probability distributions. Subsequently, we map this information onto geometrical structures and develop the algorithm on the basis of different geometrical properties. Apart from varying all nuisance parameters together, this algorithm can also probe both small (<1σ) and large (>2σ) variations. It will also be shown how this algorithm can be used for interpolating other forms of probability distributions.
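
    For context, the sketch below shows the conventional piecewise-linear vertical morphing between the nominal and ±1σ templates that HistFactory-style tools commonly apply; it is only the baseline that the information-geometry algorithm described above aims to improve on, and the bin contents are made up.

```python
# Baseline piecewise-linear vertical morphing between nominal and +/-1 sigma
# histogram templates, evaluated at an arbitrary nuisance-parameter value theta.
import numpy as np

def linear_morph(nominal, up, down, theta):
    """Bin-by-bin interpolation: theta = 0 -> nominal, +1 -> up, -1 -> down."""
    nominal, up, down = map(np.asarray, (nominal, up, down))
    if theta >= 0:
        return nominal + theta * (up - nominal)
    return nominal + theta * (nominal - down)

nominal = np.array([100.0, 80.0, 60.0])
up      = np.array([110.0, 88.0, 61.0])   # +1 sigma template
down    = np.array([ 92.0, 74.0, 58.0])   # -1 sigma template

print(linear_morph(nominal, up, down, theta=0.5))   # halfway towards +1 sigma
print(linear_morph(nominal, up, down, theta=-2.0))  # extrapolation beyond -1 sigma
```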

  6. Geometric approximation algorithms

    CERN Document Server

    Har-Peled, Sariel

    2011-01-01

    Exact algorithms for dealing with geometric objects are complicated, hard to implement in practice, and slow. Over the last 20 years a theory of geometric approximation algorithms has emerged. These algorithms tend to be simple, fast, and more robust than their exact counterparts. This book is the first to cover geometric approximation algorithms in detail. In addition, more traditional computational geometry techniques that are widely used in developing such algorithms, like sampling, linear programming, etc., are also surveyed. Other topics covered include approximate nearest-neighbor search, shape approximation, coresets, dimension reduction, and embeddings. The topics covered are relatively independent and are supplemented by exercises. Close to 200 color figures are included in the text to illustrate proofs and ideas.

  7. Statistical mechanics in a nutshell

    CERN Document Server

    Peliti, Luca

    2011-01-01

    Statistical mechanics is one of the most exciting areas of physics today, and it also has applications to subjects as diverse as economics, social behavior, algorithmic theory, and evolutionary biology. Statistical Mechanics in a Nutshell offers the most concise, self-contained introduction to this rapidly developing field. Requiring only a background in elementary calculus and elementary mechanics, this book starts with the basics, introduces the most important developments in classical statistical mechanics over the last thirty years, and guides readers to the very threshold of today

  8. Effect of the Target Motion Sampling Temperature Treatment Method on the Statistics and Performance

    Science.gov (United States)

    Viitanen, Tuomas; Leppänen, Jaakko

    2014-06-01

    Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of TMS on statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations, it turned out that the use of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. in the usual way. With 0 Kelvin cross sections, a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the use of the TMS method increases the statistical deviations of all estimators, including the flux estimator, by tens of percent in the vicinity of very strong resonances. This effect is not related to the use of sampled responses, but is instead an inherent property of the TMS tracking method and concerns both EBT and 0 K calculations.

  9. The Top Ten Algorithms in Data Mining

    CERN Document Server

    Wu, Xindong

    2009-01-01

    From classification and clustering to statistical learning, association analysis, and link mining, this book covers the most important topics in data mining research. It presents the ten most influential algorithms used in the data mining community today. Each chapter provides a detailed description of the algorithm, a discussion of available software implementation, advanced topics, and exercises. With a simple data set, examples illustrate how each algorithm works and highlight the overall performance of each algorithm in a real-world application. Featuring contributions from leading researc

  10. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    International Nuclear Information System (INIS)

    Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2015-01-01

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in the literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total, 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level tiered approach, facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, i.e. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  11. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Edjabou, Maklawe Essonanawe, E-mail: vine@env.dtu.dk [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Petersen, Claus [Econet AS, Omøgade 8, 2.sal, 2100 Copenhagen (Denmark); Scheutz, Charlotte; Astrup, Thomas Fruergaard [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark)

    2015-02-15

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in the literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total, 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level tiered approach, facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, i.e. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  12. Level-0 trigger algorithms for the ALICE PHOS detector

    CERN Document Server

    Wang, D; Wang, Y P; Huang, G M; Kral, J; Yin, Z B; Zhou, D C; Zhang, F; Ullaland, K; Muller, H; Liu, L J

    2011-01-01

    The PHOS level-0 trigger provides a minimum bias trigger for p-p collisions and information for a level-1 trigger at both p-p and Pb-Pb collisions. There are two level-0 trigger generating algorithms under consideration: the Direct Comparison algorithm and the Weighted Sum algorithm. In order to study the trigger algorithms via simulation, a simplified equivalent model is extracted from the trigger electronics to derive the waveform function of the Analog-or signal as input to the trigger algorithms. Simulations show that the Weighted Sum algorithm can achieve higher trigger efficiency and provide more precise single-channel energy information than the Direct Comparison algorithm. An energy resolution of 9.75 MeV can be achieved with the Weighted Sum algorithm at a sampling rate of 40 Msps (mega samples per second) at 1 GeV. The timing performance at a sampling rate of 40 Msps with the Weighted Sum algorithm is better than that at a sampling rate of 20 Msps with both algorithms. The level-0 trigger can be delivered...
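
    A hedged sketch of the two trigger strategies compared above, applied to a synthetic sampled waveform: a Direct Comparison of individual ADC samples against a threshold versus a sliding Weighted Sum of consecutive samples. The pulse shape, weights, and thresholds are illustrative placeholders, not the PHOS electronics parameters.

```python
# Hedged sketch: Direct Comparison of single ADC samples against a threshold
# versus a sliding Weighted Sum of consecutive samples. Toy pulse and weights.
import numpy as np

def direct_comparison(samples, threshold):
    return np.any(samples > threshold)

def weighted_sum(samples, weights, threshold):
    # Slide the weight window over the sampled waveform and compare each sum.
    sums = np.convolve(samples, weights[::-1], mode="valid")
    return np.any(sums > threshold)

rng = np.random.default_rng(1)
t = np.arange(32)                                             # 32 samples (illustrative)
pulse = 30.0 * (t > 8) * (t - 8) * np.exp(-(t - 8) / 4.0)     # toy shaper response
waveform = pulse + rng.normal(0.0, 5.0, size=t.size)          # add electronics noise

weights = np.array([0.25, 0.5, 1.0, 0.5, 0.25])               # toy FIR weights
print("direct comparison fired:", direct_comparison(waveform, threshold=40.0))
print("weighted sum fired:     ", weighted_sum(waveform, weights, threshold=80.0))
```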

  13. Impact of a New Adaptive Statistical Iterative Reconstruction (ASIR)-V Algorithm on Image Quality in Coronary Computed Tomography Angiography.

    Science.gov (United States)

    Pontone, Gianluca; Muscogiuri, Giuseppe; Andreini, Daniele; Guaricci, Andrea I; Guglielmo, Marco; Baggiano, Andrea; Fazzari, Fabio; Mushtaq, Saima; Conte, Edoardo; Annoni, Andrea; Formenti, Alberto; Mancini, Elisabetta; Verdecchia, Massimo; Campari, Alessandro; Martini, Chiara; Gatti, Marco; Fusini, Laura; Bonfanti, Lorenzo; Consiglio, Elisa; Rabbat, Mark G; Bartorelli, Antonio L; Pepi, Mauro

    2018-03-27

    A new postprocessing algorithm named adaptive statistical iterative reconstruction (ASIR)-V has recently been introduced. The aim of this article was to analyze the impact of the ASIR-V algorithm on signal, noise, and image quality of coronary computed tomography angiography. Fifty consecutive patients underwent clinically indicated coronary computed tomography angiography (Revolution CT; GE Healthcare, Milwaukee, WI). Images were reconstructed using filtered back projection and ASIR-V 0%, a combination of filtered back projection and ASIR-V 20%-80%, and ASIR-V 100%. Image noise, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR) were calculated for the left main coronary artery (LM), left anterior descending artery (LAD), left circumflex artery (LCX), and right coronary artery (RCA) and were compared between the different postprocessing algorithms used. Similarly, a four-point Likert image quality score of coronary segments was graded for each dataset and compared. Statistical significance was defined by a predefined P value cutoff. Compared to ASIR-V 0%, ASIR-V 100% demonstrated a significant reduction of image noise in all coronaries. Compared to ASIR-V 0%, SNR was significantly higher with ASIR-V 60% in the LM, and CNR was significantly improved in the LM for ASIR-V ≥60% and in the remaining vessels for ASIR-V ≥80%. ASIR-V 60% had significantly better Likert image quality scores compared to ASIR-V 0% in segment-, vessel-, and patient-based analyses. ASIR-V 60% provides the optimal balance between image noise, SNR, CNR, and image quality. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
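
    As a reminder of the metrics used above, the sketch below computes image noise, SNR, and CNR from region-of-interest statistics in the commonly used way (noise = standard deviation of a background ROI); the pixel values are synthetic and the definitions are assumed conventions, not taken from the paper.

```python
# Hedged sketch of ROI-based image quality metrics: noise = SD of a background
# ROI, SNR = mean(vessel)/noise, CNR = (mean(vessel) - mean(background))/noise.
# The pixel values below are synthetic.
import numpy as np

rng = np.random.default_rng(2)
vessel_roi = rng.normal(450.0, 25.0, size=500)       # HU values inside the lumen
background_roi = rng.normal(80.0, 25.0, size=500)    # HU values in adjacent tissue

noise = background_roi.std(ddof=1)
snr = vessel_roi.mean() / noise
cnr = (vessel_roi.mean() - background_roi.mean()) / noise
print(f"noise = {noise:.1f} HU, SNR = {snr:.1f}, CNR = {cnr:.1f}")
```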

  14. Parallel auto-correlative statistics with VTK.

    Energy Technology Data Exchange (ETDEWEB)

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
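
    A small NumPy sketch of the basic quantity an auto-correlative statistics engine computes, the lag-k autocorrelation of a series; it is illustrative only and does not reproduce the VTK C++ engines described in the report.

```python
# Hedged sketch: lag-k autocorrelation of a time series, checked on an AR(1)
# process whose true lag-k autocorrelation is 0.7**k.
import numpy as np

def autocorrelation(x, max_lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
noise = rng.normal(size=5000)
series = np.zeros_like(noise)
for i in range(1, len(series)):
    series[i] = 0.7 * series[i - 1] + noise[i]

print(np.round(autocorrelation(series, max_lag=5), 3))
```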

  15. Statistical inference using weak chaos and infinite memory

    International Nuclear Information System (INIS)

    Welling, Max; Chen Yutian

    2010-01-01

    We describe a class of deterministic weakly chaotic dynamical systems with infinite memory. These 'herding systems' combine learning and inference into one algorithm, where moments or data-items are converted directly into an arbitrarily long sequence of pseudo-samples. This sequence has infinite range correlations and as such is highly structured. We show that its information content, as measured by sub-extensive entropy, can grow as fast as K log T, which is faster than the usual 1/2 K log T for exchangeable sequences generated by random posterior sampling from a Bayesian model. In one dimension we prove that herding sequences are equivalent to Sturmian sequences which have complexity exactly log(T + 1). More generally, we advocate the application of the rich theoretical framework around nonlinear dynamical systems, chaos theory and fractal geometry to statistical learning.
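
    A minimal sketch of the herding dynamics described above, for a discrete variable with one-hot features: the weight vector accumulates the target moments, each step deterministically emits the state with the largest weight, and the empirical frequencies of the pseudo-samples converge to the target moments. The target distribution is an illustrative assumption.

```python
# Hedged sketch of deterministic herding with one-hot features: the update is
# w <- w + mu - phi(x), where x = argmax <w, phi(.)> reduces to argmax(w).
import numpy as np

def herd(target_probs, num_steps):
    w = np.array(target_probs, dtype=float)   # herding weights
    samples = []
    for _ in range(num_steps):
        x = int(np.argmax(w))                 # deterministic 'max' inference step
        samples.append(x)
        w += target_probs                     # add the target moments ...
        w[x] -= 1.0                           # ... and subtract the emitted feature
    return np.array(samples)

target = np.array([0.5, 0.3, 0.2])
seq = herd(target, num_steps=10000)
print(np.bincount(seq, minlength=3) / len(seq))   # close to [0.5, 0.3, 0.2]
```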

  16. Statistical inference using weak chaos and infinite memory

    Energy Technology Data Exchange (ETDEWEB)

    Welling, Max; Chen Yutian, E-mail: welling@ics.uci.ed, E-mail: yutian.chen@uci.ed [Donald Bren School of Information and Computer Science, University of California Irvine CA 92697-3425 (United States)

    2010-06-01

    We describe a class of deterministic weakly chaotic dynamical systems with infinite memory. These 'herding systems' combine learning and inference into one algorithm, where moments or data-items are converted directly into an arbitrarily long sequence of pseudo-samples. This sequence has infinite range correlations and as such is highly structured. We show that its information content, as measured by sub-extensive entropy, can grow as fast as K log T, which is faster than the usual 1/2 K log T for exchangeable sequences generated by random posterior sampling from a Bayesian model. In one dimension we prove that herding sequences are equivalent to Sturmian sequences which have complexity exactly log(T + 1). More generally, we advocate the application of the rich theoretical framework around nonlinear dynamical systems, chaos theory and fractal geometry to statistical learning.

  17. On the efficiency of chaos optimization algorithms for global optimization

    International Nuclear Information System (INIS)

    Yang Dixiong; Li Gang; Cheng Gengdong

    2007-01-01

    Chaos optimization algorithms, as a novel method of global optimization, have attracted much attention; they have all been based on the Logistic map. However, we have noticed that the probability density function of the chaotic sequences derived from the Logistic map is of Chebyshev type, which may considerably affect the global searching capacity and computational efficiency of chaos optimization algorithms. Considering the statistical properties of the chaotic sequences of the Logistic map and the Kent map, an improved hybrid chaos-BFGS optimization algorithm and a Kent map based hybrid chaos-BFGS algorithm are proposed. Five typical nonlinear functions with multimodal characteristics are tested to compare the performance of five hybrid optimization algorithms: the conventional Logistic map based chaos-BFGS algorithm, the improved Logistic map based chaos-BFGS algorithm, the Kent map based chaos-BFGS algorithm, the Monte Carlo-BFGS algorithm, and the mesh-BFGS algorithm. The computational performance of the five algorithms is compared, and the numerical results make us question the high efficiency of the chaos optimization algorithms claimed in some references. It is concluded that the efficiency of the hybrid optimization algorithms is influenced by the statistical properties of the chaotic/stochastic sequences generated by the chaotic/stochastic algorithms, and by the location of the global optimum of the nonlinear functions. In addition, it is inappropriate to advocate the high efficiency of global optimization algorithms based only on a few numerical examples of low-dimensional functions.
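
    To make the statistical point above concrete, the sketch below iterates the Logistic map (whose invariant density is the Chebyshev-type density concentrated near 0 and 1) and a Kent/skew-tent map (whose invariant density is uniform) and compares how often each visits a central interval; the parameters are standard textbook choices, not those of the cited algorithms.

```python
# Hedged sketch: compare the invariant densities of the Logistic map and the
# Kent (skew tent) map by the fraction of iterates falling in [0.4, 0.6].
import numpy as np

def logistic_sequence(x0, n, mu=4.0):
    xs, x = np.empty(n), x0
    for i in range(n):
        x = mu * x * (1.0 - x)
        xs[i] = x
    return xs

def kent_sequence(x0, n, m=0.7):
    xs, x = np.empty(n), x0
    for i in range(n):
        x = x / m if x < m else (1.0 - x) / (1.0 - m)
        xs[i] = x
    return xs

n = 100_000
for name, seq in [("logistic", logistic_sequence(0.123, n)),
                  ("kent", kent_sequence(0.123, n))]:
    frac = np.mean((seq >= 0.4) & (seq <= 0.6))
    print(f"{name}: fraction of iterates in [0.4, 0.6] = {frac:.3f}")
```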

  18. Recent advances in importance sampling for statistical model checking

    NARCIS (Netherlands)

    Reijsbergen, D.P.; de Boer, Pieter-Tjerk; Scheinhardt, Willem R.W.; Haverkort, Boudewijn R.H.M.

    2013-01-01

    In the following work we present an overview of recent advances in rare event simulation for model checking made at the University of Twente. The overview is divided into the several model classes for which we propose algorithms, namely multicomponent systems, Markov chains and stochastic Petri nets.
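
    A generic textbook illustration of the importance-sampling idea behind rare-event simulation, estimating P(X > c) for a standard normal X by sampling from a mean-shifted proposal and reweighting with the likelihood ratio; it is not one of the model-checking algorithms from the overview.

```python
# Hedged sketch: rare-event estimation of P(X > c), X ~ N(0, 1), by importance
# sampling with a proposal shifted to the threshold, versus crude Monte Carlo.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
c = 5.0                                   # rare threshold, P(X > c) ~ 2.9e-7
n = 100_000

# Crude Monte Carlo: almost never observes the event.
crude = np.mean(rng.standard_normal(n) > c)

# Importance sampling: sample from N(c, 1) and reweight by the likelihood ratio.
y = rng.normal(loc=c, scale=1.0, size=n)
weights = stats.norm.pdf(y) / stats.norm.pdf(y, loc=c)
is_estimate = np.mean((y > c) * weights)

print(f"crude MC: {crude:.2e}, importance sampling: {is_estimate:.2e}, "
      f"exact: {stats.norm.sf(c):.2e}")
```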

  19. Computing Statistics under Interval and Fuzzy Uncertainty Applications to Computer Science and Engineering

    CERN Document Server

    Nguyen, Hung T; Wu, Berlin; Xiang, Gang

    2012-01-01

    In many practical situations, we are interested in statistics characterizing a population of objects: e.g. in the mean height of people from a certain area.   Most algorithms for estimating such statistics assume that the sample values are exact. In practice, sample values come from measurements, and measurements are never absolutely accurate. Sometimes, we know the exact probability distribution of the measurement inaccuracy, but often, we only know the upper bound on this inaccuracy. In this case, we have interval uncertainty: e.g. if the measured value is 1.0, and inaccuracy is bounded by 0.1, then the actual (unknown) value of the quantity can be anywhere between 1.0 - 0.1 = 0.9 and 1.0 + 0.1 = 1.1. In other cases, the values are expert estimates, and we only have fuzzy information about the estimation inaccuracy.   This book shows how to compute statistics under such interval and fuzzy uncertainty. The resulting methods are applied to computer science (optimal scheduling of different processors), to in...
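
    A minimal sketch of the simplest case discussed above: when each measurement is only known to lie in an interval, the sample mean becomes an interval obtained by averaging the endpoints (bounding the variance is considerably harder and is one of the topics of the book). The data are invented for illustration.

```python
# Hedged sketch: interval bounds on the sample mean when each measurement x_i
# is only known to lie in [lo_i, hi_i].
def interval_mean(intervals):
    lows = [lo for lo, hi in intervals]
    highs = [hi for lo, hi in intervals]
    return sum(lows) / len(lows), sum(highs) / len(highs)

# Measured values 1.0, 1.2, 0.9 with inaccuracy bounded by 0.1 -> interval data.
data = [(0.9, 1.1), (1.1, 1.3), (0.8, 1.0)]
print(interval_mean(data))   # interval containing the unknown true mean
```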

  20. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    Science.gov (United States)

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model, based on 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N greater than 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  1. Artifact removal algorithms for stroke detection using a multistatic MIST beamforming algorithm.

    Science.gov (United States)

    Ricci, E; Di Domenico, S; Cianca, E; Rossi, T

    2015-01-01

    Microwave imaging (MWI) has recently been shown to be a promising imaging modality for low-complexity, low-cost and fast brain imaging tools, which could play a fundamental role in efficiently managing emergencies related to stroke and hemorrhages. This paper focuses on the UWB radar imaging approach and in particular on the processing algorithms for the backscattered signals. Assuming the use of the multistatic version of the MIST (Microwave Imaging Space-Time) beamforming algorithm, developed by Hagness et al. for the early detection of breast cancer, the paper proposes and compares two artifact removal algorithms. Artifact removal is an essential step in any UWB radar imaging system, and the artifact removal algorithms considered to date have been shown not to be effective in the specific scenario of brain imaging. First, the paper proposes modifications of a known artifact removal algorithm. These modifications are shown to be effective in achieving good localization accuracy and fewer false positives. The main contribution, however, is the proposal of an artifact removal algorithm based on statistical methods, which achieves even better performance with much lower computational complexity.

  2. Testing statistical significance scores of sequence comparison methods with structure similarity

    Directory of Open Access Journals (Sweden)

    Leunissen Jack AM

    2006-10-01

    Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.

  3. Performance Analysis of the Decentralized Eigendecomposition and ESPRIT Algorithm

    Science.gov (United States)

    Suleiman, Wassim; Pesavento, Marius; Zoubir, Abdelhak M.

    2016-05-01

    In this paper, we consider performance analysis of the decentralized power method for the eigendecomposition of the sample covariance matrix based on the averaging consensus protocol. An analytical expression of the second order statistics of the eigenvectors obtained from the decentralized power method which is required for computing the mean square error (MSE) of subspace-based estimators is presented. We show that the decentralized power method is not an asymptotically consistent estimator of the eigenvectors of the true measurement covariance matrix unless the averaging consensus protocol is carried out over an infinitely large number of iterations. Moreover, we introduce the decentralized ESPRIT algorithm which yields fully decentralized direction-of-arrival (DOA) estimates. Based on the performance analysis of the decentralized power method, we derive an analytical expression of the MSE of DOA estimators using the decentralized ESPRIT algorithm. The validity of our asymptotic results is demonstrated by simulations.

  4. Testing Homogeneity in a Semiparametric Two-Sample Problem

    Directory of Open Access Journals (Sweden)

    Yukun Liu

    2012-01-01

    We study a two-sample homogeneity testing problem, in which one sample comes from a population with density f(x) and the other is from a mixture population with mixture density (1−λ)f(x)+λg(x). This problem arises naturally in many statistical applications, such as tests for partial differential gene expression in microarray studies or genetic studies of gene mutation. Under the semiparametric assumption g(x) = f(x)e^(α+βx), a penalized empirical likelihood ratio test could be constructed, but its implementation is hindered by the fact that there is neither a feasible algorithm for computing the test statistic nor available research on its theoretical properties. To circumvent these difficulties, we propose an EM test based on the penalized empirical likelihood. We prove that the EM test has a simple chi-square limiting distribution, and we also demonstrate its competitive testing performance by simulations. A real-data example is used to illustrate the proposed methodology.

  5. Using the Perceptron Algorithm to Find Consistent Hypotheses

    OpenAIRE

    Anthony, M.; Shawe-Taylor, J.

    1993-01-01

    The perceptron learning algorithm yields quite naturally an algorithm for finding a linearly separable boolean function consistent with a sample of such a function. Using the idea of a specifying sample, we give a simple proof that this algorithm is not efficient, in general.
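
    A hedged sketch of the procedure referred to above: running the perceptron learning rule on a linearly separable labelled sample until no example is misclassified yields a hypothesis consistent with the sample; termination is guaranteed only in the separable case. The example data (the Boolean AND function) are illustrative.

```python
# Hedged sketch: perceptron learning rule run to consistency on a linearly
# separable sample of a Boolean function.
import numpy as np

def perceptron_consistent(X, y, max_epochs=1000):
    """X: (n, d) array, y: labels in {-1, +1}. Returns (weights, bias)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:      # misclassified (or on boundary)
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:                          # consistent with every example
            return w, b
    raise RuntimeError("no consistent hypothesis found within max_epochs")

# Boolean AND function, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, +1])
w, b = perceptron_consistent(X, y)
print("weights:", w, "bias:", b)
print("predictions:", np.sign(X @ w + b))
```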

  6. Multiparametric statistics

    CERN Document Server

    Serdobolskii, Vadim Ivanovich

    2007-01-01

    This monograph presents a mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size or even much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way for the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of infinite sampling often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even not applicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...

  7. Topology for statistical modeling of petascale data.

    Energy Technology Data Exchange (ETDEWEB)

    Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A& M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A& M University, College Station, TX)

    2011-07-01

    This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.

  8. Distribution Agnostic Structured Sparsity Recovery: Algorithms and Applications

    KAUST Repository

    Masood, Mudassir

    2015-05-01

    Compressed sensing has been a very active area of research and several elegant algorithms have been developed for the recovery of sparse signals in the past few years. However, most of these algorithms are either computationally expensive or make some assumptions that are not suitable for all real world problems. Recently, focus has shifted to Bayesian-based approaches that are able to perform sparse signal recovery at much lower complexity while invoking constraint and/or a priori information about the data. While Bayesian approaches have their advantages, these methods must have access to a priori statistics. Usually, these statistics are unknown and are often difficult or even impossible to predict. An effective workaround is to assume a distribution which is typically considered to be Gaussian, as it makes many signal processing problems mathematically tractable. Seemingly attractive, this assumption necessitates the estimation of the associated parameters; which could be hard if not impossible. In the thesis, we focus on this aspect of Bayesian recovery and present a framework to address the challenges mentioned above. The proposed framework allows Bayesian recovery of sparse signals but at the same time is agnostic to the distribution of the unknown sparse signal components. The algorithms based on this framework are agnostic to signal statistics and utilize a priori statistics of additive noise and the sparsity rate of the signal, which are shown to be easily estimated from data if not available. In the thesis, we propose several algorithms based on this framework which utilize the structure present in signals for improved recovery. In addition to the algorithm that considers just the sparsity structure of sparse signals, tools that target additional structure of the sparsity recovery problem are proposed. These include several algorithms for a) block-sparse signal estimation, b) joint reconstruction of several common support sparse signals, and c

  9. Statistical sampling methods for soils monitoring

    Science.gov (United States)

    Ann M. Abbott

    2010-01-01

    Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case...

  10. Statistical physics of fracture: scientific discovery through high-performance computing

    International Nuclear Information System (INIS)

    Kumar, Phani; Nukala, V V; Simunovic, Srdan; Mills, Richard T

    2006-01-01

    The paper presents the state-of-the-art algorithmic developments for simulating the fracture of disordered quasi-brittle materials using discrete lattice systems. Large scale simulations are often required to obtain accurate scaling laws; however, due to computational complexity, the simulations using the traditional algorithms were limited to small system sizes. We have developed two algorithms: a multiple sparse Cholesky downdating scheme for simulating 2D random fuse model systems, and a block-circulant preconditioner for simulating 2D random fuse model systems. Using these algorithms, we were able to simulate fracture of largest ever lattice system sizes (L = 1024 in 2D, and L = 64 in 3D) with extensive statistical sampling. Our recent simulations on 1024 processors of Cray-XT3 and IBM Blue-Gene/L have further enabled us to explore fracture of 3D lattice systems of size L = 200, which is a significant computational achievement. These largest ever numerical simulations have enhanced our understanding of physics of fracture; in particular, we analyze damage localization and its deviation from percolation behavior, scaling laws for damage density, universality of fracture strength distribution, size effect on the mean fracture strength, and finally the scaling of crack surface roughness

  11. An Unsupervised Method of Change Detection in Multi-Temporal PolSAR Data Using a Test Statistic and an Improved K&I Algorithm

    Directory of Open Access Journals (Sweden)

    Jinqi Zhao

    2017-12-01

    In recent years, multi-temporal imagery from spaceborne sensors has provided a fast and practical means for surveying and assessing changes in terrain surfaces. Owing to its all-weather imaging capability, polarimetric synthetic aperture radar (PolSAR) has become a key tool for change detection. Change detection methods include both unsupervised and supervised methods. Supervised change detection, which needs some human intervention, is generally ineffective and impractical. Due to this limitation, unsupervised methods are widely used in change detection. The traditional unsupervised methods use only a part of the polarization information, and the required thresholding algorithms are independent of the multi-temporal data, which results in change detection maps that are ineffective and inaccurate. To solve these problems, a novel method of change detection using a test statistic based on the likelihood ratio test and the improved Kittler and Illingworth (K&I) minimum-error thresholding algorithm is introduced in this paper. The test statistic is used to generate the comparison image (CI) of the multi-temporal PolSAR images, and improved K&I using a generalized Gaussian model simulates the distribution of the CI. As a result of these advantages, we can obtain the change detection map using an optimum threshold. The efficiency of the proposed method is demonstrated by the use of multi-temporal PolSAR images acquired by RADARSAT-2 over Wuhan, China. The experimental results show that the proposed method is effective and highly accurate.
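
    For reference, the sketch below implements the classical Kittler and Illingworth minimum-error threshold on a grey-level histogram, which is the baseline that the improved K&I with a generalized Gaussian model builds on; the two-class synthetic data stand in for the comparison image.

```python
# Hedged sketch of the classical Kittler & Illingworth minimum-error threshold:
# for each candidate threshold, model the two classes as Gaussians and pick the
# threshold minimising the classification-error criterion J(t).
import numpy as np

def kittler_illingworth_threshold(hist, bins):
    hist = hist.astype(float) / hist.sum()
    costs = np.full(len(bins), np.inf)
    for t in range(1, len(bins) - 1):
        p1, p2 = hist[:t].sum(), hist[t:].sum()
        if p1 <= 0 or p2 <= 0:
            continue
        m1 = (hist[:t] * bins[:t]).sum() / p1
        m2 = (hist[t:] * bins[t:]).sum() / p2
        v1 = (hist[:t] * (bins[:t] - m1) ** 2).sum() / p1
        v2 = (hist[t:] * (bins[t:] - m2) ** 2).sum() / p2
        if v1 <= 0 or v2 <= 0:
            continue
        # Minimum-error criterion for two Gaussian classes.
        costs[t] = (1.0 + 2.0 * (p1 * np.log(np.sqrt(v1)) + p2 * np.log(np.sqrt(v2)))
                        - 2.0 * (p1 * np.log(p1) + p2 * np.log(p2)))
    return bins[int(np.argmin(costs))]

rng = np.random.default_rng(6)
values = np.concatenate([rng.normal(60, 10, 8000),    # 'no change' class
                         rng.normal(150, 20, 2000)])  # 'change' class
hist, edges = np.histogram(values, bins=256, range=(0, 255))
centers = 0.5 * (edges[:-1] + edges[1:])
print("estimated threshold:", kittler_illingworth_threshold(hist, centers))
```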

  12. ExSample. A library for sampling Sudakov-type distributions

    Energy Technology Data Exchange (ETDEWEB)

    Plaetzer, Simon

    2011-08-15

    Sudakov-type distributions are at the heart of generating radiation in parton showers as well as contemporary NLO matching algorithms along the lines of the POWHEG algorithm. In this paper, the C++ library ExSample is introduced, which implements adaptive sampling of Sudakov-type distributions for splitting kernels which are in general only known numerically. Besides the evolution variable, the splitting kernels can depend on an arbitrary number of other degrees of freedom to be sampled, and any number of further parameters which are fixed on an event-by-event basis. (orig.)
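
    A hedged sketch of the veto algorithm commonly used to sample Sudakov-type distributions when the splitting kernel f(q) is only known point-wise: propose scales from an analytically integrable overestimate g >= f and accept with probability f/g. This is the generic scheme with a toy kernel, not the adaptive ExSample implementation.

```python
# Hedged sketch of the veto algorithm for Sudakov-type distributions, evolving
# downward in the scale q with a constant overestimate g_const >= f(q).
import numpy as np

def sample_next_scale(q_start, q_min, f, g_const, rng):
    """Return the next emission scale below q_start, or None if none occurs."""
    q = q_start
    while True:
        # Propose from the Sudakov form factor of the constant overestimate.
        q = q + np.log(rng.random()) / g_const
        if q <= q_min:
            return None                      # evolution ends without an emission
        if rng.random() < f(q) / g_const:    # veto step: accept with prob f/g
            return q

rng = np.random.default_rng(7)
f = lambda q: 1.0 / q                        # toy splitting kernel, f(q) <= 1 for q >= 1
scales = [sample_next_scale(100.0, 1.0, f, g_const=1.0, rng=rng) for _ in range(5)]
print(scales)
```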

  13. ExSample. A library for sampling Sudakov-type distributions

    International Nuclear Information System (INIS)

    Plaetzer, Simon

    2011-08-01

    Sudakov-type distributions are at the heart of generating radiation in parton showers as well as contemporary NLO matching algorithms along the lines of the POWHEG algorithm. In this paper, the C++ library ExSample is introduced, which implements adaptive sampling of Sudakov-type distributions for splitting kernels which are in general only known numerically. Besides the evolution variable, the splitting kernels can depend on an arbitrary number of other degrees of freedom to be sampled, and any number of further parameters which are fixed on an event-by-event basis. (orig.)

  14. Optimal Design and Related Areas in Optimization and Statistics

    CERN Document Server

    Pronzato, Luc

    2009-01-01

    This edited volume, dedicated to Henry P. Wynn, reflects his broad range of research interests, focusing in particular on the applications of optimal design theory in optimization and statistics. It covers algorithms for constructing optimal experimental designs, general gradient-type algorithms for convex optimization, majorization and stochastic ordering, algebraic statistics, Bayesian networks and nonlinear regression. Written by leading specialists in the field, each chapter contains a survey of the existing literature along with substantial new material. This work will appeal to both the

  15. Volume reconstruction optimization for tomo-PIV algorithms applied to experimental data

    Science.gov (United States)

    Martins, Fabio J. W. A.; Foucaut, Jean-Marc; Thomas, Lionel; Azevedo, Luis F. A.; Stanislas, Michel

    2015-08-01

    Tomographic PIV is a three-component volumetric velocity measurement technique based on the tomographic reconstruction of a particle distribution imaged by multiple camera views. In essence, the performance and accuracy of this technique are highly dependent on the parametric adjustment and the reconstruction algorithm used. Although synthetic data have been widely employed to optimize experiments, the resulting reconstructed volumes might not have optimal quality. The purpose of the present study is to offer quality indicators that can be applied to data samples in order to improve the quality of velocity results obtained by the tomo-PIV technique. The methodology proposed can potentially lead to a significant reduction in the time required to optimize a tomo-PIV reconstruction, while also leading to better quality velocity results. Tomo-PIV data provided by a six-camera turbulent boundary-layer experiment were used to optimize the reconstruction algorithms according to this methodology. Velocity statistics measurements obtained by optimized BIMART, SMART and MART algorithms were compared with hot-wire anemometer data and velocity measurement uncertainties were computed. Results indicated that the BIMART and SMART algorithms produced reconstructed volumes of equivalent quality to the standard MART with the benefit of reduced computational time.
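
    As background for the algorithms compared above, the sketch below applies the multiplicative MART update to a tiny synthetic system A x = b; the real tomo-PIV problem builds A from the camera calibration and is vastly larger, so this is only a structural illustration.

```python
# Hedged sketch of the multiplicative MART update x_j <- x_j * (b_i / (A_i . x))
# raised to (relaxation * a_ij), applied equation by equation.
import numpy as np

def mart(A, b, n_iterations=200, relaxation=1.0):
    x = np.ones(A.shape[1])                       # non-negative initial guess
    for _ in range(n_iterations):
        for i in range(A.shape[0]):               # loop over projection equations
            projection = A[i] @ x
            if projection <= 0:
                continue
            # Multiplicative correction, exponent weighted by the system matrix.
            x *= (b[i] / projection) ** (relaxation * A[i])
    return x

rng = np.random.default_rng(8)
x_true = np.array([0.0, 2.0, 0.0, 1.0, 0.0, 3.0])          # sparse 'particle' field
A = rng.uniform(0.0, 1.0, size=(4, x_true.size))           # toy weighting matrix
b = A @ x_true                                              # simulated projections
# With fewer equations than unknowns the system is underdetermined (as in real
# tomo-PIV), so MART returns one non-negative solution consistent with b.
print(np.round(mart(A, b), 2))
```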

  16. Volume reconstruction optimization for tomo-PIV algorithms applied to experimental data

    International Nuclear Information System (INIS)

    Martins, Fabio J W A; Foucaut, Jean-Marc; Stanislas, Michel; Thomas, Lionel; Azevedo, Luis F A

    2015-01-01

    Tomographic PIV is a three-component volumetric velocity measurement technique based on the tomographic reconstruction of a particle distribution imaged by multiple camera views. In essence, the performance and accuracy of this technique are highly dependent on the parametric adjustment and the reconstruction algorithm used. Although synthetic data have been widely employed to optimize experiments, the resulting reconstructed volumes might not have optimal quality. The purpose of the present study is to offer quality indicators that can be applied to data samples in order to improve the quality of velocity results obtained by the tomo-PIV technique. The methodology proposed can potentially lead to a significant reduction in the time required to optimize a tomo-PIV reconstruction, while also leading to better quality velocity results. Tomo-PIV data provided by a six-camera turbulent boundary-layer experiment were used to optimize the reconstruction algorithms according to this methodology. Velocity statistics measurements obtained by optimized BIMART, SMART and MART algorithms were compared with hot-wire anemometer data and velocity measurement uncertainties were computed. Results indicated that the BIMART and SMART algorithms produced reconstructed volumes of equivalent quality to the standard MART with the benefit of reduced computational time. (paper)

  17. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  18. A hybrid algorithm for reliability analysis combining Kriging and subset simulation importance sampling

    International Nuclear Information System (INIS)

    Tong, Cao; Sun, Zhili; Zhao, Qianli; Wang, Qibin; Wang, Shuang

    2015-01-01

    To reduce the large computational cost of calculating failure probabilities with time-consuming numerical models, we propose an improved active learning reliability method called AK-SSIS, based on the AK-IS algorithm. First, an improved iterative stopping criterion for the active learning is presented so that the number of iterations decreases dramatically. Second, the proposed method introduces subset simulation importance sampling (SSIS) into the active learning reliability calculation, and a learning function suitable for SSIS is proposed. Finally, the efficiency of AK-SSIS is demonstrated by two academic examples from the literature. The results show that AK-SSIS requires fewer calls to the performance function than AK-IS, and the failure probability obtained from AK-SSIS is very robust and accurate. The method is then applied to a spur gear pair for tooth contact fatigue reliability analysis.

  19. A hybrid algorithm for reliability analysis combining Kriging and subset simulation importance sampling

    Energy Technology Data Exchange (ETDEWEB)

    Tong, Cao; Sun, Zhili; Zhao, Qianli; Wang, Qibin [Northeastern University, Shenyang (China); Wang, Shuang [Jiangxi University of Science and Technology, Ganzhou (China)

    2015-08-15

    To reduce the large computational cost of calculating failure probabilities with time-consuming numerical models, we propose an improved active learning reliability method called AK-SSIS, based on the AK-IS algorithm. First, an improved iterative stopping criterion for the active learning is presented so that the number of iterations decreases dramatically. Second, the proposed method introduces subset simulation importance sampling (SSIS) into the active learning reliability calculation, and a learning function suitable for SSIS is proposed. Finally, the efficiency of AK-SSIS is demonstrated by two academic examples from the literature. The results show that AK-SSIS requires fewer calls to the performance function than AK-IS, and the failure probability obtained from AK-SSIS is very robust and accurate. The method is then applied to a spur gear pair for tooth contact fatigue reliability analysis.

  20. MULTI-LEVEL SAMPLING APPROACH FOR CONTINOUS LOSS DETECTION USING ITERATIVE WINDOW AND STATISTICAL MODEL

    OpenAIRE

    Mohd Fo'ad Rohani; Mohd Aizaini Maarof; Ali Selamat; Houssain Kettani

    2010-01-01

    This paper proposes a Multi-Level Sampling (MLS) approach for continuous Loss of Self-Similarity (LoSS) detection using an iterative window. The method defines LoSS based on the Second-Order Self-Similarity (SOSS) statistical model. The Optimization Method (OM) is used to estimate the self-similarity parameter, since it is fast and more accurate in comparison with other estimation methods known in the literature. A probability of LoSS detection is introduced to measure continuous LoSS detection performance...

  1. Performance of Jet Algorithms in CMS

    CERN Document Server

    CMS Collaboration

    The CMS Combined Software and Analysis Challenge 2007 (CSA07) is well underway and expected to produce a wealth of physics analyses to be applied to the first incoming detector data in 2008. The JetMET group of CMS supports four different jet clustering algorithms for the CSA07 Monte Carlo samples, with two different parameterizations each: fast-kT, SISCone, Midpoint, and Iterative Cone. We present several studies comparing the performance of these algorithms using QCD dijet and ttbar Monte Carlo samples. We specifically observe that the SISCone algorithm performs equal to or better than the Midpoint algorithm in all presented studies and propose that SISCone be adopted as the preferred cone-based jet clustering algorithm in future CMS physics analyses, as it is preferred by theorists for its infrared and collinear safety to all orders of perturbative QCD. We furthermore encourage the use of the fast-kT algorithm, which is found to perform as well as any other algorithm under study, features dramatically reduc...

  2. Nonequilibrium molecular dynamics theory, algorithms and applications

    CERN Document Server

    Todd, Billy D

    2017-01-01

    Written by two specialists with over twenty-five years of experience in the field, this valuable text presents a wide range of topics within the growing field of nonequilibrium molecular dynamics (NEMD). It introduces theories which are fundamental to the field - namely, nonequilibrium statistical mechanics and nonequilibrium thermodynamics - and provides state-of-the-art algorithms and advice for designing reliable NEMD code, as well as examining applications for both atomic and molecular fluids. It discusses homogeneous and inhomogeneous flows and pays considerable attention to highly confined fluids, such as nanofluidics. In addition to statistical mechanics and thermodynamics, the book covers the themes of temperature and thermodynamic fluxes and their computation, the theory and algorithms for homogeneous shear and elongational flows, response theory and its applications, heat and mass transport algorithms, applications in molecular rheology, highly confined fluids (nanofluidics), the phenomenon of slip and...

  3. Absolute magnitudes by statistical parallaxes

    International Nuclear Information System (INIS)

    Heck, A.

    1978-01-01

    The author describes an algorithm for stellar luminosity calibrations (based on the principle of maximum likelihood) which allows the calibration of relations of the type M_i = Σ_{j=1..N} q_j C_ij, i = 1, ..., n, where n is the size of the sample at hand, the M_i are the individual absolute magnitudes, the C_ij are observational quantities (j = 1, ..., N), and the q_j are the coefficients to be determined. If one puts N = 1 and C_iN = 1, one has q_1 = M(mean), the mean absolute magnitude of the sample. As additional output, the algorithm also provides the dispersion in magnitude of the sample sigma_M, the mean solar motion (U, V, W) and the corresponding velocity ellipsoid (sigma_u, sigma_v, sigma_w). The use of this algorithm is illustrated. (Auth.)
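
    A sketch of the structure of the calibration relation M_i = Σ_j q_j C_ij: given a design matrix of observational quantities and (here, simulated) absolute magnitudes, the coefficients q_j and the dispersion sigma_M can be recovered by least squares. The actual algorithm estimates the q_j by maximum likelihood from kinematic data without knowing the M_i directly, so this is only an analogue under invented data.

```python
# Hedged sketch: least-squares analogue of the calibration M_i = sum_j q_j*C_ij,
# with a simulated design matrix and absolute magnitudes.
import numpy as np

rng = np.random.default_rng(9)
n_stars, n_terms = 200, 3
C = np.column_stack([np.ones(n_stars),                   # constant term
                     rng.normal(0.0, 1.0, n_stars),      # e.g. a colour index
                     rng.normal(0.0, 1.0, n_stars)])     # e.g. a line-strength index
q_true = np.array([1.5, -2.0, 0.7])
M = C @ q_true + rng.normal(0.0, 0.3, n_stars)           # intrinsic scatter sigma_M

q_hat, *_ = np.linalg.lstsq(C, M, rcond=None)
sigma_M = np.std(M - C @ q_hat, ddof=n_terms)
print("estimated coefficients:", np.round(q_hat, 2))
print("estimated dispersion sigma_M:", round(sigma_M, 2))
```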

  4. Software Alchemy: Turning Complex Statistical Computations into Embarrassingly-Parallel Ones

    Directory of Open Access Journals (Sweden)

    Norman Matloff

    2016-07-01

    The growth in the use of computationally intensive statistical procedures, especially with big data, has necessitated the usage of parallel computation on diverse platforms such as multicore, GPUs, clusters and clouds. However, slowdown due to interprocess communication costs typically limits such methods to "embarrassingly parallel" (EP) algorithms, especially on non-shared memory platforms. This paper develops a broadly applicable method for converting many non-EP algorithms into statistically equivalent EP ones. The method is shown to yield excellent levels of speedup for a variety of statistical computations. It also overcomes certain problems of memory limitations.
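
    A hedged sketch of the chunk-and-average idea behind the method: split the data into chunks, compute the estimator independently on each chunk in parallel, and average the per-chunk results to obtain a statistically equivalent, embarrassingly parallel computation. A trimmed mean stands in for a more expensive estimator, and the function names are illustrative.

```python
# Hedged sketch: chunk the data, run the estimator on each chunk in parallel,
# and average the per-chunk estimates (the "Software Alchemy" pattern).
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def chunk_estimator(chunk):
    # Any statistical estimator computed independently per chunk.
    chunk = np.sort(chunk)
    k = len(chunk) // 10
    return chunk[k:len(chunk) - k].mean()        # 10% trimmed mean as a stand-in

def chunk_and_average(data, n_chunks=4):
    chunks = np.array_split(data, n_chunks)
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        estimates = list(pool.map(chunk_estimator, chunks))
    return float(np.mean(estimates))             # average of per-chunk estimates

if __name__ == "__main__":
    rng = np.random.default_rng(10)
    data = rng.standard_t(df=5, size=1_000_000)
    print("chunk-averaged estimate:", chunk_and_average(data))
```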

  5. STATISTICAL EVALUATION OF SMALL SCALE MIXING DEMONSTRATION SAMPLING AND BATCH TRANSFER PERFORMANCE - 12093

    Energy Technology Data Exchange (ETDEWEB)

    GREER DA; THIEN MG

    2012-01-12

    The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste Treatment and Immobilization Plant (WTP) presents a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS), has previously presented the results of mixing performance in two different sizes of small scale DSTs to support scale-up estimates of full-scale DST mixing performance. Currently, sufficient sampling of DSTs is one of the largest programmatic risks that could prevent timely delivery of high level waste to the WTP. WRPS has performed small scale mixing and sampling demonstrations to study the ability to sufficiently sample the tanks. The statistical evaluation of the demonstration results, which leads to the conclusion that the two scales of small DSTs behave similarly and that full-scale performance is predictable, will be presented. This work is essential to reduce the risk of requiring a new dedicated feed sampling facility and will guide future optimization work to ensure the waste feed delivery mission will be accomplished successfully. This paper will focus on the analytical data collected from mixing, sampling, and batch transfer testing from the small scale mixing demonstration tanks and how those data are being interpreted to begin to understand the relationship between samples taken prior to transfer and samples from the subsequent batches transferred. An overview of the types of data collected and examples of typical raw data will be provided. The paper will then discuss the processing and manipulation of the data which is necessary to begin evaluating sampling and batch transfer performance. This discussion will also include the evaluation of the analytical measurement capability with regard to the simulant material used in the demonstration tests. The

  6. Supplementary Material for: Compressing an Ensemble With Statistical Models: An Algorithm for Global 3D Spatio-Temporal Temperature

    KAUST Repository

    Castruccio, Stefano

    2016-01-01

    One of the main challenges when working with modern climate model ensembles is the increasingly large size of the data produced, and the consequent difficulty in storing large amounts of spatio-temporally resolved information. Many compression algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific datasets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a different, statistics-based approach that explicitly accounts for the space-time dependence of the data for annual global three-dimensional temperature fields in an initial condition ensemble. The set of estimated parameters is small (compared to the data size) and can be regarded as a summary of the essential structure of the ensemble output; therefore, it can be used to instantaneously reproduce the temperature fields in an ensemble with a substantial saving in storage and time. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and makes it possible to fit a nontrivial model to a dataset of 1 billion data points with a covariance matrix comprising 10^18 entries. Supplementary materials for this article are available online.

  7. Fuzzy Expert System based on a Novel Hybrid Stem Cell (HSC) Algorithm for Classification of Micro Array Data.

    Science.gov (United States)

    Vijay, S Arul Antran; GaneshKumar, P

    2018-02-21

    Microarray data are extensively used since they provide a more comprehensive understanding of genetic variants among diseases. Because gene expression samples have high dimensionality, it becomes tedious to analyze the samples manually. Hence an automated system is needed to analyze these samples. The fuzzy expert system offers a clear classification when compared to the machine learning and statistical methodologies. In fuzzy classification, knowledge acquisition would be a major concern. Despite several existing approaches for knowledge acquisition, much effort is necessary to enhance the learning process. This paper proposes an innovative Hybrid Stem Cell (HSC) algorithm that utilizes Ant Colony Optimization and a Stem Cell algorithm to design a fuzzy classification system that extracts informative rules and forms the membership functions from the microarray dataset. The HSC algorithm uses a novel Adaptive Stem Cell Optimization (ASCO) to improve the points of the membership functions and Ant Colony Optimization to produce the near-optimum rule set. In order to extract the most informative genes from the large microarray dataset, a method called Mutual Information is used. The performance of the proposed technique is evaluated using five microarray datasets. These results prove that the proposed Hybrid Stem Cell (HSC) algorithm produces a more precise fuzzy system than the existing methodologies.
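
    As a rough illustration of the gene-filtering step described above, the sketch below ranks features by mutual information before any downstream classifier is built. It uses scikit-learn on synthetic data and is not the authors' HSC/ASCO implementation; all array shapes and the cutoff of 20 genes are hypothetical.

```python
# Minimal sketch (not the authors' HSC algorithm): rank genes by mutual
# information with the class label and keep the most informative ones.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))          # 60 samples x 500 hypothetical gene expressions
y = rng.integers(0, 2, size=60)         # binary disease labels

mi = mutual_info_classif(X, y, random_state=0)
top_genes = np.argsort(mi)[::-1][:20]   # keep the 20 highest-scoring genes
X_reduced = X[:, top_genes]
print(X_reduced.shape)
```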

  8. Multiagency Urban Search Experiment Detector and Algorithm Test Bed

    Science.gov (United States)

    Nicholson, Andrew D.; Garishvili, Irakli; Peplow, Douglas E.; Archer, Daniel E.; Ray, William R.; Swinney, Mathew W.; Willis, Michael J.; Davidson, Gregory G.; Cleveland, Steven L.; Patton, Bruce W.; Hornback, Donald E.; Peltz, James J.; McLean, M. S. Lance; Plionis, Alexander A.; Quiter, Brian J.; Bandstra, Mark S.

    2017-07-01

    In order to provide benchmark data sets for radiation detector and algorithm development, a particle transport test bed has been created using experimental data as model input and validation. A detailed radiation measurement campaign at the Combined Arms Collective Training Facility in Fort Indiantown Gap, PA (FTIG), USA, provides sample background radiation levels for a variety of materials present at the site (including cinder block, gravel, asphalt, and soil) using long dwell high-purity germanium (HPGe) measurements. In addition, detailed light detection and ranging data and ground-truth measurements inform model geometry. This paper describes the collected data and the application of these data to create background and injected source synthetic data for an arbitrary gamma-ray detection system using particle transport model detector response calculations and statistical sampling. In the methodology presented here, HPGe measurements inform model source terms while detector response calculations are validated via long dwell measurements using 2"×4"×16" NaI(Tl) detectors at a variety of measurement points. A collection of responses, along with sampling methods and interpolation, can be used to create data sets to gauge radiation detector and algorithm (including detection, identification, and localization) performance under a variety of scenarios. Data collected at the FTIG site are available for query, filtering, visualization, and download at muse.lbl.gov.

  9. A sum-over-paths algorithm for third-order impulse-response moment extraction within RC IC-interconnect networks

    Science.gov (United States)

    Wojcik, E. A.; Ni, D.; Lam, T. M.; Le Coz, Y. L.

    2015-07-01

    We have created the first stochastic SoP (Sum-over-Paths) algorithm to extract the third-order impulse-response (IR) moment within RC IC interconnects. It employs a newly discovered Feynman SoP Postulate. Importantly, our algorithm maintains computational efficiency and full parallelism. Our approach begins with generation of s-domain nodal-voltage equations. We then perform a Taylor-series expansion of the circuit transfer function. These expansions yield transition diagrams involving mathematical coupling constants, or weight factors, in integral powers of complex frequency s. Our SoP Postulate enables stochastic evaluation of path sums within the circuit transition diagram to order s^3, corresponding to the order of the IR moment (m_3) we seek here. We furnish, for the first time, an informal algebraic proof independently validating our SoP Postulate and algorithm. We list, as well, detailed procedural steps, suitable for coding, that define an efficient stochastic algorithm for m_3 IR extraction. Origins of the algorithm's statistical "capacitor-number cubed" correction and "double-counting" weight factors are explained, for completeness. Our algorithm was coded and successfully tested against exact analytical solutions for 3-, 5-, and 10-stage RC lines. We achieved better than 0.65% 1-σ error convergence, after only 10K statistical samples, in less than 1 s of 2-GHz Pentium® execution time. These results continue to suggest that stochastic SoP algorithms may find useful application in circuit analysis of massively coupled networks, such as those encountered in high-end digital IC-interconnect CAD.

  10. Dataset exploited for the development and validation of automated cyanobacteria quantification algorithm, ACQUA

    Directory of Open Access Journals (Sweden)

    Emanuele Gandola

    2016-09-01

    Full Text Available The estimation and quantification of potentially toxic cyanobacteria in lakes and reservoirs are often used as a proxy of risk for water intended for human consumption and recreational activities. Here, we present data sets collected from three volcanic Italian lakes (Albano, Vico, Nemi) that present filamentous cyanobacteria strains in different environments. The presented data sets were used to estimate abundance and morphometric characteristics of potentially toxic cyanobacteria, comparing manual vs. automated estimation performed by ACQUA, the “Automated Cyanobacterial Quantification Algorithm for toxic filamentous genera using spline curves, pattern recognition and machine learning” (Gandola et al., 2016 [1]). This strategy was used to assess the algorithm performance and to set up the denoising algorithm. Abundance and total length estimations were used for software development; to this aim, we evaluated the efficiency of the statistical tools and mathematical algorithms described here. Image convolution with the Sobel filter was chosen to denoise input images by removing background signals; then spline curves and the least-squares method were used to parameterize detected filaments and to recombine crossing and interrupted sections, in order to perform precise abundance estimations and morphometric measurements. Keywords: Comparing data, Filamentous cyanobacteria, Algorithm, Denoising, Natural sample
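
    The sketch below shows a Sobel-based gradient step in the spirit of the pre-processing described above; it is not the ACQUA code, and the input array and threshold rule are placeholders.

```python
# Minimal sketch of a Sobel-based pre-processing step: the gradient magnitude
# highlights filament edges against the background (synthetic input image).
import numpy as np
from scipy import ndimage

image = np.random.rand(256, 256)        # placeholder for a microscopy frame

gx = ndimage.sobel(image, axis=0)
gy = ndimage.sobel(image, axis=1)
gradient = np.hypot(gx, gy)             # edge strength
mask = gradient > gradient.mean() + 2 * gradient.std()   # crude edge mask
print(mask.sum(), "candidate edge pixels")
```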

  11. An improved Bayesian tensor regularization and sampling algorithm to track neuronal fiber pathways in the language circuit.

    Science.gov (United States)

    Mishra, Arabinda; Anderson, Adam W; Wu, Xi; Gore, John C; Ding, Zhaohua

    2010-08-01

    The purpose of this work is to design a neuronal fiber tracking algorithm, which will be more suitable for reconstruction of fibers associated with functionally important regions in the human brain. The functional activations in the brain normally occur in the gray matter regions. Hence the fibers bordering these regions are weakly myelinated, resulting in poor performance of conventional tractography methods in tracing the fiber links between them. A lower fractional anisotropy in this region makes it even more difficult to track the fibers in the presence of noise. In this work, the authors focused on a stochastic approach to reconstruct these fiber pathways based on a Bayesian regularization framework. To estimate the true fiber direction (propagation vector), the a priori and conditional probability density functions are calculated in advance and are modeled as multivariate normal. The variance of the estimated tensor element vector is associated with the uncertainty due to noise and partial volume averaging (PVA). An adaptive and multiple sampling of the estimated tensor element vector, which is a function of the pre-estimated variance, overcomes the effect of noise and PVA in this work. The algorithm has been rigorously tested using a variety of synthetic data sets. The quantitative comparison of the results to standard algorithms motivated the authors to implement it for in vivo DTI data analysis. The algorithm has been implemented to delineate fibers in two major language pathways (Broca's to SMA and Broca's to Wernicke's) across 12 healthy subjects. Though the mean standard deviation was marginally larger than that of the conventional (Euler's) approach [P. J. Basser et al., "In vivo fiber tractography using DT-MRI data," Magn. Reson. Med. 44(4), 625-632 (2000)], the number of extracted fibers in this approach was significantly higher. The authors also compared the performance of the proposed method to Lu's method [Y. Lu et al., "Improved fiber tractography with Bayesian

  12. Regularized Statistical Analysis of Anatomy

    DEFF Research Database (Denmark)

    Sjöstrand, Karl

    2007-01-01

    This thesis presents the application and development of regularized methods for the statistical analysis of anatomical structures. Focus is on structure-function relationships in the human brain, such as the connection between early onset of Alzheimer’s disease and shape changes of the corpus...... and mind. Statistics represents a quintessential part of such investigations as they are preluded by a clinical hypothesis that must be verified based on observed data. The massive amounts of image data produced in each examination pose an important and interesting statistical challenge...... efficient algorithms which make the analysis of large data sets feasible, and gives examples of applications....

  13. Statistical methods for ranking data

    CERN Document Server

    Alvo, Mayer

    2014-01-01

    This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
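
    As a small worked example of the distance-based view of rank correlation mentioned above, the snippet below computes Kendall's tau from the number of discordant pairs and cross-checks it with SciPy; the two rankings are made up for illustration.

```python
# Kendall's tau expressed through a distance between rankings (the count of
# discordant pairs); scipy is used only as a cross-check.
from itertools import combinations
from scipy.stats import kendalltau

r1 = [1, 2, 3, 4, 5]        # two judges ranking the same five items
r2 = [2, 1, 3, 5, 4]

discordant = sum(
    (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    for i, j in combinations(range(len(r1)), 2)
)
n = len(r1)
tau_from_distance = 1 - 4 * discordant / (n * (n - 1))   # = 0.6 here
print(tau_from_distance, kendalltau(r1, r2)[0])
```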

  14. Effect of the Target Motion Sampling temperature treatment method on the statistics and performance

    International Nuclear Information System (INIS)

    Viitanen, Tuomas; Leppänen, Jaakko

    2015-01-01

    Highlights: • Use of the Target Motion Sampling (TMS) method with collision estimators is studied. • The expected values of the estimators agree with NJOY-based reference. • In most practical cases also the variances of the estimators are unaffected by TMS. • Transport calculation slow-down due to TMS dominates the impact on figures-of-merit. - Abstract: Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percents in the vicinity of very

  15. A note on the kappa statistic for clustered dichotomous data.

    Science.gov (United States)

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
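
    To make the quantities concrete, the sketch below computes Cohen's kappa for binary ratings and a cluster bootstrap estimate of its variance, resampling whole clusters as described above. It is an illustration on synthetic data, not the paper's semi-parametric estimator.

```python
# Cohen's kappa for two raters plus a cluster bootstrap variance estimate
# (synthetic clustered binary ratings; not the authors' proposed estimator).
import numpy as np

rng = np.random.default_rng(1)
# 30 clusters ("physicians"), each with 2-5 patients rated 0/1 by two raters
clusters = [rng.integers(0, 2, size=(rng.integers(2, 6), 2)) for _ in range(30)]

def kappa(pairs):
    r1, r2 = pairs[:, 0], pairs[:, 1]
    po = np.mean(r1 == r2)                       # observed agreement
    p1, p2 = r1.mean(), r2.mean()
    pe = p1 * p2 + (1 - p1) * (1 - p2)           # chance agreement
    return (po - pe) / (1 - pe)

point_estimate = kappa(np.vstack(clusters))

boot = []
for _ in range(500):                             # resample whole clusters
    idx = rng.integers(0, len(clusters), len(clusters))
    boot.append(kappa(np.vstack([clusters[i] for i in idx])))
print(point_estimate, np.var(boot, ddof=1))
```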

  16. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    Energy Technology Data Exchange (ETDEWEB)

    Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA

    2017-02-05

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatograph-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating degradation occurs in the DBS samples affecting quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types indicating that lipid quantitation was more stable across the sample types.

  17. The Orthogonally Partitioned EM Algorithm: Extending the EM Algorithm for Algorithmic Stability and Bias Correction Due to Imperfect Data.

    Science.gov (United States)

    Regier, Michael D; Moodie, Erica E M

    2016-05-01

    We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when the standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm into a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there are missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to break down a complicated problem into a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm, and make the EM algorithm more accessible to a wider and more general audience.
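
    For readers unfamiliar with the baseline being extended, the sketch below is a minimal standard EM for a two-component 1-D Gaussian mixture, showing the E/M alternation that the proposal partitions into smaller self-contained runs. It is not the orthogonally partitioned algorithm itself, and the data are simulated.

```python
# Standard EM for a two-component 1-D Gaussian mixture (baseline illustration).
import numpy as np

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

w, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: responsibility of component 0 for each point
    pdf = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    num0, num1 = w * pdf(mu[0], sigma[0]), (1 - w) * pdf(mu[1], sigma[1])
    r0 = num0 / (num0 + num1)
    # M-step: update mixing weight, means, and standard deviations
    w = r0.mean()
    mu = np.array([np.average(x, weights=r0), np.average(x, weights=1 - r0)])
    sigma = np.array([
        np.sqrt(np.average((x - mu[0]) ** 2, weights=r0)),
        np.sqrt(np.average((x - mu[1]) ** 2, weights=1 - r0)),
    ])
print(w, mu, sigma)
```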

  18. Evaluation of the performance of existing non-laboratory based cardiovascular risk assessment algorithms

    Science.gov (United States)

    2013-01-01

    Background The high burden and rising incidence of cardiovascular disease (CVD) in resource constrained countries necessitates implementation of robust and pragmatic primary and secondary prevention strategies. Many current CVD management guidelines recommend absolute cardiovascular (CV) risk assessment as a clinically sound guide to preventive and treatment strategies. Development of non-laboratory based cardiovascular risk assessment algorithms enable absolute risk assessment in resource constrained countries. The objective of this review is to evaluate the performance of existing non-laboratory based CV risk assessment algorithms using the benchmarks for clinically useful CV risk assessment algorithms outlined by Cooney and colleagues. Methods A literature search to identify non-laboratory based risk prediction algorithms was performed in MEDLINE, CINAHL, Ovid Premier Nursing Journals Plus, and PubMed databases. The identified algorithms were evaluated using the benchmarks for clinically useful cardiovascular risk assessment algorithms outlined by Cooney and colleagues. Results Five non-laboratory based CV risk assessment algorithms were identified. The Gaziano and Framingham algorithms met the criteria for appropriateness of statistical methods used to derive the algorithms and endpoints. The Swedish Consultation, Framingham and Gaziano algorithms demonstrated good discrimination in derivation datasets. Only the Gaziano algorithm was externally validated where it had optimal discrimination. The Gaziano and WHO algorithms had chart formats which made them simple and user friendly for clinical application. Conclusion Both the Gaziano and Framingham non-laboratory based algorithms met most of the criteria outlined by Cooney and colleagues. External validation of the algorithms in diverse samples is needed to ascertain their performance and applicability to different populations and to enhance clinicians’ confidence in them. PMID:24373202

  19. Identifying User Profiles from Statistical Grouping Methods

    Directory of Open Access Journals (Sweden)

    Francisco Kelsen de Oliveira

    2018-02-01

    Full Text Available This research aimed to group users into subgroups according to their levels of knowledge about technology. Statistical hierarchical and non-hierarchical clustering methods were studied, compared and used in the creation of the subgroups based on the similarities of the users’ technology skill levels. The research sample consisted of teachers who answered online questionnaires about their skills in using software and hardware with an educational focus. The statistical grouping methods were applied and showed possible groupings of the users. The analyses of these groups allowed the identification of common characteristics among the individuals of each subgroup. Therefore, it was possible to define two subgroups of users, one with skill with technology and another with little skill. The partial results of the research showed that the two main grouping algorithms agreed with 92% similarity in forming the group of users with skill with technology and the group with little skill, confirming the accuracy of the techniques for discriminating among individuals.
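
    The sketch below contrasts the two families of methods mentioned above, hierarchical (Ward linkage) and non-hierarchical (k-means), on synthetic questionnaire scores and reports how often the two partitions agree; the real survey data are not available here.

```python
# Hierarchical vs. non-hierarchical grouping on synthetic skill scores.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
scores = np.vstack([rng.normal(2, 0.5, (40, 5)),    # lower self-rated skill
                    rng.normal(4, 0.5, (40, 5))])   # higher self-rated skill

hier_labels = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

# agreement between the two partitions (labels may be permuted)
agreement = max(np.mean(hier_labels - 1 == kmeans_labels),
                np.mean(hier_labels - 1 == 1 - kmeans_labels))
print(f"partition agreement: {agreement:.0%}")
```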

  20. Methods for computational disease surveillance in infection prevention and control: Statistical process control versus Twitter's anomaly and breakout detection algorithms.

    Science.gov (United States)

    Wiemken, Timothy L; Furmanek, Stephen P; Mattingly, William A; Wright, Marc-Oliver; Persaud, Annuradha K; Guinn, Brian E; Carrico, Ruth M; Arnold, Forest W; Ramirez, Julio A

    2018-02-01

    Although not all health care-associated infections (HAIs) are preventable, reducing HAIs through targeted intervention is key to a successful infection prevention program. To identify areas in need of targeted intervention, robust statistical methods must be used when analyzing surveillance data. The objective of this study was to compare and contrast statistical process control (SPC) charts with Twitter's anomaly and breakout detection algorithms. SPC and anomaly/breakout detection (ABD) charts were created for vancomycin-resistant Enterococcus, Acinetobacter baumannii, catheter-associated urinary tract infection, and central line-associated bloodstream infection data. Both SPC and ABD charts detected similar data points as anomalous/out of control on most charts. The vancomycin-resistant Enterococcus ABD chart detected an extra anomalous point that appeared to be higher than the same time period in prior years. Using a small subset of the central line-associated bloodstream infection data, the ABD chart was able to detect anomalies where the SPC chart was not. SPC charts and ABD charts both performed well, although ABD charts appeared to work better in the context of seasonal variation and autocorrelation. Because they account for common statistical issues in HAI data, ABD charts may be useful for practitioners for analysis of HAI surveillance data. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
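
    As a baseline for the comparison above, the snippet below builds a minimal Shewhart-style control chart (3-sigma limits on monthly counts, c-chart style) on synthetic surveillance data; the Twitter anomaly and breakout detectors evaluated in the study are separate libraries and are not reproduced here.

```python
# Minimal Shewhart-style SPC baseline on synthetic monthly infection counts.
import numpy as np

rng = np.random.default_rng(4)
counts = rng.poisson(5, 24).astype(float)
counts[18] = 14                                   # injected excursion

center = counts.mean()
ucl = center + 3 * np.sqrt(center)                # upper control limit (c-chart)
lcl = max(center - 3 * np.sqrt(center), 0)        # lower control limit

out_of_control = np.where((counts > ucl) | (counts < lcl))[0]
print("months flagged:", out_of_control)
```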

  1. Inverse Problems in Geodynamics Using Machine Learning Algorithms

    Science.gov (United States)

    Shahnas, M. H.; Yuen, D. A.; Pysklywec, R. N.

    2018-01-01

    During the past few decades, numerical studies have been widely employed to explore the style of circulation and mixing in the mantle of Earth and other planets. However, these numerical models depend on many properties from mineral physics, geochemistry, and petrology that are difficult to constrain directly. Machine learning, as a computational statistics-related technique and a subfield of artificial intelligence, has recently emerged rapidly in many fields of science and engineering. We focus here on the application of supervised machine learning (SML) algorithms in predicting mantle flow processes. Specifically, we emphasize estimating mantle properties by employing machine learning techniques to solve an inverse problem. Using snapshots of numerical convection models as training samples, we enable machine learning models to determine the magnitude of the spin transition-induced density anomalies that can cause flow stagnation at midmantle depths. Employing support vector machine algorithms, we show that SML techniques can successfully predict the magnitude of mantle density anomalies and can also be used in characterizing mantle flow patterns. The technique can be extended to more complex geodynamic problems in mantle dynamics by employing deep learning algorithms to put constraints on properties such as viscosity, elastic parameters, and the nature of thermal and chemical anomalies.

  2. Sampling informative/complex a priori probability distributions using Gibbs sampling assisted by sequential simulation

    DEFF Research Database (Denmark)

    Hansen, Thomas Mejer; Mosegaard, Klaus; Cordua, Knud Skou

    2010-01-01

    Markov chain Monte Carlo methods such as the Gibbs sampler and the Metropolis algorithm can be used to sample the solutions to non-linear inverse problems. In principle these methods allow incorporation of arbitrarily complex a priori information, but current methods allow only relatively simple...... this algorithm with the Metropolis algorithm to obtain an efficient method for sampling posterior probability densities for nonlinear inverse problems....
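
    Since the abstract refers to the Metropolis algorithm, the sketch below shows a tiny random-walk Metropolis sampler for a 1-D target density, to illustrate the accept/reject mechanism; it is not the paper's sequential-simulation-assisted Gibbs scheme, and the target is a made-up Gaussian.

```python
# Random-walk Metropolis sampler for a 1-D unnormalised target density.
import numpy as np

rng = np.random.default_rng(5)
log_target = lambda m: -0.5 * (m - 2.0) ** 2      # unnormalised Gaussian posterior

m, chain = 0.0, []
for _ in range(5000):
    proposal = m + rng.normal(scale=0.5)
    if np.log(rng.random()) < log_target(proposal) - log_target(m):
        m = proposal                               # accept
    chain.append(m)
print(np.mean(chain[1000:]), np.std(chain[1000:]))   # ~2.0, ~1.0 after burn-in
```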

  3. Reproducible cancer biomarker discovery in SELDI-TOF MS using different pre-processing algorithms.

    Directory of Open Access Journals (Sweden)

    Jinfeng Zou

    Full Text Available BACKGROUND: There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached. RESULTS: In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR control approach and that the reproducibility of DE peak detection could thereby be increased. CONCLUSIONS: Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.
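
    To make the multiple-testing step concrete, the sketch below applies the basic Benjamini-Hochberg FDR procedure to hypothetical peak p-values; the stratified FDR variant discussed above is not reproduced.

```python
# Benjamini-Hochberg selection of "differentially expressed" peaks at FDR 0.05
# (synthetic p-values: 20 signal peaks among 500).
import numpy as np

rng = np.random.default_rng(6)
pvals = np.concatenate([rng.uniform(0, 0.002, 20),    # true DE peaks
                        rng.uniform(0, 1, 480)])      # null peaks

def benjamini_hochberg(p, q=0.05):
    order = np.argsort(p)
    ranked = p[order]
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = np.nonzero(ranked <= thresh)[0]
    k = below[-1] + 1 if below.size else 0            # largest rank passing
    selected = np.zeros(len(p), dtype=bool)
    selected[order[:k]] = True
    return selected

print(benjamini_hochberg(pvals).sum(), "peaks declared DE at FDR 0.05")
```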

  4. Modified BTC Algorithm for Audio Signal Coding

    Directory of Open Access Journals (Sweden)

    TOMIC, S.

    2016-11-01

    Full Text Available This paper describes a modification of a well-known image coding algorithm, named Block Truncation Coding (BTC), and its application in audio signal coding. The BTC algorithm was originally designed for black and white image coding. Since black and white images and audio signals have different statistical characteristics, the application of this image coding algorithm to audio signals presents a novelty and a challenge. Several implementation modifications are described in this paper, while the original idea of the algorithm is preserved. The main modifications are performed in the area of signal quantization, by designing more adequate quantizers for audio signal processing. The result is a novel audio coding algorithm, whose performance is presented and analyzed in this research. The performance analysis indicates that this novel algorithm can be successfully applied in audio signal coding.
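
    For context, the sketch below shows classic moment-preserving BTC applied to one block of samples: the block is coded by its mean, standard deviation, and a 1-bit mask, and decoded with two levels that preserve the first two moments. The paper's modified quantizers are not reproduced; the block here is random data.

```python
# Classic moment-preserving Block Truncation Coding on one block of samples.
# Assumes the block is not constant (so both bit groups are non-empty).
import numpy as np

rng = np.random.default_rng(7)
block = rng.normal(size=16)                  # one block of samples

m, s = block.mean(), block.std()
mask = block >= m                            # 1 bit per sample
q = mask.sum()

low = m - s * np.sqrt(q / (len(block) - q))  # reconstruction level for 0-bits
high = m + s * np.sqrt((len(block) - q) / q) # reconstruction level for 1-bits
decoded = np.where(mask, high, low)

print(block.mean(), decoded.mean())          # mean is preserved
print(block.std(),  decoded.std())           # standard deviation is preserved
```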

  5. Study on the influence of X-ray tube spectral distribution on the analysis of bulk samples and thin films: Fundamental parameters method and theoretical coefficient algorithms

    International Nuclear Information System (INIS)

    Sitko, Rafal

    2008-01-01

    Knowledge of X-ray tube spectral distribution is necessary in theoretical methods of matrix correction, i.e. in both fundamental parameter (FP) methods and theoretical influence coefficient algorithms. Thus, the influence of X-ray tube distribution on the accuracy of the analysis of thin films and bulk samples is presented. The calculations are performed using experimental X-ray tube spectra taken from the literature and theoretical X-ray tube spectra evaluated by three different algorithms proposed by Pella et al. (X-Ray Spectrom. 14 (1985) 125-135), Ebel (X-Ray Spectrom. 28 (1999) 255-266), and Finkelshtein and Pavlova (X-Ray Spectrom. 28 (1999) 27-32). In this study, Fe-Cr-Ni system is selected as an example and the calculations are performed for X-ray tubes commonly applied in X-ray fluorescence analysis (XRF), i.e., Cr, Mo, Rh and W. The influence of X-ray tube spectra on FP analysis is evaluated when quantification is performed using various types of calibration samples. FP analysis of bulk samples is performed using pure-element bulk standards and multielement bulk standards similar to the analyzed material, whereas for FP analysis of thin films, the bulk and thin pure-element standards are used. For the evaluation of the influence of X-ray tube spectra on XRF analysis performed by theoretical influence coefficient methods, two algorithms for bulk samples are selected, i.e. Claisse-Quintin (Can. Spectrosc. 12 (1967) 129-134) and COLA algorithms (G.R. Lachance, Paper Presented at the International Conference on Industrial Inorganic Elemental Analysis, Metz, France, June 3, 1981) and two algorithms (constant and linear coefficients) for thin films recently proposed by Sitko (X-Ray Spectrom. 37 (2008) 265-272)

  6. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    Science.gov (United States)

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
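
    A hedged worked example of the binomial reasoning described above: with an assumed tracer frequency, the sample size needed to recover at least one tracer with a target probability follows from P(at least one) = 1 - (1 - p)^n. All numbers below are hypothetical, not the study's design values.

```python
# Sample size for detecting at least one tracer with 99% probability,
# assuming a hypothetical tracer frequency of 1 per 10,000 kernels.
import math

p = 1 / 10_000                      # assumed tracer frequency per kernel
target = 0.99                       # required detection probability

# Solve 1 - (1 - p)^n >= target for n
n = math.ceil(math.log(1 - target) / math.log(1 - p))
print(n, "kernels per sample")      # about 46,000 kernels
```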

  7. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    DEFF Research Database (Denmark)

    Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona

    2015-01-01

    Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in the literature. This limits both...... comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub......-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered) approach, facilitating comparison of the waste data between individual sub-areas with different fractionation (waste...

  8. Direction-of-Arrival Estimation Based on Sparse Recovery with Second-Order Statistics

    Directory of Open Access Journals (Sweden)

    H. Chen

    2015-04-01

    Full Text Available Traditional direction-of-arrival (DOA) estimation techniques perform Nyquist-rate sampling of the received signals and as a result they require high storage. To reduce the sampling rate, we introduce level-crossing (LC) sampling, which captures samples whenever the signal crosses predetermined reference levels, and the LC-based analog-to-digital converter (LC ADC) has been shown to efficiently sample certain classes of signals. In this paper, we focus on the DOA estimation problem using second-order statistics based on the LC samples recorded on one sensor, along with the synchronous samples of the other sensors; a sparse angle-space scenario can be found by solving an ℓ1 minimization problem, giving the number of sources and their DOAs. The experimental results show that our proposed method, when compared with some existing norm-based constrained optimization compressive sensing (CS) algorithms, as well as a subspace method, improves the DOA estimation performance, while using fewer samples when compared with Nyquist-rate sampling and reducing sensor activity, especially for signals with long periods of silence.

  9. Circular contour retrieval in real-world conditions by higher order statistics and an alternating-least squares algorithm

    Science.gov (United States)

    Jiang, Haiping; Marot, Julien; Fossati, Caroline; Bourennane, Salah

    2011-12-01

    In real-world conditions, contours are most often blurred in digital images because of acquisition conditions such as movement, light transmission environment, and defocus. Among image segmentation methods, Hough transform requires a computational load which increases with the number of noise pixels, level set methods also require a high computational load, and some other methods assume that the contours are one-pixel wide. For the first time, we retrieve the characteristics of multiple possibly concentric blurred circles. We face correlated noise environment, to get closer to real-world conditions. For this, we model a blurred circle by a few parameters--center coordinates, radius, and spread--which characterize its mean position and gray level variations. We derive the signal model which results from signal generation on circular antenna. Linear antennas provide the center coordinates. To retrieve the circle radii, we adapt the second-order statistics TLS-ESPRIT method for non-correlated noise environment, and propose a novel version of TLS-ESPRIT based on higher-order statistics for correlated noise environment. Then, we derive a least-squares criterion and propose an alternating least-squares algorithm to retrieve simultaneously all spread values of concentric circles. Experiments performed on hand-made and real-world images show that the proposed methods outperform the Hough transform and a level set method dedicated to blurred contours in terms of computational load. Moreover, the proposed model and optimization method provide the information of the contour grey level variations.

  10. Sampling optimization for printer characterization by direct search.

    Science.gov (United States)

    Bianco, Simone; Schettini, Raimondo

    2012-12-01

    Printer characterization usually requires many printer inputs and corresponding color measurements of the printed outputs. In this brief, a sampling optimization for printer characterization on the basis of direct search is proposed to maintain high color accuracy with a reduction in the number of characterization samples required. The proposed method is able to match a given level of color accuracy requiring, on average, a characterization set cardinality which is almost one-fourth of that required by the uniform sampling, while the best method in the state of the art needs almost one-third. The number of characterization samples required can be further reduced if the proposed algorithm is coupled with a sequential optimization method that refines the sample values in the device-independent color space. The proposed sampling optimization method is extended to deal with multiple substrates simultaneously, giving statistically better colorimetric accuracy (at the α = 0.05 significance level) than sampling optimization techniques in the state of the art optimized for each individual substrate, thus allowing use of a single set of characterization samples for multiple substrates.

  11. Statistical learning methods: Basics, control and performance

    Energy Technology Data Exchange (ETDEWEB)

    Zimmermann, J. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de

    2006-04-01

    The basics of statistical learning are reviewed with a special emphasis on general principles and problems for all different types of learning methods. Different aspects of controlling these methods in a physically adequate way will be discussed. All principles and guidelines will be exercised on examples for statistical learning methods in high energy and astrophysics. These examples prove in addition that statistical learning methods very often lead to a remarkable performance gain compared to the competing classical algorithms.

  12. Statistical learning methods: Basics, control and performance

    International Nuclear Information System (INIS)

    Zimmermann, J.

    2006-01-01

    The basics of statistical learning are reviewed with a special emphasis on general principles and problems for all different types of learning methods. Different aspects of controlling these methods in a physically adequate way will be discussed. All principles and guidelines will be exercised on examples for statistical learning methods in high energy and astrophysics. These examples prove in addition that statistical learning methods very often lead to a remarkable performance gain compared to the competing classical algorithms

  13. A Statistical Primer: Understanding Descriptive and Inferential Statistics

    OpenAIRE

    Gillian Byrne

    2007-01-01

    As libraries and librarians move more towards evidence‐based decision making, the data being generated in libraries is growing. Understanding the basics of statistical analysis is crucial for evidence‐based practice (EBP), in order to correctly design and analyze research as well as to evaluate the research of others. This article covers the fundamentals of descriptive and inferential statistics, from hypothesis construction to sampling to common statistical techniques including chi‐square, co...

  14. Statistical learning methods in high-energy and astrophysics analysis

    Energy Technology Data Exchange (ETDEWEB)

    Zimmermann, J. [Forschungszentrum Juelich GmbH, Zentrallabor fuer Elektronik, 52425 Juelich (Germany) and Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de; Kiesling, C. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)

    2004-11-21

    We discuss several popular statistical learning methods used in high-energy- and astro-physics analysis. After a short motivation for statistical learning we present the most popular algorithms and discuss several examples from current research in particle- and astro-physics. The statistical learning methods are compared with each other and with standard methods for the respective application.

  15. Statistical learning methods in high-energy and astrophysics analysis

    International Nuclear Information System (INIS)

    Zimmermann, J.; Kiesling, C.

    2004-01-01

    We discuss several popular statistical learning methods used in high-energy- and astro-physics analysis. After a short motivation for statistical learning we present the most popular algorithms and discuss several examples from current research in particle- and astro-physics. The statistical learning methods are compared with each other and with standard methods for the respective application

  16. Sampling strategies and stopping criteria for stochastic dual dynamic programming: a case study in long-term hydrothermal scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Homem-de-Mello, Tito [University of Illinois at Chicago, Department of Mechanical and Industrial Engineering, Chicago, IL (United States); Matos, Vitor L. de; Finardi, Erlon C. [Universidade Federal de Santa Catarina, LabPlan - Laboratorio de Planejamento de Sistemas de Energia Eletrica, Florianopolis (Brazil)

    2011-03-15

    The long-term hydrothermal scheduling is one of the most important problems to be solved in the power systems area. This problem aims to obtain an optimal policy, under water (energy) resources uncertainty, for hydro and thermal plants over a multi-annual planning horizon. It is natural to model the problem as a multi-stage stochastic program, a class of models for which algorithms have been developed. The original stochastic process is represented by a finite scenario tree and, because of the large number of stages, a sampling-based method such as the Stochastic Dual Dynamic Programming (SDDP) algorithm is required. The purpose of this paper is two-fold. Firstly, we study the application of two alternative sampling strategies to the standard Monte Carlo - namely, Latin hypercube sampling and randomized quasi-Monte Carlo - for the generation of scenario trees, as well as for the sampling of scenarios that is part of the SDDP algorithm. Secondly, we discuss the formulation of stopping criteria for the optimization algorithm in terms of statistical hypothesis tests, which allows us to propose an alternative criterion that is more robust than that originally proposed for the SDDP. We test these ideas on a problem associated with the whole Brazilian power system, with a three-year planning horizon. (orig.)
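
    The snippet below sketches the sampling strategies compared above, plain Monte Carlo versus Latin hypercube draws, for a small scenario set; the marginals are hypothetical lognormals, not the Brazilian hydro-inflow data, and the SDDP machinery itself is not shown.

```python
# Plain Monte Carlo vs. Latin hypercube scenario generation (3 "stages").
import numpy as np
from scipy.stats import qmc, lognorm

rng = np.random.default_rng(8)
n_scenarios, n_stages = 200, 3

mc = lognorm(s=0.5).rvs(size=(n_scenarios, n_stages), random_state=rng)

sampler = qmc.LatinHypercube(d=n_stages, seed=0)
lhs = lognorm(s=0.5).ppf(sampler.random(n_scenarios))   # invert the marginal CDF

# the stratified LHS sample typically reproduces the mean with lower error
print(abs(mc.mean() - lognorm(s=0.5).mean()),
      abs(lhs.mean() - lognorm(s=0.5).mean()))
```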

  17. Statistical properties of the surface velocity field in the northern Gulf of Mexico sampled by GLAD drifters

    OpenAIRE

    Mariano, A.J.; Ryan, E.H.; Huntley, H.S.; Laurindo, L.C.; Coelho, E.; Ozgokmen, TM; Berta, M.; Bogucki, D; Chen, S.S.; Curcic, M.; Drouin, K.L.; Gough, M; Haus, BK; Haza, A.C.; Hogan, P

    2016-01-01

    The Grand LAgrangian Deployment (GLAD) used multiscale sampling and GPS technology to observe time series of drifter positions with initial drifter separation of O(100 m) to O(10 km), and nominal 5 min sampling, during the summer and fall of 2012 in the northern Gulf of Mexico. Histograms of the velocity field and its statistical parameters are non-Gaussian; most are multimodal. The dominant periods for the surface velocity field are 1–2 days due to inertial oscillations, tides, and the sea b...

  18. International Conference on Robust Statistics

    CERN Document Server

    Filzmoser, Peter; Gather, Ursula; Rousseeuw, Peter

    2003-01-01

    Aspects of Robust Statistics are important in many areas. Based on the International Conference on Robust Statistics 2001 (ICORS 2001) in Vorau, Austria, this volume discusses future directions of the discipline, bringing together leading scientists, experienced researchers and practitioners, as well as younger researchers. The papers cover a multitude of different aspects of Robust Statistics. For instance, the fundamental problem of data summary (weights of evidence) is considered and its robustness properties are studied. Further theoretical subjects include e.g.: robust methods for skewness, time series, longitudinal data, multivariate methods, and tests. Some papers deal with computational aspects and algorithms. Finally, the aspects of application and programming tools complete the volume.

  19. Relationship between accuracy and number of samples on statistical quantity and contour map of environmental gamma-ray dose rate. Example of random sampling

    International Nuclear Information System (INIS)

    Matsuda, Hideharu; Minato, Susumu

    2002-01-01

    The accuracy of statistical quantity like the mean value and contour map obtained by measurement of the environmental gamma-ray dose rate was evaluated by random sampling of 5 different model distribution maps made by the mean slope, -1.3, of power spectra calculated from the actually measured values. The values were derived from 58 natural gamma dose rate data reported worldwide ranging in the means of 10-100 Gy/h rates and 10^-3 to 10^7 km^2 areas. The accuracy of the mean value was found around ±7% even for 60 or 80 samplings (the most frequent number) and the standard deviation had the accuracy less than 1/4-1/3 of the means. The correlation coefficient of the frequency distribution was found 0.860 or more for 200-400 samplings (the most frequent number) but of the contour map, 0.502-0.770. (K.H.)
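
    In the same spirit as the evaluation above, the snippet below measures how the relative error of a randomly sampled mean shrinks as the number of samples grows; the field here is a synthetic random surface without the spatial correlation of the published dose-rate maps, so the numbers are only indicative.

```python
# Relative error of the sampled mean vs. number of random samples
# on a synthetic field (not the published dose-rate distribution maps).
import numpy as np

rng = np.random.default_rng(11)
field = rng.gamma(shape=4.0, scale=20.0, size=(512, 512))   # synthetic map
true_mean = field.mean()

for n in (20, 60, 200, 400):
    errs = []
    for _ in range(1000):
        idx = rng.integers(0, field.size, n)
        errs.append(abs(field.ravel()[idx].mean() - true_mean) / true_mean)
    print(n, f"{np.mean(errs):.1%} mean relative error")
```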

  20. Distinguish Dynamic Basic Blocks by Structural Statistical Testing

    DEFF Research Database (Denmark)

    Petit, Matthieu; Gotlieb, Arnaud

    Statistical testing aims at generating random test data that respect selected probabilistic properties. A probability distribution is associated with the program input space in order to achieve the statistical test purpose: to test the most frequent usage of software or to maximize the probability of...... control flow path) during the test data selection. We implemented this algorithm in a statistical test data generator for Java programs. A first experimental validation is presented...

  1. Hardware architecture for projective model calculation and false match refining using random sample consensus algorithm

    Science.gov (United States)

    Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid

    2016-11-01

    The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and refining false matches using random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.
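
    The full projective (homography) estimation targeted by the hardware is involved, so the sketch below uses a minimal software RANSAC loop on a 2-D line model to convey the hypothesise-and-verify structure being implemented; the data, thresholds, and iteration count are arbitrary.

```python
# Minimal RANSAC loop on a 2-D line model (illustration of the sample-consensus
# idea only; not the projective-model hardware architecture).
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 200)
y[:50] = rng.uniform(0, 25, 50)                     # 25% gross outliers

best_inliers, best_model = 0, None
for _ in range(100):
    i, j = rng.choice(200, size=2, replace=False)   # minimal sample: 2 points
    a = (y[j] - y[i]) / (x[j] - x[i] + 1e-12)
    b = y[i] - a * x[i]
    inliers = np.abs(y - (a * x + b)) < 0.3         # consensus set
    if inliers.sum() > best_inliers:
        best_inliers, best_model = inliers.sum(), (a, b)
print(best_model, best_inliers)
```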

  2. Algorithms for Cytoplasm Segmentation of Fluorescence Labelled Cells

    Directory of Open Access Journals (Sweden)

    Carolina Wählby

    2002-01-01

    Full Text Available Automatic cell segmentation has various applications in cytometry, and while the nucleus is often very distinct and easy to identify, the cytoplasm poses a much greater challenge. A new combination of image analysis algorithms for segmentation of cells imaged by fluorescence microscopy is presented. The algorithm consists of an image pre‐processing step, a general segmentation and merging step, followed by a segmentation quality measurement. The quality measurement consists of a statistical analysis of a number of shape descriptive features. Objects that have features that differ from those of correctly segmented single cells can be further processed by a splitting step. By statistical analysis we therefore get a feedback system for separation of clustered cells. After the segmentation is completed, the quality of the final segmentation is evaluated. By training the algorithm on a representative set of training images, the algorithm is made fully automatic for subsequent images created under similar conditions. Automatic cytoplasm segmentation was tested on CHO‐cells stained with calcein. The fully automatic method showed between 89% and 97% correct segmentation as compared to manual segmentation.

  3. SU-G-JeP1-12: Head-To-Head Performance Characterization of Two Multileaf Collimator Tracking Algorithms for Radiotherapy

    International Nuclear Information System (INIS)

    Caillet, V; Colvill, E; O’Brien, R; Keall, P; Poulsen, P; Moore, D; Booth, J; Sawant, A

    2016-01-01

    Purpose: Multi-leaf collimator (MLC) tracking is being clinically pioneered to continuously compensate for thoracic and abdominal motion during radiotherapy. The purpose of this work is to characterize the performance of two MLC tracking algorithms for cancer radiotherapy, based on a direct optimization and a piecewise leaf fitting approach respectively. Methods: To test the algorithms, both physical and in silico experiments were performed. Previously published high and low modulation VMAT plans for lung and prostate cancer cases were used along with eight patient-measured organ-specific trajectories. For both MLC tracking algorithms, the plans were run with their corresponding patient trajectories. The physical experiments were performed on a Trilogy Varian linac and a programmable phantom (HexaMotion platform). For each MLC tracking algorithm, plan and patient trajectory, the tracking accuracy was quantified as the difference in aperture area between ideal and fitted MLC. To compare algorithms, the average cumulative tracking error area for each experiment was calculated. The two-sample Kolmogorov-Smirnov (KS) test was used to evaluate the cumulative tracking errors between algorithms. Results: Comparison of tracking errors for the physical and in silico experiments showed minor differences between the two algorithms. The KS D-statistics for the physical experiments were below 0.05, denoting no significant differences between the two distribution patterns, and the average error areas (direct optimization/piecewise leaf-fitting) were comparable (66.64 cm^2/65.65 cm^2). For the in silico experiments, the KS D-statistics were below 0.05 and the average error areas were also equivalent (49.38 cm^2/48.98 cm^2). Conclusion: The comparison between the two leaf-fitting algorithms demonstrated no significant differences in tracking errors, neither in a clinically realistic environment nor in silico. The similarities in the two independent algorithms give confidence in the use

  4. SU-G-JeP1-12: Head-To-Head Performance Characterization of Two Multileaf Collimator Tracking Algorithms for Radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Caillet, V; Colvill, E [School of Medecine, The University of Sydney, Sydney, NSW (Australia); Royal North Shore Hospital, St Leonards, Sydney (Australia); O’Brien, R; Keall, P [School of Medecine, The University of Sydney, Sydney, NSW (Australia); Poulsen, P [Aarhus University Hospital, Aarhus (Denmark); Moore, D [UT Southwestern Medical Center, Dallas, TX (United States); University of Maryland School of Medicine, Baltimore, MD (United States); Booth, J [Royal North Shore Hospital, St Leonards, Sydney (Australia); Sawant, A [University of Maryland School of Medicine, Baltimore, MD (United States)

    2016-06-15

    Purpose: Multi-leaf collimator (MLC) tracking is being clinically pioneered to continuously compensate for thoracic and abdominal motion during radiotherapy. The purpose of this work is to characterize the performance of two MLC tracking algorithms for cancer radiotherapy, based on a direct optimization and a piecewise leaf fitting approach respectively. Methods: To test the algorithms, both physical and in silico experiments were performed. Previously published high and low modulation VMAT plans for lung and prostate cancer cases were used along with eight patient-measured organ-specific trajectories. For both MLC tracking algorithms, the plans were run with their corresponding patient trajectories. The physical experiments were performed on a Trilogy Varian linac and a programmable phantom (HexaMotion platform). For each MLC tracking algorithm, plan and patient trajectory, the tracking accuracy was quantified as the difference in aperture area between ideal and fitted MLC. To compare algorithms, the average cumulative tracking error area for each experiment was calculated. The two-sample Kolmogorov-Smirnov (KS) test was used to evaluate the cumulative tracking errors between algorithms. Results: Comparison of tracking errors for the physical and in silico experiments showed minor differences between the two algorithms. The KS D-statistics for the physical experiments were below 0.05, denoting no significant differences between the two distribution patterns, and the average error areas (direct optimization/piecewise leaf-fitting) were comparable (66.64 cm^2/65.65 cm^2). For the in silico experiments, the KS D-statistics were below 0.05 and the average error areas were also equivalent (49.38 cm^2/48.98 cm^2). Conclusion: The comparison between the two leaf-fitting algorithms demonstrated no significant differences in tracking errors, neither in a clinically realistic environment nor in silico. The similarities in the two independent algorithms give confidence in the use

  5. Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

    Science.gov (United States)

    Breunig, Nancy A.

    Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…

  6. Understanding Computational Bayesian Statistics

    CERN Document Server

    Bolstad, William M

    2011-01-01

    A hands-on introduction to computational statistics from a Bayesian point of view Providing a solid grounding in statistics while uniquely covering the topics from a Bayesian perspective, Understanding Computational Bayesian Statistics successfully guides readers through this new, cutting-edge approach. With its hands-on treatment of the topic, the book shows how samples can be drawn from the posterior distribution when the formula giving its shape is all that is known, and how Bayesian inferences can be based on these samples from the posterior. These ideas are illustrated on common statistic

  7. Super resolution reconstruction of μ-CT image of rock sample using neighbour embedding algorithm

    Science.gov (United States)

    Wang, Yuzhu; Rahman, Sheik S.; Arns, Christoph H.

    2018-03-01

    X-ray computed tomography (μ-CT) is considered to be the most effective way to obtain the inner structure of a rock sample without destroying it. However, its limited resolution hampers its ability to probe sub-micron structures, which are critical for flow transport in rock samples. In this study, we propose an innovative methodology to improve the resolution of the μ-CT image using a neighbour embedding algorithm, where low-frequency information is provided by the μ-CT image itself while high-frequency information is supplemented by a high-resolution scanning electron microscopy (SEM) image. In order to obtain a prior for reconstruction, a large number of image patch pairs containing high- and low-resolution image patches are extracted from the Gaussian image pyramid generated from the SEM image. These image patch pairs contain abundant information about the tomographic evolution of local porous structures across different resolution spaces. Relying on the assumption of self-similarity of the porous structure, this prior information can be used to supervise the reconstruction of the high-resolution μ-CT image effectively. The experimental results show that the proposed method achieves state-of-the-art performance.
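
    A stripped-down sketch of the neighbour-embedding step described above: each low-resolution patch is matched against a dictionary of paired low/high-resolution patches and its high-resolution counterpart is reconstructed as a weighted combination of the nearest neighbours' high-resolution patches. Dictionary construction from the SEM Gaussian pyramid, overlap blending, and all μ-CT specifics are omitted, and inverse-distance weights are used as a simplification of the usual locally linear embedding weights; the arrays are illustrative:

        import numpy as np

        def neighbour_embedding_patch(lr_patch, lr_dict, hr_dict, k=5):
            """lr_dict: (N, d_lr) low-res training patches; hr_dict: (N, d_hr) paired high-res patches."""
            diff = lr_dict - lr_patch
            dist = np.einsum("ij,ij->i", diff, diff)   # squared distance to every dictionary patch
            nn = np.argpartition(dist, k)[:k]          # indices of the k nearest neighbours
            w = 1.0 / (dist[nn] + 1e-8)
            w /= w.sum()                               # normalized similarity weights
            return w @ hr_dict[nn]                     # weighted combination of paired HR patches

        # Illustrative dictionary: 1000 paired patches (5x5 low-res, 10x10 high-res), flattened.
        rng = np.random.default_rng(8)
        lr_dict = rng.random((1000, 25))
        hr_dict = rng.random((1000, 100))
        hr_patch = neighbour_embedding_patch(rng.random(25), lr_dict, hr_dict)
        print(hr_patch.shape)   # (100,) -> one reconstructed 10x10 high-res patch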

  8. Computed Tomography Image Quality Evaluation of a New Iterative Reconstruction Algorithm in the Abdomen (Adaptive Statistical Iterative Reconstruction-V) a Comparison With Model-Based Iterative Reconstruction, Adaptive Statistical Iterative Reconstruction, and Filtered Back Projection Reconstructions.

    Science.gov (United States)

    Goodenberger, Martin H; Wagner-Bartak, Nicolaus A; Gupta, Shiva; Liu, Xinming; Yap, Ramon Q; Sun, Jia; Tamm, Eric P; Jensen, Corey T

    The purpose of this study was to compare abdominopelvic computed tomography images reconstructed with adaptive statistical iterative reconstruction-V (ASIR-V) with model-based iterative reconstruction (Veo 3.0), ASIR, and filtered back projection (FBP). Abdominopelvic computed tomography scans for 36 patients (26 males and 10 females) were reconstructed using FBP, ASIR (80%), Veo 3.0, and ASIR-V (30%, 60%, 90%). Mean ± SD patient age was 32 ± 10 years with mean ± SD body mass index of 26.9 ± 4.4 kg/m². Images were reviewed by 2 independent readers in a blinded, randomized fashion. Hounsfield unit, noise, and contrast-to-noise ratio (CNR) values were calculated for each reconstruction algorithm for further comparison. Phantom evaluation of low-contrast detectability (LCD) and high-contrast resolution was performed. Adaptive statistical iterative reconstruction-V 30%, ASIR-V 60%, and ASIR 80% were generally superior qualitatively compared with ASIR-V 90%, Veo 3.0, and FBP (P ASIR-V 60% with respective CNR values of 5.54 ± 2.39, 8.78 ± 3.15, and 3.49 ± 1.77 (P ASIR 80% had the best and worst spatial resolution, respectively. Adaptive statistical iterative reconstruction-V 30% and ASIR-V 60% provided the best combination of qualitative and quantitative performance. Adaptive statistical iterative reconstruction 80% was equivalent qualitatively, but demonstrated inferior spatial resolution and LCD.

  9. General Algorithm (High level)

    Indian Academy of Sciences (India)

    General Algorithm (High level). Iteratively. Use Tightness Property to remove points of P1,..,Pi. Use random sampling to get a Random Sample (of enough points) from the next largest cluster, Pi+1. Use the Random Sampling Procedure to approximate ci+1 using the ...

  10. Genetic Algorithm Applied to the Eigenvalue Equalization Filtered-x LMS Algorithm (EE-FXLMS)

    Directory of Open Access Journals (Sweden)

    Stephan P. Lovstedt

    2008-01-01

    Full Text Available The FXLMS algorithm, used extensively in active noise control (ANC), exhibits frequency-dependent convergence behavior. This leads to degraded performance for time-varying tonal noise and noise with multiple stationary tones. Previous work by the authors proposed the eigenvalue equalization filtered-x least mean squares (EE-FXLMS) algorithm. For that algorithm, magnitude coefficients of the secondary path transfer function are modified to decrease variation in the eigenvalues of the filtered-x autocorrelation matrix, while preserving the phase, giving faster convergence and increasing overall attenuation. This paper revisits the EE-FXLMS algorithm, using a genetic algorithm to find magnitude coefficients that give the least variation in eigenvalues. This method overcomes some of the problems with implementing the EE-FXLMS algorithm arising from finite resolution of sampled systems. Experimental control results using the original secondary path model, and a modified secondary path model for both the previous implementation of EE-FXLMS and the genetic algorithm implementation are compared.
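
    For readers unfamiliar with the filtered-x LMS core that EE-FXLMS modifies, the sketch below shows a generic single-channel FXLMS weight update. The secondary-path handling is simplified (the error is formed directly from the disturbance and the control output), the filter length and step size are illustrative assumptions, and the eigenvalue-equalized secondary-path magnitudes found by the genetic algorithm are not reproduced:

        import numpy as np

        def fxlms(x, d, s_hat, L=64, mu=1e-3):
            """Generic filtered-x LMS: adapt control filter w to cancel disturbance d.
            x: reference signal, d: disturbance at the error sensor, s_hat: secondary-path model."""
            w = np.zeros(L)                          # adaptive control filter
            xf = np.convolve(x, s_hat)[:len(x)]      # reference filtered through the secondary-path model
            e = np.zeros(len(x))
            for n in range(L, len(x)):
                x_blk = x[n - L:n][::-1]             # most recent reference samples
                y = w @ x_blk                        # control output (secondary path omitted for brevity)
                e[n] = d[n] - y                      # residual error (simplified error path)
                xf_blk = xf[n - L:n][::-1]
                w += mu * e[n] * xf_blk              # FXLMS gradient step uses the *filtered* reference
            return w, e

        # Illustrative use with a synthetic tone and a trivial secondary-path model.
        t = np.arange(8000) / 8000.0
        x = np.sin(2 * np.pi * 100 * t)
        w, e = fxlms(x, d=0.8 * x, s_hat=np.array([1.0, 0.3]))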

  11. Algorithm for Spatial Clustering with Obstacles

    OpenAIRE

    El-Sharkawi, Mohamed E.; El-Zawawy, Mohamed A.

    2009-01-01

    In this paper, we propose an efficient clustering technique to solve the problem of clustering in the presence of obstacles. The proposed algorithm divides the spatial area into rectangular cells. Each cell is associated with statistical information that enables us to label the cell as dense or non-dense. We also label each cell as obstructed (i.e. intersects any obstacle) or non-obstructed. Then the algorithm finds the regions (clusters) of connected, dense, non-obstructed cells. Finally, th...
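
    A compact sketch of the cell-based idea described above: divide the plane into rectangular cells, label each cell dense or non-dense from simple counts, drop cells marked as obstructed, and flood-fill connected dense, non-obstructed cells into clusters. Grid size, the density threshold, and the obstructed-cell set are illustrative assumptions:

        import numpy as np
        from collections import deque

        def grid_clusters(points, obstructed_cells, grid=20, min_pts=5):
            """points: (n, 2) array; obstructed_cells: set of (i, j) cells intersected by obstacles."""
            pts = np.asarray(points, dtype=float)
            lo, hi = pts.min(axis=0), pts.max(axis=0)
            idx = np.minimum(((pts - lo) / (hi - lo + 1e-12) * grid).astype(int), grid - 1)
            counts = np.zeros((grid, grid), dtype=int)
            for i, j in idx:
                counts[i, j] += 1
            dense = {(i, j) for i in range(grid) for j in range(grid)
                     if counts[i, j] >= min_pts and (i, j) not in obstructed_cells}
            labels, next_label = {}, 0               # flood-fill connected dense cells into clusters
            for cell in dense:
                if cell in labels:
                    continue
                labels[cell] = next_label
                queue = deque([cell])
                while queue:
                    ci, cj = queue.popleft()
                    for nb in ((ci + 1, cj), (ci - 1, cj), (ci, cj + 1), (ci, cj - 1)):
                        if nb in dense and nb not in labels:
                            labels[nb] = next_label
                            queue.append(nb)
                next_label += 1
            return labels                            # maps cell -> cluster id

        pts = np.random.default_rng(9).random((500, 2))
        print(len(set(grid_clusters(pts, obstructed_cells={(10, 10)}).values())), "clusters found")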

  12. An Automated Energy Detection Algorithm Based on Kurtosis-Histogram Excision

    Science.gov (United States)

    2018-01-01

    (Fragmentary report extract; only section headings and partial sentences were recovered.) Sections: 3. Statistical Processing; 3.1 Statistical Analysis. The recoverable text indicates that statistical techniques of the kind used in commercial prognostics and diagnostic vibrational monitoring applications are applied to the energy detection scenario of signals in the RF spectrum domain, and that the algorithm was developed after applying such statistical processing techniques; the tabulated frequency ranges span approximately 10 kHz to 1 GHz.

  13. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables, the small number of samples, and the non-linearity of the problem. It is difficult to obtain satisfactory results with conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm that integrates gene selection and cancer classification into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm implemented with Gaussian kernel SVMs: as a better alternative to the common practice of selecting the apparently best parameters, a genetic algorithm is used to search for an optimal pair of parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, for hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
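
    A hedged sketch of the parameter-search idea: score Gaussian-kernel SVMs by cross-validation and let a tiny evolutionary loop propose new (C, gamma) pairs. The synthetic data, population size, and mutation scale are illustrative assumptions, and the SVM-RFE gene-ranking stage of the paper is not shown:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        rng = np.random.default_rng(7)
        X, y = make_classification(n_samples=80, n_features=500, n_informative=15, random_state=0)

        def fitness(log_c, log_gamma):
            clf = SVC(kernel="rbf", C=10.0 ** log_c, gamma=10.0 ** log_gamma)
            return cross_val_score(clf, X, y, cv=5).mean()

        # Tiny evolutionary search over (log10 C, log10 gamma).
        pop = rng.uniform(low=[-1, -6], high=[3, -1], size=(12, 2))
        for _ in range(10):
            scores = np.array([fitness(c, g) for c, g in pop])
            parents = pop[np.argsort(scores)[-4:]]                     # keep the four best
            pop = parents[rng.integers(0, 4, size=12)] + rng.normal(0, 0.3, size=(12, 2))

        best_score, (c, g) = max((fitness(c, g), (c, g)) for c, g in pop)
        print("best CV accuracy %.3f at C=10^%.2f, gamma=10^%.2f" % (best_score, c, g))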

  14. Statistical significance of cis-regulatory modules

    Directory of Open Access Journals (Sweden)

    Smith Andrew D

    2007-01-01

    Full Text Available Abstract Background It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome scale scanning. Results We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites. Conclusion The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM and MODSTORM software.

  15. THE DETECTION AND STATISTICS OF GIANT ARCS BEHIND CLASH CLUSTERS

    International Nuclear Information System (INIS)

    Xu, Bingxiao; Zheng, Wei; Postman, Marc; Bradley, Larry; Meneghetti, Massimo; Koekemoer, Anton; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Vega, Jesus

    2016-01-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift z_s = 1.9 with 33% of the detected arcs having z_s > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c–M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  16. The Detection and Statistics of Giant Arcs behind CLASH Clusters

    Science.gov (United States)

    Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Zheng, Wei; Bradley, Larry; Vega, Jesus; Koekemoer, Anton

    2016-02-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift z_s = 1.9 with 33% of the detected arcs having z_s > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  17. Detecting chaos in irregularly sampled time series.

    Science.gov (United States)

    Kulp, C W

    2013-09-01

    Recently, Wiebe and Virgin [Chaos 22, 013136 (2012)] developed an algorithm which detects chaos by analyzing a time series' power spectrum, which is computed using the Discrete Fourier Transform (DFT). Their algorithm, like other time series characterization algorithms, requires that the time series be regularly sampled. Real-world data, however, are often irregularly sampled, thus making the detection of chaotic behavior difficult or impossible with those methods. In this paper, a characterization algorithm is presented, which effectively detects chaos in irregularly sampled time series. The work presented here is a modification of Wiebe and Virgin's algorithm and uses the Lomb-Scargle Periodogram (LSP) to compute a series' power spectrum instead of the DFT. The DFT is not appropriate for irregularly sampled time series. However, the LSP is capable of computing the frequency content of irregularly sampled data. Furthermore, a new method of analyzing the power spectrum is developed, which can be useful for differentiating between chaotic and non-chaotic behavior. The new characterization algorithm is successfully applied to irregularly sampled data generated by a model as well as data consisting of observations of variable stars.
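
    The Lomb-Scargle periodogram step described above is available directly in SciPy; the irregular sampling times, the test signal, and the frequency grid below are illustrative assumptions, and the paper's spectrum-analysis criterion for flagging chaos is not reproduced:

        import numpy as np
        from scipy.signal import lombscargle

        rng = np.random.default_rng(1)
        t = np.sort(rng.uniform(0.0, 100.0, size=800))               # irregular sample times
        y = np.sin(2 * np.pi * 0.3 * t) + 0.5 * rng.standard_normal(t.size)

        freqs_hz = np.linspace(0.01, 2.0, 2000)
        power = lombscargle(t, y - y.mean(), 2 * np.pi * freqs_hz)   # lombscargle expects angular frequencies

        # A few sharp peaks suggest periodic motion; a broadband, noise-like spectrum is the
        # kind of feature chaos-detection criteria examine.
        print("dominant frequency (Hz):", freqs_hz[power.argmax()])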

  18. Optimization of the p-xylene oxidation process by a multi-objective differential evolution algorithm with adaptive parameters co-derived with the population-based incremental learning algorithm

    Science.gov (United States)

    Guo, Zhan; Yan, Xuefeng

    2018-04-01

    Different operating conditions of p-xylene oxidation have different influences on the product, purified terephthalic acid. It is necessary to obtain the optimal combination of reaction conditions to ensure the quality of the products, cut down on consumption and increase revenues. A multi-objective differential evolution (MODE) algorithm co-evolved with the population-based incremental learning (PBIL) algorithm, called PBMODE, is proposed. The PBMODE algorithm was designed as a co-evolutionary system. Each individual has its own parameter individual, which is co-evolved by PBIL. PBIL uses statistical analysis to build a model based on the corresponding symbiotic individuals of the superior original individuals during the main evolutionary process. The results of simulations and statistical analysis indicate that the overall performance of the PBMODE algorithm is better than that of the compared algorithms and it can be used to optimize the operating conditions of the p-xylene oxidation process effectively and efficiently.

  19. Adaptive Metropolis Sampling with Product Distributions

    Science.gov (United States)

    Wolpert, David H.; Lee, Chiu Fan

    2005-01-01

    The Metropolis-Hastings (MH) algorithm is a way to sample a provided target distribution pi(z). It works by repeatedly sampling a separate proposal distribution T(x,x') to generate a random walk {x(t)}. We consider a modification of the MH algorithm in which T is dynamically updated during the walk. The update at time t uses the {x(t') : t' < t} to estimate the product distribution that has the least Kullback-Leibler distance to pi. That estimate is the information-theoretically optimal mean-field approximation to pi. We demonstrate through computer experiments that our algorithm produces samples that are superior to those of the conventional MH algorithm.
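
    As background for the adaptive variant above, here is a plain random-walk Metropolis-Hastings sampler with a fixed Gaussian proposal; the target density and proposal width are illustrative assumptions, and the paper's update of the proposal toward a product-distribution (mean-field) approximation of pi is not reproduced:

        import numpy as np

        def metropolis_hastings(log_pi, x0, n_steps=10_000, step=0.5, seed=0):
            """Random-walk Metropolis sampler for an unnormalized log target log_pi."""
            rng = np.random.default_rng(seed)
            x = np.asarray(x0, dtype=float)
            samples = np.empty((n_steps, x.size))
            for t in range(n_steps):
                proposal = x + step * rng.standard_normal(x.size)  # symmetric proposal T(x, x')
                log_alpha = log_pi(proposal) - log_pi(x)
                if np.log(rng.random()) < log_alpha:               # accept with prob min(1, pi(x')/pi(x))
                    x = proposal
                samples[t] = x
            return samples

        # Example target: a correlated 2-D Gaussian.
        cov_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
        log_pi = lambda z: -0.5 * z @ cov_inv @ z
        chain = metropolis_hastings(log_pi, x0=[0.0, 0.0])
        print(chain.mean(axis=0))
        print(np.cov(chain.T))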

  20. [A comparison of convenience sampling and purposive sampling].

    Science.gov (United States)

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation, not by statistical power analysis.

  1. Statistical mechanics of the vertex-cover problem

    Science.gov (United States)

    Hartmann, Alexander K.; Weigt, Martin

    2003-10-01

    We review recent progress in the study of the vertex-cover problem (VC). The VC belongs to the class of NP-complete graph theoretical problems, which plays a central role in theoretical computer science. On ensembles of random graphs, VC exhibits a coverable-uncoverable phase transition. Very close to this transition, depending on the solution algorithm, easy-hard transitions in the typical running time of the algorithms occur. We explain a statistical mechanics approach, which works by mapping the VC to a hard-core lattice gas, and then applying techniques such as the replica trick or the cavity approach. Using these methods, the phase diagram of the VC could be obtained exactly for connectivities c < e, while for larger connectivities the solution of the VC exhibits full replica symmetry breaking. The statistical mechanics approach can also be used to study analytically the typical running time of simple complete and incomplete algorithms for the VC. Finally, we describe recent results for the VC when studied on other ensembles of finite- and infinite-dimensional graphs.

  2. Statistical mechanics of the vertex-cover problem

    International Nuclear Information System (INIS)

    Hartmann, Alexander K; Weigt, Martin

    2003-01-01

    We review recent progress in the study of the vertex-cover problem (VC). The VC belongs to the class of NP-complete graph theoretical problems, which plays a central role in theoretical computer science. On ensembles of random graphs, VC exhibits a coverable-uncoverable phase transition. Very close to this transition, depending on the solution algorithm, easy-hard transitions in the typical running time of the algorithms occur. We explain a statistical mechanics approach, which works by mapping the VC to a hard-core lattice gas, and then applying techniques such as the replica trick or the cavity approach. Using these methods, the phase diagram of the VC could be obtained exactly for connectivities c < e, while for larger connectivities the solution of the VC exhibits full replica symmetry breaking. The statistical mechanics approach can also be used to study analytically the typical running time of simple complete and incomplete algorithms for the VC. Finally, we describe recent results for the VC when studied on other ensembles of finite- and infinite-dimensional graphs.

  3. Validation of Correction Algorithms for Near-IR Analysis of Human Milk in an Independent Sample Set-Effect of Pasteurization.

    Science.gov (United States)

    Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph

    2016-02-26

    Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified.

  4. Symmetry and Algorithmic Complexity of Polyominoes and Polyhedral Graphs

    KAUST Repository

    Zenil, Hector

    2018-02-24

    We introduce a definition of algorithmic symmetry able to capture essential aspects of geometric symmetry. We review, study and apply a method for approximating the algorithmic complexity (also known as Kolmogorov-Chaitin complexity) of graphs and networks based on the concept of Algorithmic Probability (AP). AP is a concept (and method) capable of recursively enumerating all properties of a computable (causal) nature beyond statistical regularities. We explore the connections of algorithmic complexity, both theoretical and numerical, with geometric properties, mainly symmetry and topology, from an (algorithmic) information-theoretic perspective. We show that approximations to algorithmic complexity by lossless compression and an Algorithmic Probability-based method can characterize properties of polyominoes, polytopes, regular and quasi-regular polyhedra as well as polyhedral networks, thereby demonstrating its profiling capabilities.

  5. Symmetry and Algorithmic Complexity of Polyominoes and Polyhedral Graphs

    KAUST Repository

    Zenil, Hector; Kiani, Narsis A.; Tegner, Jesper

    2018-01-01

    We introduce a definition of algorithmic symmetry able to capture essential aspects of geometric symmetry. We review, study and apply a method for approximating the algorithmic complexity (also known as Kolmogorov-Chaitin complexity) of graphs and networks based on the concept of Algorithmic Probability (AP). AP is a concept (and method) capable of recursively enumerating all properties of a computable (causal) nature beyond statistical regularities. We explore the connections of algorithmic complexity, both theoretical and numerical, with geometric properties, mainly symmetry and topology, from an (algorithmic) information-theoretic perspective. We show that approximations to algorithmic complexity by lossless compression and an Algorithmic Probability-based method can characterize properties of polyominoes, polytopes, regular and quasi-regular polyhedra as well as polyhedral networks, thereby demonstrating its profiling capabilities.

  6. Improving Polyp Detection Algorithms for CT Colonography: Pareto Front Approach.

    Science.gov (United States)

    Huang, Adam; Li, Jiang; Summers, Ronald M; Petrick, Nicholas; Hara, Amy K

    2010-03-21

    We investigated a Pareto front approach to improving polyp detection algorithms for CT colonography (CTC). A dataset of 56 CTC colon surfaces with 87 proven positive detections of 53 polyps sized 4 to 60 mm was used to evaluate the performance of a one-step and a two-step curvature-based region growing algorithm. The algorithmic performance was statistically evaluated and compared based on the Pareto optimal solutions from 20 experiments by evolutionary algorithms. The false positive rate was lower (pPareto optimization process can effectively help in fine-tuning and redesigning polyp detection algorithms.

  7. Statistically accurate low-order models for uncertainty quantification in turbulent dynamical systems.

    Science.gov (United States)

    Sapsis, Themistoklis P; Majda, Andrew J

    2013-08-20

    A framework for low-order predictive statistical modeling and uncertainty quantification in turbulent dynamical systems is developed here. These reduced-order, modified quasilinear Gaussian (ROMQG) algorithms apply to turbulent dynamical systems in which there is significant linear instability or linear nonnormal dynamics in the unperturbed system and energy-conserving nonlinear interactions that transfer energy from the unstable modes to the stable modes where dissipation occurs, resulting in a statistical steady state; such turbulent dynamical systems are ubiquitous in geophysical and engineering turbulence. The ROMQG method involves constructing a low-order, nonlinear, dynamical system for the mean and covariance statistics in the reduced subspace that has the unperturbed statistics as a stable fixed point and optimally incorporates the indirect effect of non-Gaussian third-order statistics for the unperturbed system in a systematic calibration stage. This calibration procedure is achieved through information involving only the mean and covariance statistics for the unperturbed equilibrium. The performance of the ROMQG algorithm is assessed on two stringent test cases: the 40-mode Lorenz 96 model mimicking midlatitude atmospheric turbulence and two-layer baroclinic models for high-latitude ocean turbulence with over 125,000 degrees of freedom. In the Lorenz 96 model, the ROMQG algorithm with just a single mode captures the transient response to random or deterministic forcing. For the baroclinic ocean turbulence models, the inexpensive ROMQG algorithm with 252 modes, less than 0.2% of the total, captures the nonlinear response of the energy, the heat flux, and even the one-dimensional energy and heat flux spectra.

  8. Heuristics of the algorithm: Big Data, user interpretation and institutional translation

    Directory of Open Access Journals (Sweden)

    Göran Bolin

    2015-10-01

    Full Text Available Intelligence on mass media audiences was founded on representative statistical samples, analysed by statisticians at the market departments of media corporations. The techniques for aggregating user data in the age of pervasive and ubiquitous personal media (e.g. laptops, smartphones, credit cards/swipe cards and radio-frequency identification) build on large aggregates of information (Big Data) analysed by algorithms that transform data into commodities. While the former technologies were built on socio-economic variables such as age, gender, ethnicity, education, media preferences (i.e. categories recognisable to media users and industry representatives alike), Big Data technologies register consumer choice, geographical position, web movement, and behavioural information in technologically complex ways that for most lay people are too abstract to appreciate the full consequences of. The data mined for pattern recognition privileges relational rather than demographic qualities. We argue that the agency of interpretation at the bottom of market decisions within media companies nevertheless introduces a ‘heuristics of the algorithm’, where the data inevitably becomes translated into social categories. In the paper we argue that although the promise of algorithmically generated data is often implemented in automated systems where human agency gets increasingly distanced from the data collected (it is our technological gadgets that are being surveyed, rather than us as social beings), one can observe a felt need among media users and among industry actors to ‘translate back’ the algorithmically produced relational statistics into ‘traditional’ social parameters. The tenacious social structures within the advertising industries work against the techno-economically driven tendencies within the Big Data economy.

  9. A weighted least-squares lump correction algorithm for transmission-corrected gamma-ray nondestructive assay

    International Nuclear Information System (INIS)

    Prettyman, T.H.; Sprinkle, J.K. Jr.; Sheppard, G.A.

    1993-01-01

    With transmission-corrected gamma-ray nondestructive assay instruments such as the Segmented Gamma Scanner (SGS) and the Tomographic Gamma Scanner (TGS) that is currently under development at Los Alamos National Laboratory, the amount of gamma-ray emitting material can be underestimated for samples in which the emitting material consists of particles or lumps of highly attenuating material. This problem is encountered in the assay of uranium and plutonium-bearing samples. To correct for this source of bias, we have developed a least-squares algorithm that uses transmission-corrected assay results for several emitted energies and a weighting function to account for statistical uncertainties in the assay results. The variation of effective lump size in the fitted model is parameterized; this allows the correction to be performed for a wide range of lump-size distributions. It may be possible to use the reduced chi-squared value obtained in the fit to identify samples in which assay assumptions have been violated. We found that the algorithm significantly reduced bias in simulated assays and improved SGS assay results for plutonium-bearing samples. Further testing will be conducted with the TGS, which is expected to be less susceptible than the SGS to systematic sources of bias.
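
    A minimal sketch of the weighted least-squares ingredient described above: fit a simple attenuation model to transmission-corrected assay results at several emitted energies, weighting each point by its statistical uncertainty. The model form, energies, values, and uncertainties are illustrative assumptions, not the SGS/TGS correction itself:

        import numpy as np

        # Hypothetical transmission-corrected assay results (grams) at several gamma-ray
        # energies, with 1-sigma statistical uncertainties.
        energy = np.array([129.3, 203.5, 345.0, 413.7])     # keV (illustrative lines)
        m_apparent = np.array([8.1, 8.9, 9.6, 9.9])
        sigma = np.array([0.4, 0.3, 0.3, 0.5])

        # Illustrative model: the apparent mass approaches the true mass as the effective
        # lump self-attenuation (parameterized here as b/E) vanishes:
        #     m_apparent(E) = m_true - b / E
        X = np.column_stack([np.ones_like(energy), -1.0 / energy])

        # Weighted least squares: scale rows by 1/sigma so that chi-square is minimized.
        W = 1.0 / sigma
        coef, *_ = np.linalg.lstsq(X * W[:, None], m_apparent * W, rcond=None)
        m_true, b = coef
        chi2 = np.sum(((m_apparent - X @ coef) / sigma) ** 2)
        print(f"lump-corrected mass ~ {m_true:.2f} g, reduced chi2 = {chi2 / (len(energy) - 2):.2f}")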

  10. Robust Multi-Frame Adaptive Optics Image Restoration Algorithm Using Maximum Likelihood Estimation with Poisson Statistics

    Directory of Open Access Journals (Sweden)

    Dongming Li

    2017-04-01

    Full Text Available An adaptive optics (AO) system provides real-time compensation for atmospheric turbulence. However, an AO image is usually of poor contrast because of the nature of the imaging process, meaning that the image contains information coming from both out-of-focus and in-focus planes of the object, which also brings about a loss in quality. In this paper, we present a robust multi-frame adaptive optics image restoration algorithm via maximum likelihood estimation. Our proposed algorithm uses a maximum likelihood method with image regularization as the basic principle, and constructs the joint log likelihood function for multi-frame AO images based on a Poisson distribution model. To begin with, a frame selection method based on image variance is applied to the observed multi-frame AO images to select images with better quality to improve the convergence of a blind deconvolution algorithm. Then, by combining the imaging conditions and the AO system properties, a point spread function estimation model is built. Finally, we develop our iterative solutions for AO image restoration addressing the joint deconvolution issue. We conduct a number of experiments to evaluate the performances of our proposed algorithm. Experimental results show that our algorithm produces accurate AO image restoration results and outperforms the current state-of-the-art blind deconvolution methods.
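
    For orientation, the sketch below shows the classical multi-frame Richardson-Lucy update, the standard fixed-point iteration for maximum-likelihood deconvolution under a Poisson noise model. The frames and PSFs are illustrative assumptions, and the paper's frame selection, regularization, and PSF-estimation steps are not reproduced:

        import numpy as np

        def multiframe_richardson_lucy(frames, psfs, n_iter=30):
            """Joint ML (Poisson) deconvolution of several frames of one object.
            frames, psfs: lists of equally sized 2-D arrays; each PSF normalized to sum to 1."""
            F = [np.fft.rfft2(p) for p in psfs]
            est = np.full_like(frames[0], frames[0].mean(), dtype=float)
            eps = 1e-12
            for _ in range(n_iter):
                ratio_sum = np.zeros_like(est)
                for g, Fk in zip(frames, F):
                    blurred = np.fft.irfft2(np.fft.rfft2(est) * Fk, s=est.shape)
                    ratio = g / (blurred + eps)
                    # Correlate with the flipped PSF: multiply by the conjugate in Fourier space.
                    ratio_sum += np.fft.irfft2(np.fft.rfft2(ratio) * np.conj(Fk), s=est.shape)
                est *= ratio_sum / len(frames)       # multiplicative Richardson-Lucy update
            return est

        # Tiny smoke test: two frames blurred by a simple normalized PSF.
        rng = np.random.default_rng(10)
        obj = rng.random((64, 64))
        psf = np.zeros((64, 64)); psf[:3, :3] = 1.0 / 9.0
        frames = [np.fft.irfft2(np.fft.rfft2(obj) * np.fft.rfft2(psf), s=obj.shape)] * 2
        print(multiframe_richardson_lucy(frames, [psf, psf]).shape)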

  11. Statistical properties of the coarse-grained velocity gradient tensor in turbulence: Monte-Carlo simulations of the tetrad model

    International Nuclear Information System (INIS)

    Pumir, Alain; Naso, Aurore

    2010-01-01

    A proper description of the velocity gradient tensor is crucial for understanding the dynamics of turbulent flows, in particular the energy transfer from large to small scales. Insight into the statistical properties of the velocity gradient tensor and into its coarse-grained generalization can be obtained with the help of a stochastic 'tetrad model' that describes the coarse-grained velocity gradient tensor based on the evolution of four points. Although the solution of the stochastic model can be formally expressed in terms of path integrals, its numerical determination in terms of the Monte-Carlo method is very challenging, as very few configurations contribute effectively to the statistical weight. Here, we discuss a strategy that allows us to solve the tetrad model numerically. The algorithm is based on the importance sampling method, which consists here of identifying and sampling preferentially the configurations that are likely to correspond to a large statistical weight, and selectively rejecting configurations with a small statistical weight. The algorithm leads to an efficient numerical determination of the solutions of the model and allows us to determine their qualitative behavior as a function of scale. We find that the moments of order n≤4 of the solutions of the model scale with the coarse-graining scale and that the scaling exponents are very close to the predictions of the Kolmogorov theory. The model qualitatively reproduces quite well the statistics concerning the local structure of the flow. However, we find that the model generally tends to predict an excess of strain compared to vorticity. Thus, our results show that while some physical aspects are not fully captured by the model, our approach leads to a very good description of several important qualitative properties of real turbulent flows.
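
    The importance-sampling idea invoked above can be illustrated with a one-line estimator: draw from a proposal concentrated on the configurations that carry most of the statistical weight, then reweight by the ratio of target to proposal densities. The toy target, rare event, and proposal below are illustrative assumptions, not the tetrad-model weights:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)

        # Toy problem: estimate E_p[f(x)] for a standard normal p when f is dominated by the
        # rare region x > 4, which plain Monte Carlo almost never visits.
        f = lambda x: (x > 4.0).astype(float)

        q_mean, q_std = 4.5, 1.0                     # proposal concentrated on the important region
        x = rng.normal(q_mean, q_std, size=100_000)
        weights = stats.norm.pdf(x) / stats.norm.pdf(x, q_mean, q_std)   # p(x) / q(x)

        print("importance-sampling estimate:", np.mean(f(x) * weights))
        print("exact tail probability      :", stats.norm.sf(4.0))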

  12. Generalized t-statistic for two-group classification.

    Science.gov (United States)

    Komori, Osamu; Eguchi, Shinto; Copas, John B

    2015-06-01

    In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.

  13. Calculation of absolute protein-ligand binding free energy using distributed replica sampling.

    Science.gov (United States)

    Rodinger, Tomas; Howell, P Lynne; Pomès, Régis

    2008-10-21

    Distributed replica sampling [T. Rodinger et al., J. Chem. Theory Comput. 2, 725 (2006)] is a simple and general scheme for Boltzmann sampling of conformational space by computer simulation in which multiple replicas of the system undergo a random walk in reaction coordinate or temperature space. Individual replicas are linked through a generalized Hamiltonian containing an extra potential energy term or bias which depends on the distribution of all replicas, thus enforcing the desired sampling distribution along the coordinate or parameter of interest regardless of free energy barriers. In contrast to replica exchange methods, efficient implementation of the algorithm does not require synchronicity of the individual simulations. The algorithm is inherently suited for large-scale simulations using shared or heterogeneous computing platforms such as a distributed network. In this work, we build on our original algorithm by introducing Boltzmann-weighted jumping, which allows moves of a larger magnitude and thus enhances sampling efficiency along the reaction coordinate. The approach is demonstrated using a realistic and biologically relevant application; we calculate the standard binding free energy of benzene to the L99A mutant of T4 lysozyme. Distributed replica sampling is used in conjunction with thermodynamic integration to compute the potential of mean force for extracting the ligand from protein and solvent along a nonphysical spatial coordinate. Dynamic treatment of the reaction coordinate leads to faster statistical convergence of the potential of mean force than a conventional static coordinate, which suffers from slow transitions on a rugged potential energy surface.

  14. Fault detection and isolation in GPS receiver autonomous integrity monitoring based on chaos particle swarm optimization-particle filter algorithm

    Science.gov (United States)

    Wang, Ershen; Jia, Chaoying; Tong, Gang; Qu, Pingping; Lan, Xiaoyu; Pang, Tao

    2018-03-01

    Receiver autonomous integrity monitoring (RAIM) is one of the most important parts of an avionic navigation system. Two problems need to be addressed to improve this system, namely the degeneracy phenomenon and the lack of samples in the standard particle filter (PF), whereby the particles can no longer adequately represent the true probability density function (i.e., sample impoverishment). This study presents a GPS RAIM method based on a chaos particle swarm optimization particle filter (CPSO-PF) algorithm with a log likelihood ratio. The chaos sequence generates a set of chaotic variables, which are mapped to the interval of the optimization variables to improve particle quality. This chaos perturbation overcomes the tendency of the search to become trapped in a local optimum in the particle swarm optimization (PSO) algorithm. Test statistics are configured based on a likelihood ratio, and satellite fault detection is then conducted by checking the consistency between the state estimate of the main PF and those of the auxiliary PFs. Based on GPS data, the experimental results demonstrate that the proposed algorithm can effectively detect and isolate satellite faults under conditions of non-Gaussian measurement noise. Moreover, the performance of the proposed method is better than that of RAIM based on the PF or PSO-PF algorithm.

  15. On Invertible Sampling and Adaptive Security

    DEFF Research Database (Denmark)

    Ishai, Yuval; Kumarasubramanian, Abishek; Orlandi, Claudio

    2011-01-01

    functionalities was left open. We provide the first convincing evidence that the answer to this question is negative, namely that some (randomized) functionalities cannot be realized with adaptive security. We obtain this result by studying the following related invertible sampling problem: given an efficient...... sampling algorithm A, obtain another sampling algorithm B such that the output of B is computationally indistinguishable from the output of A, but B can be efficiently inverted (even if A cannot). This invertible sampling problem is independently motivated by other cryptographic applications. We show......, under strong but well studied assumptions, that there exist efficient sampling algorithms A for which invertible sampling as above is impossible. At the same time, we show that a general feasibility result for adaptively secure MPC implies that invertible sampling is possible for every A, thereby...

  16. Optimizing Groundwater Monitoring Networks Using Integrated Statistical and Geostatistical Approaches

    Directory of Open Access Journals (Sweden)

    Jay Krishna Thakur

    2015-08-01

    Full Text Available The aim of this work is to investigate new approaches using methods based on statistics and geo-statistics for spatio-temporal optimization of groundwater monitoring networks. The formulated and integrated methods were tested with the groundwater quality data set of Bitterfeld/Wolfen, Germany. Spatially, the monitoring network was optimized using geo-statistical methods. Temporal optimization of the monitoring network was carried out using Sen’s method (1968). For geostatistical network optimization, a geostatistical spatio-temporal algorithm was used to identify redundant wells in 2- and 2.5-D Quaternary and Tertiary aquifers. Influences of interpolation block width, dimension, contaminant association, groundwater flow direction and aquifer homogeneity on statistical and geostatistical methods for monitoring network optimization were analysed. The integrated approach shows 37% and 28% redundancies in the monitoring network in the Quaternary and Tertiary aquifers, respectively. The geostatistical method also recommends 41 and 22 new monitoring wells in the Quaternary and Tertiary aquifers, respectively. In temporal optimization, an overall optimized sampling interval was recommended in terms of lower quartile (238 days), median quartile (317 days), and upper quartile (401 days) in the research area of Bitterfeld/Wolfen. The demonstrated methods for improving groundwater monitoring networks can be used in real monitoring network optimization with due consideration given to influencing factors.
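
    Sen's (1968) slope estimator used in the temporal optimization step is available directly in SciPy; the synthetic concentration series below is an illustrative assumption:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        t_days = np.arange(0, 3650, 30)                                    # roughly monthly sampling
        conc = 50.0 - 0.004 * t_days + rng.normal(0, 2.0, t_days.size)     # slowly declining analyte

        # Sen's slope: the median of all pairwise slopes, robust to outliers and non-normality.
        slope, intercept, lo_slope, up_slope = stats.theilslopes(conc, t_days, 0.95)
        print(f"Sen's slope: {slope:.4f} units/day (95% CI {lo_slope:.4f} to {up_slope:.4f})")

        # A slow, well-resolved trend argues for a longer sampling interval; how much longer is a
        # judgement made against the width of the confidence interval.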

  17. Computational statistics handbook with Matlab

    CERN Document Server

    Martinez, Wendy L

    2007-01-01

    Prefaces Introduction What Is Computational Statistics? An Overview of the Book Probability Concepts Introduction Probability Conditional Probability and Independence Expectation Common Distributions Sampling Concepts Introduction Sampling Terminology and Concepts Sampling Distributions Parameter Estimation Empirical Distribution Function Generating Random Variables Introduction General Techniques for Generating Random Variables Generating Continuous Random Variables Generating Discrete Random Variables Exploratory Data Analysis Introduction Exploring Univariate Data Exploring Bivariate and Trivariate Data Exploring Multidimensional Data Finding Structure Introduction Projecting Data Principal Component Analysis Projection Pursuit EDA Independent Component Analysis Grand Tour Nonlinear Dimensionality Reduction Monte Carlo Methods for Inferential Statistics Introduction Classical Inferential Statistics Monte Carlo Methods for Inferential Statist...

  18. Evolutionary Computation Methods and their applications in Statistics

    Directory of Open Access Journals (Sweden)

    Francesco Battaglia

    2013-05-01

    Full Text Available A brief discussion of the genesis of evolutionary computation methods, their relationship to artificial intelligence, and the contribution of genetics and Darwin’s theory of natural evolution is provided. Then, the main evolutionary computation methods are illustrated: evolution strategies, genetic algorithms, estimation of distribution algorithms, differential evolution, and a brief description of some evolutionary behavior methods such as ant colony and particle swarm optimization. We also discuss the role of the genetic algorithm for multivariate probability distribution random generation, rather than as a function optimizer. Finally, some relevant applications of genetic algorithm to statistical problems are reviewed: selection of variables in regression, time series model building, outlier identification, cluster analysis, design of experiments.
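
    As one of the statistical applications mentioned above (selection of variables in regression), the sketch below runs a very small genetic algorithm over binary inclusion masks and scores each candidate subset by AIC; the data, population size, and operator settings are illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(4)
        n, p = 200, 12
        X = rng.standard_normal((n, p))
        beta_true = np.array([3.0, 0, 0, -2.0, 0, 0, 1.5, 0, 0, 0, 0, 0])
        y = X @ beta_true + rng.standard_normal(n)

        def aic(mask):
            """AIC of an OLS fit restricted to the columns selected by the binary mask."""
            k = int(mask.sum())
            if k == 0:
                return np.inf
            Xs = X[:, mask.astype(bool)]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            return n * np.log(rss / n) + 2 * k

        pop = rng.integers(0, 2, size=(30, p))                     # random initial population of masks
        for _ in range(40):
            scores = np.array([aic(ind) for ind in pop])
            parents = pop[np.argsort(scores)[:10]]                 # truncation selection (lower AIC is better)
            children = []
            while len(children) < len(pop):
                a, b = parents[rng.integers(0, 10, size=2)]
                cut = rng.integers(1, p)
                child = np.concatenate([a[:cut], b[cut:]])         # one-point crossover
                flip = rng.random(p) < 0.05                        # bit-flip mutation
                children.append(np.where(flip, 1 - child, child))
            pop = np.array(children)

        best = min(pop, key=aic)
        print("selected variables:", np.flatnonzero(best))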

  19. A Fast and Accurate Algorithm for l1 Minimization Problems in Compressive Sampling (Preprint)

    Science.gov (United States)

    2013-01-22

    However, updating u_{k+1} via the formulation of Step 2 in Algorithm 1 can be implemented through the use of the component-wise Gauss-Seidel iteration which ... may accelerate the rate of convergence of the algorithm and therefore reduce the total CPU-time consumed. The efficiency of component-wise Gauss-Seidel ... Micchelli, L. Shen, and Y. Xu, A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models, Inverse Problems, 28 (2012), p
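
    The abstract above survives only in fragments, but the problem class it addresses, l1-regularized least squares for compressive sampling, is commonly solved by iterative soft thresholding. The sketch below is a generic ISTA loop on illustrative data; it is not the paper's Gauss-Seidel-accelerated scheme:

        import numpy as np

        def ista(A, b, lam, n_iter=500):
            """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by iterative soft thresholding."""
            L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of the gradient
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                grad = A.T @ (A @ x - b)
                z = x - grad / L
                x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding (prox of l1)
            return x

        # Compressive-sampling style example: sparse signal, random measurements.
        rng = np.random.default_rng(5)
        n, m, k = 200, 80, 8
        x_true = np.zeros(n)
        x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
        A = rng.standard_normal((m, n)) / np.sqrt(m)
        b = A @ x_true
        x_hat = ista(A, b, lam=0.01)
        print("recovered support:", np.flatnonzero(np.abs(x_hat) > 0.05))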

  20. Partial multicanonical algorithm for molecular dynamics and Monte Carlo simulations.

    Science.gov (United States)

    Okumura, Hisashi

    2008-09-28

    A partial multicanonical algorithm is proposed for molecular dynamics and Monte Carlo simulations. The partial multicanonical simulation samples a wide range of only a part of the potential-energy terms, namely those necessary to sample the conformational space widely, whereas the multicanonical algorithm samples a wide range of the total potential energy. Thus, in the partial multicanonical simulation one can concentrate the effort of determining the weight factor on the important energy terms only. The partial multicanonical, multicanonical, and canonical molecular dynamics algorithms were applied to an alanine dipeptide in explicit water solvent. The canonical simulation sampled the P_II, C_5, alpha_R, and alpha_P states. The multicanonical simulation covered the alpha_L state as well as these states. The partial multicanonical simulation also sampled the C_7(ax) state in addition to the states sampled by the multicanonical simulation. Furthermore, in the partial multicanonical simulation the backbone dihedral angles phi and psi rotated more frequently than in the multicanonical and canonical simulations. These results mean that the partial multicanonical algorithm has a higher sampling efficiency than the multicanonical and canonical algorithms.
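
    The essential difference between canonical and (partial) multicanonical sampling is the acceptance weight: instead of the Boltzmann factor exp(-E/kT), moves are accepted with a precomputed weight W(E) chosen to flatten the energy histogram; in the partial variant only a subset of the energy terms enters W. In the toy sketch below, a flatter, purely illustrative weight stands in for the iteratively determined multicanonical weight factor:

        import numpy as np

        rng = np.random.default_rng(6)

        def energy(x):                                   # toy double-well potential
            return (x ** 2 - 1.0) ** 2

        def run(n_steps, log_weight, step=0.4):
            """Metropolis sampling with acceptance ratio W(E_new) / W(E_old)."""
            x, samples = 0.0, []
            for _ in range(n_steps):
                x_new = x + step * (rng.random() - 0.5)
                d_logw = log_weight(energy(x_new)) - log_weight(energy(x))
                if np.log(rng.random()) < d_logw:
                    x = x_new
                samples.append(x)
            return np.array(samples)

        beta = 8.0
        canonical = run(50_000, lambda E: -beta * E)              # Boltzmann weight exp(-beta E)
        flattened = run(50_000, lambda E: -0.5 * beta * E)        # flatter, illustrative weight
        print("barrier crossings (canonical):", int(np.sum(np.diff(np.sign(canonical)) != 0)))
        print("barrier crossings (flattened):", int(np.sum(np.diff(np.sign(flattened)) != 0)))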