WorldWideScience

Sample records for statistical sampling algorithm

  1. Economic Statistical Design of Variable Sampling Interval X̄ Control Chart Based on Surrogate Variable Using Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Lee Tae-Hoon

    2016-12-01

    Full Text Available In many cases, an X̄ control chart based on a performance variable is used in industrial fields. Typically, the control chart monitors the measurements of the performance variable itself. However, if the performance variable is too costly or impossible to measure, and a less expensive surrogate variable is available, the process may be controlled more efficiently using the surrogate. In this paper, we present a model for the economic statistical design of a VSI (variable sampling interval) X̄ control chart using a surrogate variable that is linearly correlated with the performance variable. We derive the total average profit model from an economic viewpoint, apply the model to a Very High Temperature Reactor (VHTR) nuclear fuel measurement system, and derive the optimal result using genetic algorithms. Compared with the control chart based on the performance variable, the proposed model gives a larger expected net income per unit of time in the long run if the correlation between the performance variable and the surrogate variable is relatively high. The proposed model was confined to the sample-mean control chart under the assumption that a single assignable cause occurs according to a Poisson process. However, the model may also be extended to other types of control charts with single or multiple assignable-cause assumptions, such as the VSS (variable sample size) X̄ control chart, EWMA and CUSUM charts, and so on.
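
    The paper's profit model and constraints are not reproduced in the record, so the sketch below only illustrates the optimization pattern it describes: a genetic algorithm searching over chart parameters (sample size n, sampling interval h, control-limit width k) to minimize a hypothetical expected-cost stand-in for the negative of the profit function.

```python
import math
import random

random.seed(1)

def expected_cost(design):
    # Hypothetical stand-in for the paper's expected-profit model: it trades
    # off sampling cost, false alarms, and detection delay.
    n, h, k = design
    sampling = 5.0 * n / h                             # measuring n items every h hours
    alpha = 1.0 - math.erf(k / math.sqrt(2.0))         # two-sided false-alarm rate
    false_alarms = 300.0 * alpha / h
    delay = 80.0 * h * (1.0 + 0.5 * k) / math.sqrt(n)  # crude detection-delay penalty
    return sampling + false_alarms + delay

def random_design():
    return (random.randint(2, 20),       # n: sample size
            random.uniform(0.5, 8.0),    # h: sampling interval (hours)
            random.uniform(1.0, 4.0))    # k: control-limit width (sigmas)

def mutate(d):
    n, h, k = d
    return (min(20, max(2, n + random.choice((-1, 0, 1)))),
            min(8.0, max(0.5, h + random.gauss(0.0, 0.5))),
            min(4.0, max(1.0, k + random.gauss(0.0, 0.2))))

def crossover(a, b):
    return tuple(random.choice(pair) for pair in zip(a, b))  # uniform crossover

pop = [random_design() for _ in range(40)]
for generation in range(200):
    pop.sort(key=expected_cost)
    parents = pop[:10]                   # truncation selection
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(30)]

best = min(pop, key=expected_cost)
print("best (n, h, k):", best, " cost:", round(expected_cost(best), 2))
```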

  2. Statistical distribution sampling

    Science.gov (United States)

    Johnson, E. S.

    1975-01-01

    Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.

  3. Algorithm for computing significance levels using the Kolmogorov-Smirnov statistic and valid for both large and small samples

    Energy Technology Data Exchange (ETDEWEB)

    Kurtz, S.E.; Fields, D.E.

    1983-10-01

    The KSTEST code presented here is designed to perform the Kolmogorov-Smirnov one-sample test. The code may be used as a stand-alone program or the principal subroutines may be excerpted and used to service other programs. The Kolmogorov-Smirnov one-sample test is a nonparametric goodness-of-fit test. A number of codes to perform this test are in existence, but they suffer from the inability to provide meaningful results in the case of small sample sizes (number of values less than or equal to 80). The KSTEST code overcomes this inadequacy by using two distinct algorithms. If the sample size is greater than 80, an asymptotic series developed by Smirnov is evaluated. If the sample size is 80 or less, a table of values generated by Birnbaum is referenced. Valid results can be obtained from KSTEST when the sample contains from 3 to 300 data points. The program was developed on a Digital Equipment Corporation PDP-10 computer using the FORTRAN-10 language. The code size is approximately 450 card images and the typical CPU execution time is 0.19 s.
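
    The asymptotic branch of such a test is straightforward to reproduce. The sketch below computes the one-sample statistic D_n and approximates the significance level with Smirnov's series (here with Stephens' small-sample correction so it stays usable at moderate n); KSTEST's table lookup for n ≤ 80, due to Birnbaum, is not reproduced.

```python
import math

def ks_statistic(data, cdf):
    # One-sample Kolmogorov-Smirnov statistic D_n against a hypothesized CDF.
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)  # max of D+ and D- at each point
    return d

def ks_pvalue(d, n):
    # Smirnov's asymptotic series Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d  # Stephens' correction
    total, sign = 0.0, 1.0
    for k in range(1, 101):
        term = sign * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        sign = -sign
        if abs(term) < 1e-12:
            break
    return max(0.0, min(1.0, 2.0 * total))

# Example: test a small sample against the U(0, 1) distribution.
sample = [0.05, 0.21, 0.34, 0.48, 0.52, 0.66, 0.71, 0.89]
d = ks_statistic(sample, lambda x: x)
print("D =", round(d, 4), " approx. significance level =", round(ks_pvalue(d, len(sample)), 4))
```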

  4. Contributions to sampling statistics

    CERN Document Server

    Conti, Pier; Ranalli, Maria

    2014-01-01

    This book contains a selection of the papers presented at the ITACOSM 2013 Conference, held in Milan in June 2013. ITACOSM is the bi-annual meeting of the Survey Sampling Group S2G of the Italian Statistical Society, intended as an international forum of scientific discussion on the developments of theory and application of survey sampling methodologies in human and natural sciences. The book gathers research papers carefully selected from both invited and contributed sessions of the conference. The whole book is a relevant contribution to various key aspects of sampling methodology and techniques; it deals with some hot topics in sampling theory, such as calibration, quantile-regression and multiple frame surveys, and with innovative methodologies in important topics of both sampling theory and applications. Contributions cut across current sampling methodologies such as interval estimation for complex samples, randomized responses, bootstrap, weighting, modeling, imputation…

  5. Enhanced sampling algorithms.

    Science.gov (United States)

    Mitsutake, Ayori; Mori, Yoshiharu; Okamoto, Yuko

    2013-01-01

    In biomolecular systems with many degrees of freedom, such as proteins and nucleic acids (especially in all-atom models), there exist an astronomically large number of local-minimum-energy states. Conventional simulations in the canonical ensemble are of little use, because they tend to get trapped in these local minima. Enhanced conformational sampling techniques are thus in great demand. A simulation in a generalized ensemble performs a random walk in potential-energy space and can overcome this difficulty. From a single simulation run, one can obtain canonical-ensemble averages of physical quantities as functions of temperature by the single-histogram and/or multiple-histogram reweighting techniques. In this article we review uses of generalized-ensemble algorithms in biomolecular systems. Three well-known methods, namely the multicanonical algorithm, simulated tempering, and the replica-exchange method, are described first. Both Monte Carlo and molecular dynamics versions of the algorithms are given. We then present various extensions of these three generalized-ensemble algorithms. The effectiveness of the methods is tested with short peptide and protein systems.
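
    Of the three methods, replica exchange is the easiest to show compactly. The sketch below is a minimal Monte Carlo version for a one-dimensional double-well potential (a stand-in for a rugged biomolecular landscape): each replica runs Metropolis at its own temperature, and neighbouring replicas attempt configuration swaps with the standard exchange acceptance probability min(1, exp((βᵢ − βⱼ)(Eᵢ − Eⱼ))).

```python
import math
import random

random.seed(0)

def energy(x):
    return (x * x - 1.0) ** 2     # double well with minima at x = -1 and x = +1

temps = [0.05, 0.2, 0.8, 3.2]     # temperature ladder, cold to hot
betas = [1.0 / t for t in temps]
xs = [1.0] * len(temps)           # one walker per temperature, all starting in the right well

def metropolis_step(i):
    x_new = xs[i] + random.gauss(0.0, 0.3)
    d_e = energy(x_new) - energy(xs[i])
    if d_e <= 0.0 or random.random() < math.exp(-betas[i] * d_e):
        xs[i] = x_new

cold_samples = []
for sweep in range(20000):
    for i in range(len(temps)):
        metropolis_step(i)
    i = random.randrange(len(temps) - 1)   # attempt one neighbour swap per sweep
    delta = (betas[i] - betas[i + 1]) * (energy(xs[i]) - energy(xs[i + 1]))
    if delta >= 0.0 or random.random() < math.exp(delta):
        xs[i], xs[i + 1] = xs[i + 1], xs[i]
    cold_samples.append(xs[0])

# Without the swaps the cold walker stays trapped in one well; with them it visits both.
frac_left = sum(1 for x in cold_samples if x < 0.0) / len(cold_samples)
print("fraction of cold samples in the left well:", round(frac_left, 3))
```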

  6. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    Science.gov (United States)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

    Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an "optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.

  7. Sampling, Probability Models and Statistical Reasoning – Statistical Inference

    Indian Academy of Sciences (India)

    Mohan Delampady and V. R. Padmawar. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp. 49–58.

  8. Statistical sampling strategies

    International Nuclear Information System (INIS)

    Andres, T.H.

    1987-01-01

    Systems assessment codes use mathematical models to simulate natural and engineered systems. Probabilistic systems assessment codes carry out multiple simulations to reveal the uncertainty in values of output variables due to uncertainty in the values of the model parameters. In this paper, methods are described for sampling sets of parameter values to be used in a probabilistic systems assessment code. Three Monte Carlo parameter selection methods are discussed: simple random sampling, Latin hypercube sampling, and sampling using two-level orthogonal arrays. Three post-selection transformations are also described: truncation, importance transformation, and discretization. Advantages and disadvantages of each method are summarized
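
    As a concrete illustration of the second method, a minimal Latin hypercube sampler on the unit hypercube looks like the sketch below: each parameter's range is divided into equal-probability strata, each stratum is used exactly once, and the pairing across parameters is randomized.

```python
import random

def latin_hypercube(n_samples, n_params, rng=random.Random(42)):
    # Each parameter's range is split into n_samples equal strata; each stratum
    # is used exactly once, and the pairing across parameters is randomized.
    columns = []
    for _ in range(n_params):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])  # jitter inside stratum
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Ten parameter sets on the unit square for a hypothetical two-parameter assessment code.
for point in latin_hypercube(10, 2):
    print(tuple(round(v, 3) for v in point))
```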

  9. Ecotoxicology statistical sampling

    International Nuclear Information System (INIS)

    Saona, G.

    2012-01-01

    This presentation introduces general concepts of statistical sampling design in toxicology, such as characterizing the distribution of organic or inorganic contaminants and microbiological contamination, and determining sampling positions in ecosystems for ecotoxicological bioassays.

  10. Statistical sampling plans

    International Nuclear Information System (INIS)

    Jaech, J.L.

    1984-01-01

    In auditing and in inspection, one selects a number of items by some set of procedures and performs measurements which are compared with the operator's values. This session considers the problem of how to select the samples to be measured and what kinds of measurements to make. In the inspection situation, the ultimate aim is to verify the operator's material balance independently. The effectiveness of the sampling plan in achieving this objective is briefly considered. The discussion focuses on the model plant.

  11. Statistical sampling for holdup measurement

    International Nuclear Information System (INIS)

    Picard, R.R.; Pillay, K.K.S.

    1986-01-01

    Nuclear materials holdup is a serious problem in many operating facilities. Estimating amounts of holdup is important for materials accounting and, sometimes, for process safety. Clearly, measuring holdup in all pieces of equipment is not a viable option in terms of time, money, and radiation exposure to personnel. Furthermore, 100% measurement is not only impractical but unnecessary for developing estimated values. Principles of statistical sampling are valuable in the design of cost-effective holdup monitoring plans and in quantifying uncertainties in holdup estimates. The purpose of this paper is to describe those principles and to illustrate their use.

  12. Adaptive sampling algorithm for detection of superpoints

    Institute of Scientific and Technical Information of China (English)

    CHENG Guang; GONG Jian; DING Wei; WU Hua; QIANG ShiQiang

    2008-01-01

    Superpoints are sources (or destinations) that connect with a great number of destinations (or sources) during a measurement interval, so detecting superpoints in real time is very important for network security and management. Previous algorithms are not able to control memory usage while delivering the desired accuracy, so it is hard to detect superpoints on a high-speed link in real time. In this paper, we propose an adaptive sampling algorithm to detect superpoints in real time, which uses a flow sample-and-hold module to reduce the detection of non-superpoints and to improve the measurement accuracy for superpoints. We also design a data stream structure to maintain the flow records, which statistically compensates for flow hash collisions. An adaptive process based on different sampling probabilities is used to maintain the recorded IP addresses in the limited memory. The algorithm is compared with other algorithms by analyzing real network trace data. Experimental results and mathematical analysis show that the algorithm has the advantages of both limited memory requirements and high measurement accuracy.
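
    The record does not give the authors' data-stream structure, so the sketch below only illustrates the generic flow sample-and-hold idea the abstract builds on: a source enters the table with some sampling probability, and once held, all of its subsequent packets are processed exactly, so heavy fan-out sources (superpoint candidates) are caught with high probability.

```python
import random

def sample_and_hold(packets, p=0.02, rng=random.Random(7)):
    # Generic sample-and-hold: a source enters the table with probability p;
    # once held, every later packet from it updates its destination set exactly.
    table = {}
    for src, dst in packets:
        if src in table:
            table[src].add(dst)
        elif rng.random() < p:
            table[src] = {dst}
    return table

# Synthetic trace: one heavy source contacting 3000 distinct hosts, plus
# 4000 light sources that each contact a single host.
trace = [("10.0.0.1", "192.168.%d.%d" % (i // 250, i % 250)) for i in range(3000)]
trace += [("10.1.%d.%d" % (i // 250, i % 250), "192.168.0.1") for i in range(4000)]
random.Random(1).shuffle(trace)

held = sample_and_hold(trace)
fanouts = sorted(((len(dsts), src) for src, dsts in held.items()), reverse=True)
print("top held sources by observed fan-out:", fanouts[:3])
```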

  13. Statistical Processing Algorithms for Human Population Databases

    Directory of Open Access Journals (Sweden)

    Camelia COLESCU

    2012-01-01

    Full Text Available The article describes some algorithms for statistical functions applied to a human population database. The samples are specific to the most interesting periods, when the evolution of the statistical data is most dramatic. The article also describes the most useful forms of graphical presentation of the results.

  14. Sample Reuse in Statistical Remodeling.

    Science.gov (United States)

    1987-08-01

    … as the jackknife and bootstrap, is an expansion of the functional, T(F_n), or of its distribution function, or both. Frangos and Schucany (1987a) used … accelerated bootstrap. In the same report Frangos and Schucany demonstrated the small-sample superiority of that approach over the proposals that take … higher-order terms of an Edgeworth expansion into account. In a second report Frangos and Schucany (1987b) examined the small-sample performance of …

  15. Scalable Algorithms for Adaptive Statistical Designs

    Directory of Open Access Journals (Sweden)

    Robert Oehmke

    2000-01-01

    Full Text Available We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of optimally assigning patients to treatments in clinical trials. While adaptive designs have significant ethical and cost advantages, they are rarely utilized because of the complexity of optimizing and analyzing them. Computational challenges include massive memory requirements, few calculations per memory access, and multiply-nested loops with dynamic indices. We analyze the effects of various parallelization options, and while standard approaches do not work well, with effort an efficient, highly scalable program can be developed. This allows us to solve problems thousands of times more complex than those solved previously, which helps make adaptive designs practical. Further, our work applies to many other problems involving neighbor recurrences, such as generalized string matching.
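
    A toy instance of the kind of multiply-nested recurrence involved is the Bayesian two-armed bandit: the optimal allocation of each of N patients is defined by a recursion over the four success/failure counts. The sketch below (uniform priors, small horizon, memoized rather than parallelized) shows the recurrence itself; the paper's contribution is making such computations scale far beyond this.

```python
from functools import lru_cache

HORIZON = 20   # total number of patients in the toy trial

@lru_cache(maxsize=None)
def value(s1, f1, s2, f2):
    # Expected total successes achievable from this state when every remaining
    # patient is allocated optimally; both arms carry uniform Beta(1, 1) priors.
    if s1 + f1 + s2 + f2 == HORIZON:
        return 0.0
    p1 = (s1 + 1) / (s1 + f1 + 2)   # posterior mean success rate, arm 1
    p2 = (s2 + 1) / (s2 + f2 + 2)   # posterior mean success rate, arm 2
    arm1 = p1 * (1 + value(s1 + 1, f1, s2, f2)) + (1 - p1) * value(s1, f1 + 1, s2, f2)
    arm2 = p2 * (1 + value(s1, f1, s2 + 1, f2)) + (1 - p2) * value(s1, f1, s2, f2 + 1)
    return max(arm1, arm2)

print("expected successes under optimal allocation:", round(value(0, 0, 0, 0), 3))
```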

  16. A sampling algorithm for segregation analysis

    Directory of Open Access Journals (Sweden)

    Henshall John

    2001-11-01

    Full Text Available Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Markov chain Monte Carlo (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations of iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or not found, its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated.

  17. 42 CFR 402.109 - Statistical sampling.

    Science.gov (United States)

    2010-10-01

    ... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study, if based upon an appropriate sampling and computed by valid statistical methods, constitute prima... § 402.1. (c) Burden of proof. Once CMS or OIG has made a prima facie case, the burden is on the...

  18. The Wang-Landau Sampling Algorithm

    Science.gov (United States)

    Landau, David P.

    2003-03-01

    Over the past several decades Monte Carlo simulations[1] have evolved into a powerful tool for the study of wide-ranging problems in statistical/condensed matter physics. Standard methods sample the probability distribution for the states of the system, usually in the canonical ensemble, and enormous improvements have been made in performance through the implementation of novel algorithms. Nonetheless, difficulties arise near phase transitions, either due to critical slowing down near 2nd order transitions or to metastability near 1st order transitions, thus limiting the applicability of the method. We shall describe a new and different Monte Carlo approach [2] that uses a random walk in energy space to determine the density of states directly. Once the density of states is estimated, all thermodynamic properties can be calculated at all temperatures. This approach can be extended to multi-dimensional parameter spaces and has already found use in classical models of interacting particles including systems with complex energy landscapes, e.g., spin glasses, protein folding models, etc., as well as for quantum models. 1. A Guide to Monte Carlo Simulations in Statistical Physics, D. P. Landau and K. Binder (Cambridge U. Press, Cambridge, 2000). 2. Fugao Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 (2001); Phys. Rev. E64, 056101-1 (2001).
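
    The core of the method fits in a short sketch. Below is a minimal Wang-Landau estimate of ln g(E) for a small 2D Ising model: spin flips are accepted with probability min(1, g(E_old)/g(E_new)), every visit increments ln g at the current energy by a modification factor ln f, and ln f is halved whenever the visit histogram is roughly flat. The stopping threshold here is loosened for a quick demonstration; production runs push ln f much lower.

```python
import math
import random

random.seed(3)
L = 8                                    # 8x8 Ising lattice with periodic boundaries
spins = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

def site_energy(i, j):
    s = spins[i][j]
    return -s * (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                 + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])

E = sum(site_energy(i, j) for i in range(L) for j in range(L)) // 2  # each bond counted once
log_g = {}                               # running estimate of ln g(E)
hist = {}
ln_f = 1.0                               # modification factor; production runs go to ~1e-8

while ln_f > 1e-4:                       # truncated for a quick demonstration
    for _ in range(10000):
        i, j = random.randrange(L), random.randrange(L)
        E_new = E - 2 * site_energy(i, j)            # energy if spin (i, j) flips
        # accept with min(1, g(E)/g(E_new)): rarely visited energies are favoured
        if random.random() < math.exp(min(0.0, log_g.get(E, 0.0) - log_g.get(E_new, 0.0))):
            spins[i][j] *= -1
            E = E_new
        log_g[E] = log_g.get(E, 0.0) + ln_f
        hist[E] = hist.get(E, 0) + 1
    if min(hist.values()) > 0.8 * sum(hist.values()) / len(hist):    # histogram flat enough?
        hist = {}
        ln_f /= 2.0                      # refine the density-of-states estimate

print("distinct energies visited:", len(log_g))
```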

  19. Statistical sampling approaches for soil monitoring

    NARCIS (Netherlands)

    Brus, D.J.

    2014-01-01

    This paper describes three statistical sampling approaches for regional soil monitoring: a design-based, a model-based, and a hybrid approach. In the model-based approach a space-time model is exploited to predict global statistical parameters of interest, such as the space-time mean. In the hybrid …

  20. Statistical literacy and sample survey results

    Science.gov (United States)

    McAlevey, Lynn; Sullivan, Charles

    2010-10-01

    Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.

  1. Statistical Symbolic Execution with Informed Sampling

    Science.gov (United States)

    Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco

    2014-01-01

    Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose informed sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space, and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that informed sampling obtains more precise results and converges faster than a purely statistical analysis, and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
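
    The statistical core of this idea can be illustrated without a symbolic engine. The sketch below treats a (hypothetical) program as a black box, samples inputs at random, and applies Bayesian estimation with a Beta(1, 1) prior to the probability of reaching the target event; informed sampling would additionally prune high-probability paths and account for them exactly.

```python
import math
import random

random.seed(5)

def program_under_test(x, y):
    # Hypothetical stand-in for a program analyzed by symbolic execution:
    # returns True when the "assert violation" of interest would occur.
    return x * x + y > 140 and y % 7 == 0

hits, trials = 0, 20000
for _ in range(trials):
    if program_under_test(random.randint(-10, 10), random.randint(0, 99)):
        hits += 1

a, b = 1 + hits, 1 + trials - hits   # Beta(1, 1) prior -> Beta(a, b) posterior
mean = a / (a + b)
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
print("P(violation) ~ %.4f +/- %.4f (posterior mean +/- sd, %d/%d hits)"
      % (mean, sd, hits, trials))
```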

  2. Statistical aspects of food safety sampling

    NARCIS (Netherlands)

    Jongenburger, I.; Besten, den H.M.W.; Zwietering, M.H.

    2015-01-01

    In food safety management, sampling is an important tool for verifying control. Sampling by nature is a stochastic process. However, uncertainty regarding results is made even greater by the uneven distribution of microorganisms in a batch of food. This article reviews statistical aspects of …

  3. Monte Carlo molecular simulations: improving the statistical efficiency of samples with the help of artificial evolution algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Leblanc, B.

    2002-03-01

    Molecular simulation aims at simulating particles in interaction, describing a physico-chemical system. When considering Markov chain Monte Carlo sampling in this context, we often meet the same problem of statistical efficiency as with molecular dynamics for the simulation of complex molecules (polymers, for example). The search for a correct sampling of the space of possible configurations with respect to the Boltzmann-Gibbs distribution is directly related to the statistical efficiency of such algorithms (i.e. the ability to rapidly provide uncorrelated states covering all of configuration space). We investigated how to improve this efficiency with the help of Artificial Evolution (AE). AE algorithms form a class of stochastic optimization algorithms inspired by Darwinian evolution. We first identified efficiency measures that can be turned into optimization criteria, and then the parameters to optimize. The relative frequencies of the different types of Monte Carlo moves, usually chosen empirically within reasonable ranges, were considered first. We combined parallel simulations with a 'genetic server' in order to dynamically improve the quality of the sampling as the simulations progress. Our results show that, in comparison with some reference settings, it is possible to improve the quality of samples with respect to the chosen criterion. The same algorithm has been applied to improve the parallel tempering technique, optimizing at the same time the relative frequencies of Monte Carlo moves and the relative frequencies of swaps between sub-systems simulated at different temperatures. Finally, hints for further research on optimizing the choice of additional temperatures are given.

  4. Statistical benchmark for BosonSampling

    International Nuclear Information System (INIS)

    Walschaers, Mattia; Mayer, Klaus; Buchleitner, Andreas; Kuipers, Jack; Urbina, Juan-Diego; Richter, Klaus; Tichy, Malte Christopher

    2016-01-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church–Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle-type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics which go much beyond mere bunching or anti-bunching effects.

  5. Statistical sampling method for releasing decontaminated vehicles

    International Nuclear Information System (INIS)

    Lively, J.W.; Ware, J.A.

    1996-01-01

    Earth-moving vehicles (e.g., dump trucks, belly dumps) commonly haul radiologically contaminated materials from a site being remediated to a disposal site. Traditionally, each vehicle must be surveyed before being released. The logistical difficulties of implementing the traditional approach on a large scale demand that an alternative be devised. A statistical method (MIL-STD-105E, "Sampling Procedures and Tables for Inspection by Attributes") for assessing product quality from a continuous process was adapted to the vehicle decontamination process. This method produced a sampling scheme that automatically compensates for and accommodates fluctuating batch sizes and changing conditions without the need to modify or rectify the sampling scheme in the field. Vehicles are randomly selected (sampled) upon completion of the decontamination process to be surveyed for residual radioactive surface contamination. The frequency of sampling is based on the expected number of vehicles passing through the decontamination process in a given period and the confidence level desired. This process has been used successfully for one year at the former uranium mill site in Monticello, Utah (a CERCLA-regulated clean-up site). The method forces improvement in the quality of the decontamination process and results in a lower likelihood that vehicles exceeding the surface contamination standards are offered for survey. Implementation of this statistical sampling method on the Monticello Projects has resulted in more efficient processing of vehicles through decontamination and radiological release, saved hundreds of hours of processing time, provided a high level of confidence that release limits are met, and improved the radiological cleanliness of vehicles leaving the controlled site.
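
    The statistical machinery behind such attribute plans is the binomial operating-characteristic (OC) curve. The sketch below uses hypothetical plan parameters (MIL-STD-105E's code letters and switching rules are not reproduced) to show how the probability of accepting a batch falls as the underlying defect rate rises.

```python
from math import comb

def accept_probability(n, c, p):
    # Probability that a batch with true defect rate p is accepted by a plan
    # that surveys n random vehicles and tolerates at most c failures.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

n, c = 13, 0   # hypothetical plan: survey 13 vehicles per batch, accept only if all pass
for p in (0.01, 0.05, 0.10, 0.20):
    print("defect rate %4.0f%%: P(accept batch) = %.3f" % (100 * p, accept_probability(n, c, p)))
```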

  6. Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

    OpenAIRE

    Gray, Alexander G.; Fischer, Bernd; Schumann, Johann; Buntine, Wray

    2003-01-01

    Machine learning has reached a point where many probabilistic methods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms customized for different models. Here, we describe the AUTOBAYES system, which takes a high-level statistical model specification and uses powerful symbolic techniques based on schema-based program synthesis and computer algebra…

  7. Statistical Assessment of Gene Fusion Detection Algorithms using RNA Sequencing Data

    NARCIS (Netherlands)

    Varadan, V.; Janevski, A.; Kamalakaran, S.; Banerjee, N.; Harris, L.; Dimitrova, D.

    2012-01-01

    The detection and quantification of fusion transcripts has both biological and clinical implications. RNA sequencing technology provides a means for unbiased and high-resolution characterization of fusion transcript information in tissue samples. We evaluated two fusion-detection algorithms …

  8. Statistical classification techniques in high energy physics (SDDT algorithm)

    International Nuclear Information System (INIS)

    Bouř, Petr; Kůs, Václav; Franc, Jiří

    2016-01-01

    We present our proposal of the supervised binary divergence decision tree with nested separation method based on generalized linear models. A key insight we provide is clustering driven by only a few selected physical variables. The proper selection consists of the variables achieving the maximal divergence measure between the two classes. Further, we apply our method to Monte Carlo simulations of physics processes corresponding to a data sample of top quark-antiquark pair candidate events in the lepton+jets decay channel. The data sample is produced in pp̅ collisions at √s = 1.96 TeV. It corresponds to an integrated luminosity of 9.7 fb⁻¹ recorded with the D0 detector during Run II of the Fermilab Tevatron Collider. Our algorithm achieves 90% AUC in separating signal from background. We also briefly deal with the modification of statistical tests applicable to weighted data sets in order to test the homogeneity of the Monte Carlo simulations and measured data. The justification of these modified tests is given through divergence tests.

  9. Computationally efficient algorithms for statistical image processing : implementation in R

    NARCIS (Netherlands)

    Langovoy, M.; Wittich, O.

    2010-01-01

    In a series of our earlier papers on the subject, we proposed a novel statistical hypothesis testing method for the detection of objects in noisy images. The method uses results from percolation theory and random graph theory. We developed algorithms that allowed us to detect objects of unknown shapes in …

  10. Statistics and sampling in transuranic studies

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Gilbert, R.O.

    1980-01-01

    The existing data on transuranics in the environment exhibit a remarkably high variability from sample to sample (coefficients of variation of 100% or greater). This chapter stresses the necessity of adequate sample size and suggests various ways to increase sampling efficiency. Objectives in sampling are regarded as being of great importance in making decisions as to sampling methodology. Four different classes of sampling methods are described: (1) descriptive sampling, (2) sampling for spatial pattern, (3) analytical sampling, and (4) sampling for modeling. A number of research needs are identified in the various sampling categories along with several problems that appear to be common to two or more such areas
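
    The chapter's point about sample size can be made concrete with the usual normal-approximation formula n ≈ (z·CV/d)², where d is the target relative error of the mean. The sketch below, assuming the CV of 100% quoted above, shows how quickly the required n grows.

```python
import math

def required_n(cv, rel_error, z=1.96):
    # Approximate n so the sample mean lands within rel_error of the true mean
    # with ~95% confidence, for data with coefficient of variation cv.
    return math.ceil((z * cv / rel_error) ** 2)

# With CV = 100% (typical of transuranics in the field, per the chapter):
for rel in (0.50, 0.25, 0.10):
    print("target +/-%2.0f%% of the mean -> n ~ %d" % (100 * rel, required_n(1.0, rel)))
```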

  11. A Simplified Algorithm for Statistical Investigation of Damage Spreading

    International Nuclear Information System (INIS)

    Gecow, Andrzej

    2009-01-01

    On the way to simulating the adaptive evolution of a complex system describing a living object or a human-developed project, a fitness should be defined on node states or network external outputs. Feedbacks lead to circular attractors of these states or outputs, which make it difficult to define a fitness. The main statistical effects of the adaptive condition are the result of a small-change tendency, and to appear they only need a statistically correct size of the damage initiated by an evolutionary change of the system. This observation allows us to cut loops of feedbacks and, in effect, to obtain a particular statistically correct state instead of a long circular attractor, which in the quenched model is expected for a chaotic network with feedback. Defining fitness on such states is simple. We calculate only damaged nodes, and only once. Such an algorithm is optimal for the investigation of damage spreading, i.e. statistical connections between structural parameters of the initial change and the size of the effected damage. It is a reversed-annealed method: functions and states (signals) may be randomly substituted, but connections are important and are preserved. The small damages important for adaptive evolution are correctly depicted, in contrast to the Derrida annealed approximation, which expects equilibrium levels for large networks. The algorithm indicates these levels correctly. A program in Pascal, which executes the algorithm for a wide range of parameters, can be obtained from the author.

  12. A fast direct sampling algorithm for equilateral closed polygons

    International Nuclear Information System (INIS)

    Cantarella, Jason; Duplantier, Bertrand; Shonkwiler, Clayton; Uehara, Erica

    2016-01-01

    Sampling equilateral closed polygons is of interest in the statistical study of ring polymers. Over the past 30 years, previous authors have proposed a variety of simple Markov chain algorithms (but have not been able to show that they converge to the correct probability distribution) and complicated direct samplers (which require extended-precision arithmetic to evaluate numerically unstable polynomials). We present a simple direct sampler which is fast and numerically stable, and analyze its runtime using a new formula for the volume of equilateral polygon space as a Dirichlet-type integral.

  13. Stochastic geometry, spatial statistics and random fields models and algorithms

    CERN Document Server

    2015-01-01

    Providing a graduate level introduction to various aspects of stochastic geometry, spatial statistics and random fields, this volume places a special emphasis on fundamental classes of models and algorithms as well as on their applications, for example in materials science, biology and genetics. This book has a strong focus on simulations and includes extensive codes in Matlab and R, which are widely used in the mathematical community. It can be regarded as a continuation of the recent volume 2068 of Lecture Notes in Mathematics, where other issues of stochastic geometry, spatial statistics and random fields were considered, with a focus on asymptotic methods.

  14. Implementation and statistical analysis of Metropolis algorithm for SU(3)

    International Nuclear Information System (INIS)

    Katznelson, E.; Nobile, A.

    1984-12-01

    In this paper we study the statistical properties of an implementation of the Metropolis algorithm for SU(3) gauge theory. It is shown that the results have a normal distribution. We demonstrate that in this case error analysis can be carried out in a simple way, and we show that applying it to both the measurement strategy and the output data analysis has an important influence on the performance and reliability of the simulation.

  15. Predicting Smoking Status Using Machine Learning Algorithms and Statistical Analysis

    Directory of Open Access Journals (Sweden)

    Charles Frank

    2018-03-01

    Full Text Available Smoking has been proven to negatively affect health in a multitude of ways. As of 2009, smoking was considered the leading cause of preventable morbidity and mortality in the United States, continuing to plague the country's overall health. This study investigates the viability and effectiveness of some machine learning algorithms for predicting the smoking status of patients based on their blood test and vital sign results. The analysis of this study is divided into two parts. In part 1, we use one-way ANOVA with the SAS tool to show the statistically significant differences in blood test readings between smokers and non-smokers. The results show that the difference in INR, which measures the effectiveness of anticoagulants, was significant in favor of non-smokers, which further confirms the health risks associated with smoking. In part 2, we use five machine learning algorithms: Naïve Bayes, MLP, logistic regression, J48 and Decision Table to predict the smoking status of patients. To compare the effectiveness of these algorithms we use precision, recall, F-measure and accuracy. The results show that the logistic regression algorithm outperformed the four other algorithms with precision, recall, F-measure, and accuracy of 83%, 83.4%, 83.2%, and 83.44%, respectively.

  16. Efficient sampling algorithms for Monte Carlo based treatment planning

    International Nuclear Information System (INIS)

    DeMarco, J.J.; Solberg, T.D.; Chetty, I.; Smathers, J.B.

    1998-01-01

    Efficient sampling algorithms are necessary for producing a fast Monte Carlo based treatment planning code. This study evaluates several aspects of a photon-based tracking scheme and the effect of optimal sampling algorithms on the efficiency of the code. Four areas were tested: pseudo-random number generation, generalized sampling of a discrete distribution, sampling from the exponential distribution, and delta scattering as applied to photon transport through a heterogeneous simulation geometry. Generalized sampling of a discrete distribution using the cutpoint method can produce speedup gains of one order of magnitude versus conventional sequential sampling. Photon transport modifications based upon the delta scattering method were implemented and compared with a conventional boundary and collision checking algorithm. The delta scattering algorithm is faster by a factor of six versus the conventional algorithm for a boundary size of 5 mm within a heterogeneous geometry. A comparison of portable pseudo-random number algorithms and exponential sampling techniques is also discussed
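
    Of the techniques named, the cutpoint method is the simplest to sketch. The idea is to precompute, for each of m equal slices of [0, 1), where in the cumulative distribution that slice begins, so each sample needs only a table lookup plus a short forward scan instead of a full sequential search; the weights below are arbitrary illustrative values.

```python
import bisect
import random

def build_cutpoints(weights, m=None):
    # Precompute the CDF and, for each of m equal slices of [0, 1), the first
    # CDF index that slice can land in.
    total = float(sum(weights))
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0                        # guard against floating-point round-off
    m = m or len(weights)
    cuts = [bisect.bisect_left(cdf, k / m) for k in range(m)]
    return cdf, cuts, m

def sample(cdf, cuts, m, rng):
    u = rng.random()
    i = cuts[int(u * m)]                 # jump close to the right index...
    while cdf[i] < u:                    # ...then scan forward a step or two
        i += 1
    return i

weights = [1, 5, 2, 8, 4, 1, 3, 6]       # arbitrary relative interaction probabilities
cdf, cuts, m = build_cutpoints(weights)
rng = random.Random(11)
counts = [0] * len(weights)
for _ in range(100000):
    counts[sample(cdf, cuts, m, rng)] += 1
print([round(c / 100000, 3) for c in counts])   # should approach weights / sum(weights)
```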

  17. Statistical behaviour of adaptive multilevel splitting algorithms in simple models

    International Nuclear Information System (INIS)

    Rolland, Joran; Simonnet, Eric

    2015-01-01

    Adaptive multilevel splitting algorithms have been introduced rather recently for estimating tail distributions in a fast and efficient way. In particular, they can be used for computing the so-called reactive trajectories corresponding to direct transitions from one metastable state to another. The algorithm is based on successive selection-mutation steps performed on the system in a controlled way. It has two intrinsic parameters, the number of particles/trajectories and the reaction coordinate used for discriminating good and bad trajectories. We first investigate the convergence in law of the algorithm as a function of the timestep for several simple stochastic models. Second, we consider the average duration of reactive trajectories, for which no theoretical predictions exist. The most important aspect of this work concerns systems with two degrees of freedom. They are studied in detail as a function of the reaction coordinate in the asymptotic regime where the number of trajectories goes to infinity. We show that during phase transitions, the statistics of the algorithm deviate significantly from known theoretical results when non-optimal reaction coordinates are used. In this case, the variance of the algorithm peaks at the transition and the convergence of the algorithm can be much slower than the usually expected central limit behaviour. The duration of trajectories is affected as well. Moreover, reactive trajectories do not correspond to the most probable ones. Such behaviour disappears when using the optimal reaction coordinate, called the committor, as predicted by the theory. We finally investigate a three-state Markov chain which reproduces this phenomenon and show logarithmic convergence of the trajectory durations.

  18. Statistical sampling methods for soils monitoring

    Science.gov (United States)

    Ann M. Abbott

    2010-01-01

    Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case …

  19. Performance comparison between total variation (TV)-based compressed sensing and statistical iterative reconstruction algorithms

    International Nuclear Information System (INIS)

    Tang Jie; Nett, Brian E; Chen Guanghong

    2009-01-01

    Of all available reconstruction methods, statistical iterative reconstruction algorithms appear particularly promising since they enable accurate physical noise modeling. The newly developed compressive sampling/compressed sensing (CS) algorithm has shown the potential to accurately reconstruct images from highly undersampled data. The CS algorithm can be implemented in the statistical reconstruction framework as well. In this study, we compared the performance of two standard statistical reconstruction algorithms (penalized weighted least squares and q-GGMRF) to the CS algorithm. In assessing the image quality using these iterative reconstructions, it is critical to utilize realistic background anatomy, as the reconstruction results are object dependent. A cadaver head was scanned on a Varian Trilogy system at different dose levels. Several figures of merit, including the relative root mean square error and a quality factor which accounts for the noise performance and the spatial resolution, were introduced to objectively evaluate reconstruction performance. A comparison is presented between the three algorithms for a constant undersampling factor at several dose levels. To facilitate this comparison, the original CS method was formulated in the framework of the statistical image reconstruction algorithms. Important conclusions from our studies are that (1) for realistic neuro-anatomy, over 100 projections are required to avoid streak artifacts in the reconstructed images even with CS reconstruction, (2) regardless of the algorithm employed, it is beneficial to distribute the total dose over more views as long as each view remains quantum-noise limited, and (3) the total variation-based CS method is not appropriate for very low dose levels because, while it can mitigate streaking artifacts, the images exhibit patchy behavior, which is potentially harmful for medical diagnosis.

  20. Development of modelling algorithm of technological systems by statistical tests

    Science.gov (United States)

    Shemshura, E. A.; Otrokov, A. V.; Chernyh, V. G.

    2018-03-01

    The paper tackles the problem of the economic assessment of design efficiency for various technological systems at the stage of their operation. The modelling algorithm for a technological system, based on statistical tests and accounting for the reliability index, allows estimating the level of machinery technical excellence and determining the efficiency of design reliability against its performance. The economic feasibility of its application is determined on the basis of the service quality of a technological system, with further forecasting of the volumes and range of spare parts supply.

  1. Sampling, Probability Models and Statistical Reasoning …

    Indian Academy of Sciences (India)

    Random sampling allows data to be modelled with the help of probability … based on different trials to get an estimate of the experimental error … if e is indeed the true value of the proportion of defectives in the …

  2. Two General Extension Algorithms of Latin Hypercube Sampling

    Directory of Open Access Journals (Sweden)

    Zhi-zhao Liu

    2015-01-01

    Full Text Available To reuse original sampling points and reduce the number of simulation runs, two general extension algorithms for Latin Hypercube Sampling (LHS) are proposed. The extension algorithms start with an original LHS of size m and construct a new LHS of size m+n that contains as many of the original points as possible. In order to get a strict LHS of larger size, some original points might be deleted. The relationship of the original sampling points in the new LHS structure is shown by a simple undirected acyclic graph. The basic general extension algorithm is proposed to retain the most original points, but it costs too much time. Therefore, a general extension algorithm based on a greedy algorithm is proposed to reduce the extension time, though it cannot guarantee retaining the most original points. These algorithms are illustrated by an example and applied to evaluating sample means to demonstrate their effectiveness.

  3. Iterative importance sampling algorithms for parameter estimation

    OpenAIRE

    Morzfeld, Matthias; Day, Marcus S.; Grout, Ray W.; Pau, George Shu Heng; Finsterle, Stefan A.; Bell, John B.

    2016-01-01

    In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov Chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is ...
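
    A minimal self-contained illustration of the embarrassingly parallel character of importance sampling: every draw below is independent, and each is reweighted by the target-to-proposal density ratio. The rare-event example (a Gaussian tail probability with a shifted proposal) is our own toy, not one from the paper.

```python
import math
import random

random.seed(9)

def normal_logpdf(x, mu):
    return -0.5 * (x - mu) ** 2 - 0.5 * math.log(2.0 * math.pi)

# Rare-event toy problem: estimate P(X > 4) for X ~ N(0, 1). Naive Monte Carlo
# almost never sees the event; a proposal shifted to N(4, 1) sees it constantly,
# and each independent draw is reweighted by the target/proposal density ratio.
n = 100000
total = 0.0
for _ in range(n):
    x = random.gauss(4.0, 1.0)                                   # draw from the proposal
    w = math.exp(normal_logpdf(x, 0.0) - normal_logpdf(x, 4.0))  # importance weight
    total += w if x > 4.0 else 0.0
print("IS estimate of P(X > 4):", total / n)    # exact value is about 3.17e-5
```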

  4. Pierre Gy's sampling theory and sampling practice heterogeneity, sampling correctness, and statistical process control

    CERN Document Server

    Pitard, Francis F

    1993-01-01

    Pierre Gy's Sampling Theory and Sampling Practice, Second Edition is a concise, step-by-step guide for process variability management and methods. Updated and expanded, this new edition provides a comprehensive study of heterogeneity, covering the basic principles of sampling theory and its various applications. It presents many practical examples to allow readers to select appropriate sampling protocols and assess the validity of sampling protocols from others. The variability of dynamic process streams using variography is discussed to help bridge sampling theory with statistical process control. Many descriptions of good sampling devices, as well as descriptions of poor ones, are featured to educate readers on what to look for when purchasing sampling systems. The book uses its accessible, tutorial style to focus on professional selection and use of methods. The book will be a valuable guide for mineral processing engineers; metallurgists; geologists; miners; chemists; environmental scientists; and practitioners…

  5. Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes

    Science.gov (United States)

    Williams, Colin P.

    1999-01-01

    Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long-term behavior of such processes is only tractable for very simple types of stochastic processes, such as Markovian processes. However, in real-world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance it might be the non-random intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(√N) time; that is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.

  6. The variance quadtree algorithm: use for spatial sampling design

    NARCIS (Netherlands)

    Minasny, B.; McBratney, A.B.; Walvoort, D.J.J.

    2007-01-01

    Spatial sampling schemes are mainly developed to determine sampling locations that can cover the variation of environmental properties in the area of interest. Here we proposed the variance quadtree algorithm for sampling in an area with prior information represented as ancillary or secondary …

  7. A statistical mechanical interpretation of algorithmic information theory: Total statistical mechanical interpretation based on physical argument

    International Nuclear Information System (INIS)

    Tadaki, Kohtaro

    2010-01-01

    The statistical mechanical interpretation of algorithmic information theory (AIT, for short) was introduced and developed in our previous works [K. Tadaki, Local Proceedings of CiE 2008, pp. 425-434, 2008] and [K. Tadaki, Proceedings of LFCS'09, Springer's LNCS, vol. 5407, pp. 422-440, 2009], where we introduced the notion of thermodynamic quantities, such as the partition function Z(T), free energy F(T), energy E(T), statistical mechanical entropy S(T), and specific heat C(T), into AIT. We then discovered that, in this interpretation, the temperature T equals the partial randomness of the values of all these thermodynamic quantities, where the notion of partial randomness is a stronger representation of the compression rate by means of program-size complexity. Furthermore, we showed that this situation holds for the temperature T itself, which is one of the most typical thermodynamic quantities. Namely, we showed that, for each of the thermodynamic quantities Z(T), F(T), E(T), and S(T) above, the computability of its value at temperature T gives a sufficient condition for T ∈ (0,1) to satisfy the condition that the partial randomness of T equals T. In this paper, based on a physical argument at the same level of mathematical rigor as normal statistical mechanics in physics, we develop a total statistical mechanical interpretation of AIT which actualizes a perfect correspondence to normal statistical mechanics. We do this by identifying a microcanonical ensemble in the framework of AIT. As a result, we clarify the statistical mechanical meaning of the thermodynamic quantities of AIT.

  8. Measuring radioactive half-lives via statistical sampling in practice

    Science.gov (United States)

    Lorusso, G.; Collins, S. M.; Jagan, K.; Hitt, G. W.; Sadek, A. M.; Aitken-Smith, P. M.; Bridi, D.; Keightley, J. D.

    2017-10-01

    The statistical sampling method for the measurement of radioactive decay half-lives exhibits intriguing features, such as that the half-life is approximately the median of a distribution closely resembling a Cauchy distribution. Whilst initial theoretical considerations suggested that in certain cases the method could have significant advantages, accurate measurements by statistical sampling have proven difficult, for they require an exercise in non-standard statistical analysis. As a consequence, no half-life measurement using this method has yet been reported and no comparison with traditional methods has ever been made. We used a Monte Carlo approach to address these analysis difficulties, and present the first experimental measurement of a radioisotope half-life (211Pb) by statistical sampling, in good agreement with the literature recommended value. Our work also focused on the comparison between statistical sampling and exponential regression analysis, and concluded that exponential regression generally achieves the highest accuracy.

  9. Compressively sampled MR image reconstruction using generalized thresholding iterative algorithm

    Science.gov (United States)

    Elahi, Sana; kaleem, Muhammad; Omer, Hammad

    2018-01-01

    Compressed sensing (CS) is an emerging area of interest in Magnetic Resonance Imaging (MRI). CS is used for the reconstruction of images from a very limited number of samples in k-space. This significantly reduces MRI data acquisition time. One important requirement for signal recovery in CS is the use of an appropriate non-linear reconstruction algorithm. It is a challenging task to choose a reconstruction algorithm that accurately reconstructs MR images from under-sampled k-space data. Various algorithms have been used to solve the system of non-linear equations for better image quality and reconstruction speed in CS. In the recent past, the iterative soft thresholding algorithm (ISTA) has been introduced in CS-MRI. This algorithm directly cancels the incoherent artifacts produced by undersampling in k-space. This paper introduces an improved iterative algorithm based on a p-thresholding technique for CS-MRI image reconstruction. The use of a p-thresholding function promotes sparsity in the image, which is a key factor for CS-based image reconstruction. The p-thresholding-based iterative algorithm is a modification of ISTA and minimizes non-convex functions. It has been shown that the proposed p-thresholding iterative algorithm can be used effectively to recover a fully sampled image from under-sampled data in MRI. The performance of the proposed method is verified using simulated and actual MRI data taken at St. Mary's Hospital, London. The quality of the reconstructed images is measured in terms of peak signal-to-noise ratio (PSNR), artifact power (AP), and structural similarity index measure (SSIM). The proposed approach shows improved performance when compared to other iterative algorithms based on log-thresholding, soft-thresholding and hard-thresholding techniques at different reduction factors.
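
    The paper's p-thresholding operator is not specified in the record, so the sketch below shows the p = 1 special case it generalizes: plain ISTA, alternating a gradient step on the data-fidelity term with soft thresholding, on a toy sparse-recovery problem standing in for undersampled k-space data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compressed-sensing problem: recover a sparse x from y = A x with m << n.
n, m, k = 200, 80, 8
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

def soft_threshold(v, t):
    # p = 1 shrinkage; the paper's p-thresholding generalizes this operator.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(500):                     # ISTA: gradient step on ||y - Ax||^2, then shrink
    x = soft_threshold(x + step * (A.T @ (y - A @ x)), lam * step)

print("relative recovery error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```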

  10. A course in mathematical statistics and large sample theory

    CERN Document Server

    Bhattacharya, Rabi; Patrangenaru, Victor

    2016-01-01

    This graduate-level textbook is primarily aimed at graduate students of statistics, mathematics, science, and engineering who have had an undergraduate course in statistics, an upper-division course in analysis, and some acquaintance with measure-theoretic probability. It provides a rigorous presentation of the core of mathematical statistics. Part I of this book constitutes a one-semester course on basic parametric mathematical statistics. Part II deals with the large-sample theory of statistics — parametric and nonparametric — and its contents may be covered in one semester as well. Part III provides brief accounts of a number of topics of current interest for practitioners and other disciplines whose work involves statistical methods. Large-sample theory is presented with many worked examples, numerical calculations, and simulations to illustrate the theory. Appendices provide ready access to a number of standard results, with many proofs. Solutions are given to a number of selected exercises from Part I; Part II exercises with …

  11. STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.

    2010-09-02

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).

  12. Statistical Analysis Of Tank 19F Floor Sample Results

    International Nuclear Information System (INIS)

    Harris, S.

    2010-01-01

    Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis sample results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by a UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).

  13. A software sampling frequency adaptive algorithm for reducing spectral leakage

    Institute of Scientific and Technical Information of China (English)

    PAN Li-dong; WANG Fei

    2006-01-01

    Spectral leakage caused by synchronization error in a nonsynchronous sampling system is an important cause of reduced accuracy in spectral analysis and harmonic measurement. This paper presents a software sampling-frequency adaptive algorithm that obtains the actual signal frequency more accurately, then adjusts the sampling interval based on the frequency calculated by the software algorithm and modifies the sampling frequency adaptively. It reduces the synchronization error and the impact of spectral leakage, thereby improving the accuracy of spectral analysis and harmonic measurement for power system signals whose frequency changes slowly. Simulations show that the algorithm has high precision, and it can be a practical method for power system harmonic analysis since it is easily implemented.
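
    As an illustration of the frequency-tracking step, the sketch below estimates the dominant frequency from an FFT peak refined by parabolic interpolation and derives a synchronous sampling interval from it. This is a generic stand-in, not the authors' algorithm; the function names and tuning values are my own:

```python
import numpy as np

def estimate_frequency(x, fs):
    """Estimate the dominant signal frequency from the FFT peak,
    refined by parabolic interpolation of the log-magnitude spectrum."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    k = np.argmax(spec[1:]) + 1                  # skip the DC bin
    a, b, c = np.log(spec[k - 1:k + 2])          # parabolic refinement
    delta = 0.5 * (a - c) / (a - 2 * b + c)
    return (k + delta) * fs / len(x)

def synchronous_interval(f_est, n_samples, n_cycles):
    """Sampling interval making n_samples span exactly n_cycles periods."""
    return n_cycles / (f_est * n_samples)

fs = 3200.0                                      # initial sampling rate, Hz
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 50.3 * t)                 # power signal near 50 Hz
f = estimate_frequency(x, fs)
print(f"estimated f = {f:.3f} Hz, "
      f"new Ts = {synchronous_interval(f, 1024, 16):.6e} s")
```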

  14. The Role of the Sampling Distribution in Understanding Statistical Inference

    Science.gov (United States)

    Lipson, Kay

    2003-01-01

    Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…

  15. Illustrating Sampling Distribution of a Statistic: Minitab Revisited

    Science.gov (United States)

    Johnson, H. Dean; Evans, Marc A.

    2008-01-01

    Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…

  16. An Improved Nested Sampling Algorithm for Model Selection and Assessment

    Science.gov (United States)

    Zeng, X.; Ye, M.; Wu, J.; WANG, D.

    2017-12-01

    The multimodel strategy is a general approach for treating model structure uncertainty in recent research. The unknown groundwater system is represented by several plausible conceptual models, and each alternative conceptual model is assigned a weight that represents its plausibility. In the Bayesian framework, the posterior model weight is computed as the product of the model prior weight and the marginal likelihood (also termed the model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. The nested sampling estimator (NSE) is a newly proposed algorithm for marginal likelihood estimation. NSE searches the parameter space gradually from the low-likelihood region to the high-likelihood region, and this evolution proceeds iteratively via a local sampling procedure; thus, the efficiency of NSE is dominated by the strength of the local sampling procedure. Currently, the Metropolis-Hastings (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood functions. To improve the performance of NSE, a more efficient and elaborate sampling algorithm, DREAMzs, can be integrated into the local sampling. In addition, to overcome the computational burden of the large number of repeated model executions required for marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build surrogates for the original groundwater model.
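
    A toy illustration of the nested-sampling loop described above, with a simple random-walk Metropolis step standing in for the local sampler (the paper would substitute DREAMzs), is sketched below. The likelihood and all tuning constants are placeholders, and the final live-point contribution to the evidence is neglected:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_like(theta):
    # toy unimodal likelihood; a groundwater model would go here
    return -0.5 * np.sum((theta - 0.5) ** 2) / 0.01

def nested_sampling(n_live=100, n_iter=1000, step=0.05, ndim=2):
    """Toy nested-sampling estimate of the log marginal likelihood."""
    live = rng.uniform(0, 1, (n_live, ndim))     # uniform prior on [0,1]^d
    live_ll = np.array([log_like(t) for t in live])
    log_z = -np.inf
    log_w = np.log(1 - np.exp(-1.0 / n_live))    # width of the first shell
    for _ in range(n_iter):
        worst = np.argmin(live_ll)
        log_z = np.logaddexp(log_z, log_w + live_ll[worst])
        # replace the worst point: random walk constrained to L > L_worst
        seed = live[rng.integers(n_live)].copy()
        for _ in range(20):
            prop = np.clip(seed + step * rng.normal(size=ndim), 0, 1)
            if log_like(prop) > live_ll[worst]:
                seed = prop
        live[worst], live_ll[worst] = seed, log_like(seed)
        log_w -= 1.0 / n_live                    # shrink the prior volume
    return log_z

print("log-evidence ~", nested_sampling())
```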

  17. Statistical sampling techniques as applied to OSE inspections

    International Nuclear Information System (INIS)

    Davis, J.J.; Cote, R.W.

    1987-01-01

    The need has been recognized for statistically valid methods for gathering information during OSE inspections and for the interpretation of results, both from performance testing and from records reviews, interviews, etc. Battelle Columbus Division, under contract to DOE OSE, has performed and is continuing to perform work in the area of statistical methodology for OSE inspections. This paper presents some of the sampling methodology currently being developed for use during OSE inspections. Topics include population definition, sample size requirements, level of confidence, and practical logistical constraints associated with the conduct of an inspection based on random sampling. Sequential sampling schemes and sampling from finite populations are also discussed. The methods described are applicable to various data gathering activities, ranging from the sampling and examination of classified documents to the sampling of Protective Force security inspectors for skill testing.
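
    For the sample-size question mentioned above, a standard attributes-sampling relation (my illustration, not a formula taken from the paper) gives the smallest simple random sample that detects at least one nonconforming item with a stated confidence:

```python
import math

def sample_size(confidence, defect_rate):
    """Smallest simple-random-sample size that finds at least one
    nonconforming item with the given confidence, assuming the
    population fraction nonconforming equals defect_rate."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - defect_rate))

# e.g. 95% confidence of catching at least one item if 5% are nonconforming
print(sample_size(0.95, 0.05))   # -> 59
```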

  18. The product composition control system at Savannah River: Statistical process control algorithm

    International Nuclear Information System (INIS)

    Brown, K.G.

    1994-01-01

    The Defense Waste Processing Facility (DWPF) at the Savannah River Site (SRS) will be used to immobilize the approximately 130 million liters of high-level nuclear waste currently stored at the site in 51 carbon steel tanks. Waste handling operations separate this waste into highly radioactive insoluble sludge and precipitate and less radioactive water soluble salts. In DWPF, precipitate (PHA) is blended with insoluble sludge and ground glass frit to produce melter feed slurry which is continuously fed to the DWPF melter. The melter produces a molten borosilicate glass which is poured into stainless steel canisters for cooling and, ultimately, shipment to and storage in a geologic repository. Described here is the Product Composition Control System (PCCS) process control algorithm. The PCCS is the amalgam of computer hardware and software intended to ensure that the melt will be processable and that the glass wasteform produced will be acceptable. Within PCCS, the Statistical Process Control (SPC) Algorithm is the means that guides control of the DWPF process. The SPC Algorithm is necessary to control the multivariate DWPF process in the face of uncertainties arising from the process, its feeds, sampling, modeling, and measurement systems. This article describes the functions performed by the SPC Algorithm: characterization of DWPF prior to making product, accounting for prediction uncertainty, accounting for measurement uncertainty, monitoring a SME batch, and incorporating process information; the advantages of the algorithm are also discussed. 9 refs., 6 figs

  19. Fast sampling algorithm for the simulation of photon Compton scattering

    International Nuclear Information System (INIS)

    Brusa, D.; Salvat, F.

    1996-01-01

    A simple algorithm for the simulation of Compton interactions of unpolarized photons is described. The energy and direction of the scattered photon, as well as the active atomic electron shell, are sampled from the double-differential cross section obtained by Ribberfors from the relativistic impulse approximation. The algorithm consistently accounts for Doppler broadening and electron binding effects. Simplifications of Ribberfors' formula, required for efficient random sampling, are discussed. The algorithm involves a combination of inverse transform, composition and rejection methods. A parameterization of the Compton profile is proposed from which the simulation of Compton events can be performed analytically in terms of a few parameters that characterize the target atom, namely shell ionization energies, occupation numbers and maximum values of the one-electron Compton profiles. (orig.)

  20. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.; Wheeler, Mary Fanett; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using

  1. A Monte Carlo Metropolis-Hastings Algorithm for Sampling from Distributions with Intractable Normalizing Constants

    KAUST Repository

    Liang, Faming; Jin, Ick-Hoon

    2013-01-01

    Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for random effect models and missing data problems that involve simulations from a distribution with intractable integrals. © 2013 Massachusetts Institute of Technology.
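
    A minimal sketch of the MCMH idea on a toy exponential-family model: the intractable ratio Z(theta')/Z(theta) in the Metropolis-Hastings acceptance probability is replaced by an importance-sampling estimate over auxiliary draws. The model, prior, and tuning values are illustrative assumptions, not the paper's examples:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: unnormalized likelihood g(x|theta) = exp(theta*x - x**2/2),
# whose normalizing constant Z(theta) we pretend is intractable.
def log_g(x, theta):
    return theta * x - 0.5 * x ** 2

def sample_aux(theta, m):
    # Auxiliary draws from f(.|theta); exact here (it is N(theta, 1)),
    # in general a short MCMC run would be used instead.
    return rng.normal(theta, 1.0, m)

def mcmh(x_obs, n_iter=5000, m=50, step=0.5):
    """Monte Carlo Metropolis-Hastings for theta with a flat prior.
    Z(theta')/Z(theta) is estimated by mean(g(y, theta')/g(y, theta))
    over auxiliary draws y ~ f(.|theta)."""
    theta, chain = 0.0, []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        y = sample_aux(theta, m)
        log_z_ratio = np.log(np.mean(np.exp(log_g(y, prop) - log_g(y, theta))))
        log_alpha = log_g(x_obs, prop) - log_g(x_obs, theta) - log_z_ratio
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
        chain.append(theta)
    return np.array(chain)

chain = mcmh(x_obs=1.3)
print("posterior mean of theta ~", chain[1000:].mean())
```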

  2. A Monte Carlo Metropolis-Hastings Algorithm for Sampling from Distributions with Intractable Normalizing Constants

    KAUST Repository

    Liang, Faming

    2013-08-01

    Simulating from distributions with intractable normalizing constants has been a long-standing problem in machine learning. In this letter, we propose a new algorithm, the Monte Carlo Metropolis-Hastings (MCMH) algorithm, for tackling this problem. The MCMH algorithm is a Monte Carlo version of the Metropolis-Hastings algorithm. It replaces the unknown normalizing constant ratio by a Monte Carlo estimate in simulations, while still converging, as shown in the letter, to the desired target distribution under mild conditions. The MCMH algorithm is illustrated with spatial autologistic models and exponential random graph models. Unlike other auxiliary variable Markov chain Monte Carlo (MCMC) algorithms, such as the Møller and exchange algorithms, the MCMH algorithm avoids the requirement for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is not available or very expensive. The MCMH algorithm can also be applied to Bayesian inference for random effect models and missing data problems that involve simulations from a distribution with intractable integrals. © 2013 Massachusetts Institute of Technology.

  3. Characteristic statistic algorithm (CSA) for in-core loading pattern optimization

    International Nuclear Information System (INIS)

    Liu Zhihong; Hu Yongming; Shi Gong

    2007-01-01

    To solve the problem of PWR in-core loading pattern optimization, a well-suited global optimization algorithm, the characteristic statistic algorithm (CSA), is used. The searching process of this algorithm and how to apply it to this problem are presented. The loading pattern optimization code SCYCLE has been developed. Two different problems on real PWR models are calculated, and the results are compared with those of other algorithms. It is shown that SCYCLE has high efficiency and good global performance on this problem. (authors)

  4. Algorithm for the generation of nuclear spin species and nuclear spin statistical weights

    International Nuclear Information System (INIS)

    Balasubramanian, K.

    1982-01-01

    A set of algorithms for the computer generation of nuclear spin species and nuclear spin statistical weights potentially useful in molecular spectroscopy is developed. These algorithms generate the nuclear spin species from group structures known as generalized character cycle indices (GCCIs). Thus the required input for these algorithms is just the set of all GCCIs for the symmetry group of the molecule which can be computed easily from the character table. The algorithms are executed and illustrated with examples

  5. Statistical sampling and modelling for cork oak and eucalyptus stands

    NARCIS (Netherlands)

    Paulo, M.J.

    2002-01-01

    This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication

  6. Comparison of single distance phase retrieval algorithms by considering different object composition and the effect of statistical and structural noise.

    Science.gov (United States)

    Chen, R C; Rigon, L; Longo, R

    2013-03-25

    Phase retrieval is a technique for extracting quantitative phase information from X-ray propagation-based phase-contrast tomography (PPCT). In this paper, the performance of different single-distance phase retrieval algorithms is investigated. The algorithms are herein called the phase-attenuation duality Born algorithm (PAD-BA), phase-attenuation duality Rytov algorithm (PAD-RA), phase-attenuation duality modified Bronnikov algorithm (PAD-MBA), phase-attenuation duality Paganin algorithm (PAD-PA), and phase-attenuation duality Wu algorithm (PAD-WA), respectively. They are all based on the phase-attenuation duality (PAD) property and on weak absorption of the sample, and they employ only single-distance PPCT data. They are investigated here via simulated noise-free PPCT data, considering the fulfillment of the PAD property and the weakly absorbing condition, and with experimental PPCT data of a mixture sample containing absorbing and weakly absorbing materials, and of a polymer sample with different degrees of statistical and structural noise. The simulation shows that all algorithms can quantitatively reconstruct the 3D refractive index of a quasi-homogeneous, weakly absorbing object from noise-free PPCT data. When the weakly absorbing condition is violated, PAD-RA and PAD-PA/WA obtain better results than PAD-BA and PAD-MBA, as shown in both the simulation and the mixture sample results. When statistical noise is considered, the contrast-to-noise ratio decreases as the photon number is reduced. The structural noise study shows that the result is progressively corrupted by ring-like artifacts as the structural noise (i.e., phantom thickness) increases. PAD-RA and PAD-PA/WA achieve better density resolution than PAD-BA and PAD-MBA in both the statistical and structural noise studies.

  7. Statistical Algorithm for the Adaptation of Detection Thresholds

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2008-01-01

    Many event detection mechanisms in spark ignition automotive engines are based on the comparison of engine signals to detection threshold values. Different signal qualities for new and aged engines necessitate the development of an adaptation algorithm for the detection thresholds... remains constant regardless of engine age and changing detection threshold values. This, in turn, guarantees the same event detection performance for new and aged engines/sensors. Adaptation of the engine knock detection threshold is given as an example. Publication date: 2008...

  8. Multivariate statistics high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic

  9. Statistical conditional sampling for variable-resolution video compression.

    Directory of Open Access Journals (Sweden)

    Alexander Wong

    In this study, we investigate a variable-resolution approach to video compression based on a conditional random field (CRF) and statistical conditional sampling, in order to further improve the compression rate while maintaining high-quality video. In the proposed approach, representative key frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot are stored and compressed at a reduced resolution. At the decompression stage, a region-based dictionary is constructed from the key frames and used to restore the reduced-resolution frames to the original resolution via statistical conditional sampling. The sampling approach is based on the conditional probability of the CRF model, using the constructed dictionary. Experimental results show that the proposed variable-resolution approach via statistical conditional sampling has potential for improving compression rates when compared to compressing the video at full resolution, while achieving higher video quality when compared to compressing the video at reduced resolution.

  10. Effective traffic features selection algorithm for cyber-attacks samples

    Science.gov (United States)

    Li, Yihong; Liu, Fangzheng; Du, Zhenyu

    2018-05-01

    To study defenses against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of the traffic features extracted from cyber-attack samples. First, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, it calculates the variation in clustering performance after removing each feature. Finally, the degree of distinctiveness of each feature is evaluated according to the result; the effective features are those whose degree of distinctiveness exceeds a set threshold. The purpose of this paper is to select the effective features from the extracted original feature set. This reduces the dimensionality of the features and thus the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has some advantages over other selection algorithms.
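
    A hedged sketch of the remove-one-feature evaluation described above, using scikit-learn's k-means++ initialization and the silhouette score as the clustering-performance measure; the paper's exact performance measure and threshold are not given here, so both are my own choices:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def feature_distinctiveness(X, n_clusters=2, seed=0):
    """Score each feature by how much the clustering quality drops
    when that feature is removed (larger drop = more distinctive)."""
    base = silhouette_score(
        X, KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(X))
    scores = []
    for j in range(X.shape[1]):
        Xr = np.delete(X, j, axis=1)             # drop feature j
        s = silhouette_score(
            Xr, KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(Xr))
        scores.append(base - s)
    return np.array(scores)

# Synthetic data: two traffic classes separated only along feature 0
X = np.vstack([np.random.randn(100, 5),
               np.random.randn(100, 5) + [3, 0, 0, 0, 0]])
scores = feature_distinctiveness(X)
effective = np.where(scores > 0.05)[0]           # threshold chosen arbitrarily
print("effective features:", effective)
```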

  11. Classical boson sampling algorithms with superior performance to near-term experiments

    Science.gov (United States)

    Neville, Alex; Sparrow, Chris; Clifford, Raphaël; Johnston, Eric; Birchall, Patrick M.; Montanaro, Ashley; Laing, Anthony

    2017-12-01

    It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of linear optics, which has sparked interest as a rapid way to demonstrate such quantum supremacy. Photon statistics are governed by intractable matrix functions, which suggests that sampling from the distribution obtained by injecting photons into a linear optical network could be solved more quickly by a photonic experiment than by a classical computer. The apparently low resource requirements for large boson sampling experiments have raised expectations of a near-term demonstration of quantum supremacy by boson sampling. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. Our classical algorithm, based on Metropolised independence sampling, allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. Compared to current experiments, a demonstration of quantum supremacy over a successful implementation of these classical methods on a supercomputer would require the number of photons and experimental components to increase by orders of magnitude, while tackling exponentially scaling photon loss.

  12. An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks

    KAUST Repository

    Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data.

  13. Statistical algorithm for automated signature analysis of power spectral density data

    International Nuclear Information System (INIS)

    Piety, K.R.

    1977-01-01

    A statistical algorithm has been developed and implemented on a minicomputer system for on-line surveillance applications. Power spectral density (PSD) measurements on process signals are the performance signatures that characterize the "health" of the monitored equipment. Statistical methods provide a quantitative basis for automating the detection of anomalous conditions. The surveillance algorithm has been tested on signals from neutron sensors, proximeter probes, and accelerometers to determine its potential for monitoring nuclear reactors and rotating machinery.

  14. Statistical Methods and Tools for Hanford Staged Feed Tank Sampling

    Energy Technology Data Exchange (ETDEWEB)

    Fountain, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Brigantic, Robert T. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Peterson, Reid A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2013-10-01

    This report summarizes work conducted by Pacific Northwest National Laboratory to technically evaluate the current approach to staged feed sampling of high-level waste (HLW) sludge to meet waste acceptance criteria (WAC) for transfer from tank farms to the Hanford Waste Treatment and Immobilization Plant (WTP). The current sampling and analysis approach is detailed in the document titled Initial Data Quality Objectives for WTP Feed Acceptance Criteria, 24590-WTP-RPT-MGT-11-014, Revision 0 (Arakali et al. 2011). The goal of this current work is to evaluate and provide recommendations to support a defensible, technical and statistical basis for the staged feed sampling approach that meets WAC data quality objectives (DQOs).

  15. Gene coexpression measures in large heterogeneous samples using count statistics.

    Science.gov (United States)

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.

  16. Weighted statistical parameters for irregularly sampled time series

    Science.gov (United States)

    Rimoldini, Lorenzo

    2014-01-01

    Unevenly spaced time series are common in astronomy because of the day-night cycle, weather conditions, dependence on the source position in the sky, allocated telescope time and corrupt measurements, for example, or inherent to the scanning law of satellites like Hipparcos and the forthcoming Gaia. Irregular sampling often causes clumps of measurements and gaps with no data which can severely disrupt the values of estimators. This paper aims at improving the accuracy of common statistical parameters when linear interpolation (in time or phase) can be considered an acceptable approximation of a deterministic signal. A pragmatic solution is formulated in terms of a simple weighting scheme, adapting to the sampling density and noise level, applicable to large data volumes at minimal computational cost. Tests on time series from the Hipparcos periodic catalogue led to significant improvements in the overall accuracy and precision of the estimators with respect to the unweighted counterparts and those weighted by inverse-squared uncertainties. Automated classification procedures employing statistical parameters weighted by the suggested scheme confirmed the benefits of the improved input attributes. The classification of eclipsing binaries, Mira, RR Lyrae, Delta Cephei and Alpha2 Canum Venaticorum stars employing exclusively weighted descriptive statistics achieved an overall accuracy of 92 per cent, about 6 per cent higher than with unweighted estimators.
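
    As a simplified stand-in for the weighting scheme, the sketch below weights each observation by the half-gaps around it, so measurements in dense clumps count less. This illustrates the idea only; it is not the paper's exact estimator:

```python
import numpy as np

def gap_weights(t):
    """Weights proportional to the half-gaps around each observation
    (t must be sorted), so clumped measurements are down-weighted."""
    t = np.asarray(t, dtype=float)
    dt = np.diff(t)
    w = np.empty_like(t)
    w[0], w[-1] = dt[0], dt[-1]
    w[1:-1] = dt[:-1] + dt[1:]
    return w / w.sum()

def weighted_mean_var(t, x):
    w = gap_weights(t)
    mean = np.sum(w * x)
    # reliability-weights form of the unbiased variance estimator
    var = np.sum(w * (x - mean) ** 2) / (1.0 - np.sum(w ** 2))
    return mean, var

t = np.sort(np.random.uniform(0, 100, 50))   # irregular sampling times
x = np.sin(2 * np.pi * t / 25) + 0.1 * np.random.randn(50)
print(weighted_mean_var(t, x))
```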

  17. Statistical sampling applied to the radiological characterization of historical waste

    Directory of Open Access Journals (Sweden)

    Zaffora Biagio

    2016-01-01

    The evaluation of the activity of radionuclides in radioactive waste is required for its disposal in final repositories. Easy-to-measure nuclides, such as γ- and high-energy X-ray emitters, can be measured via non-destructive nuclear techniques from outside a waste package. Some radionuclides are difficult to measure (DTM) from outside a package because they are α- or β-emitters. The present article discusses the application of linear regression, scaling factors (SF) and the so-called "mean activity method" to estimate the activity of DTM nuclides in metallic waste produced at the European Organization for Nuclear Research (CERN). Various statistical sampling techniques, including simple random sampling, systematic sampling, stratified sampling and authoritative sampling, are described and applied to two waste populations of activated copper cables. The bootstrap is introduced as a tool to estimate average activities and standard errors in waste characterization; the analysis of the DTM nuclide Ni-63 is used as an example. Experimental and theoretical values of SFs are calculated and compared. Guidelines for sampling historical waste using probabilistic and non-probabilistic sampling are finally given.
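
    The bootstrap step can be illustrated in a few lines: resample the measured activities with replacement and take the spread of the resampled means as the standard error. The activity values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_se(activities, n_boot=10000):
    """Bootstrap estimate of the average activity and its standard error."""
    a = np.asarray(activities, dtype=float)
    means = np.array([rng.choice(a, a.size, replace=True).mean()
                      for _ in range(n_boot)])
    return a.mean(), means.std(ddof=1)

# Illustrative Ni-63 specific activities of sampled cable pieces (Bq/g)
acts = [12.1, 9.8, 15.3, 11.0, 8.7, 13.9, 10.4, 12.8]
mean, se = bootstrap_se(acts)
print(f"mean = {mean:.2f} Bq/g, bootstrap SE = {se:.2f} Bq/g")
```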

  18. Algorithm for image retrieval based on edge gradient orientation statistical code.

    Science.gov (United States)

    Zeng, Jiexian; Zhao, Yonggang; Li, Weiye; Fu, Xiang

    2014-01-01

    Image edge gradient direction not only contains important shape information but also has the virtue of simplicity and low complexity. Considering that edge gradient direction histograms and the edge direction autocorrelogram are not rotation invariant, we put forward an image retrieval algorithm based on an edge gradient orientation statistical code (hereinafter referred to as EGOSC), which carries the statistical treatment used for eight-neighborhood edge-direction chain codes over to the statistics of the edge gradient direction. Firstly, we construct the n-direction vector and impose a maximal-summation restriction on the EGOSC to ensure that the algorithm is effectively rotation invariant. Then, we use the Euclidean distance of the edge gradient direction entropy to measure shape similarity, so that the method is not sensitive to scaling, color, and illumination change. The experimental results and the algorithm analysis demonstrate that the algorithm can be used for content-based image retrieval and yields good retrieval results.

  19. A Formal Approach for RT-DVS Algorithms Evaluation Based on Statistical Model Checking

    Directory of Open Access Journals (Sweden)

    Shengxin Dai

    2015-01-01

    Energy saving is a crucial concern in embedded real-time systems. Many RT-DVS algorithms have been proposed to save energy while preserving deadline guarantees. This paper presents a novel approach to evaluating RT-DVS algorithms using statistical model checking. A scalable framework is proposed for RT-DVS algorithm evaluation, in which the relevant components are modeled as stochastic timed automata and the evaluation metrics, including utilization bound, energy efficiency, battery awareness, and temperature awareness, are expressed as statistical queries. Evaluation of these metrics is performed by verifying the corresponding queries using UPPAAL-SMC and analyzing the statistical information provided by the tool. We demonstrate the applicability of our framework via a case study of five classical RT-DVS algorithms.

  20. Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

    Science.gov (United States)

    Raymond L. Czaplewski

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...

  1. Development of algorithms for building inventory compilation through remote sensing and statistical inferencing

    Science.gov (United States)

    Sarabandi, Pooya

    economical way. A terrain-dependent-search algorithm is formulated to facilitate the search for correspondences in a quasi-stereo pair of images. The calculated heights for sample buildings using the cross-sensor data fusion algorithm show an average coefficient of variation of 1.03%. In order to infer the structural type and occupancy type, i.e. the engineering attributes, of buildings from the spatial and geometric attributes of 3-D models, a statistical data analysis framework is formulated. Applications of "Classification Trees" and "Multinomial Logistic Models" in modeling the marginal probabilities of class membership of engineering attributes are investigated. Adaptive statistical models that incorporate different spatial and geometric attributes of buildings while inferring the engineering attributes are developed in this dissertation. The inferred engineering attributes, in conjunction with the spatial and geometric attributes derived from the imagery, can be used to augment regional building inventories and therefore enhance the results of catastrophe models. In the last part of the dissertation, a set of empirically derived motion-damage relationships is developed, based on the correlation of observed building performance with measured ground-motion parameters from the 1994 Northridge and 1999 Chi-Chi, Taiwan earthquakes. Fragility functions in the form of cumulative lognormal distributions and damage probability matrices for several classes of buildings (wood, steel and concrete), as well as a number of ground-motion intensity measures, are developed and compared to currently used motion-damage relationships.

  2. Iterative algorithm of discrete Fourier transform for processing randomly sampled NMR data sets

    International Nuclear Information System (INIS)

    Stanek, Jan; Kozminski, Wiktor

    2010-01-01

    Spectra obtained by application of multidimensional Fourier transformation (MFT) to sparsely sampled nD NMR signals are usually corrupted due to missing data. In the present paper this phenomenon is investigated with simulations and experiments. An effective iterative algorithm for artifact suppression for sparse on-grid NMR data sets is discussed in detail. It includes automated peak recognition based on statistical methods. The results enable one to study NMR spectra with a high dynamic range of peak intensities while preserving the benefits of random sampling, namely the superior resolution in the indirectly measured dimensions. Experimental examples include 3D 15N- and 13C-edited NOESY-HSQC spectra of human ubiquitin.

  3. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

    Science.gov (United States)

    Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

    2017-06-06

    An important aspect of chemoinformatics and material-informatics is the use of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets of noise. RANSAC can be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development, and prediction for test set samples using an applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
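
    A brief sketch of RANSAC used in this spirit, fitting a linear QSAR-style model while flagging outliers, here with scikit-learn's RANSACRegressor on synthetic data (the keyword is `estimator` in recent scikit-learn versions; older versions use `base_estimator`):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor, LinearRegression

rng = np.random.default_rng(7)

# Illustrative data: one descriptor vs. a photovoltaic property, with outliers
X = rng.uniform(0, 10, (100, 1))
y = 2.5 * X.ravel() + rng.normal(0, 0.5, 100)
y[::10] += rng.normal(15, 5, 10)                 # corrupt every 10th sample

ransac = RANSACRegressor(estimator=LinearRegression(),
                         residual_threshold=2.0, random_state=0)
ransac.fit(X, y)

inliers = ransac.inlier_mask_                    # samples kept by consensus
print(f"kept {inliers.sum()} of {len(y)} samples as inliers")
print("slope =", ransac.estimator_.coef_[0])
```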

  4. Examination of statistical noise in SPECT image and sampling pitch

    International Nuclear Information System (INIS)

    Takaki, Akihiro; Soma, Tsutomu; Murase, Kenya; Watanabe, Hiroyuki; Murakami, Tomonori; Kawakami, Kazunori; Teraoka, Satomi; Kojima, Akihiro; Matsumoto, Masanori

    2008-01-01

    Statistical noise in single photon emission computed tomography (SPECT) images was examined for its relation to total count and to sampling pitch, using a simulation and a phantom experiment to obtain projection data under defined conditions. The SPECT simulation assumed a virtual, homogeneous water column (20 cm diameter) as the absorbing mass. The phantom experiment used a 3D-Hoffman brain phantom (Data Spectrum Corp.) filled with 370 MBq of 99mTc-pertechnetate solution and a facing two-detector SPECT machine with a low-energy/high-resolution collimator, E-CAM (Siemens). Projection data from the two methods were reconstructed through filtered back projection to produce transaxial images. The noise was evaluated visually, by the root mean square uncertainty calculated from the average count and standard deviation (SD) in a region of interest (ROI) defined in the reconstructed images, and by the normalized mean squares calculated from the difference between each obtained slice and a reference image obtained with the common sampling pitch, for both the simulation and the phantom. In conclusion, it is recommended that the pitch be set in the machine to approximate the value calculated from the sampling theorem, even though the projection counts per angular direction become smaller for the same total data acquisition time. (R.T.)

  5. Adaptive sampling rate control for networked systems based on statistical characteristics of packet disordering.

    Science.gov (United States)

    Li, Jin-Na; Er, Meng-Joo; Tan, Yen-Kheng; Yu, Hai-Bin; Zeng, Peng

    2015-09-01

    This paper investigates an adaptive sampling rate control scheme for networked control systems (NCSs) subject to packet disordering. The main objectives of the proposed scheme are (a) to avoid heavy packet disordering existing in communication networks and (b) to stabilize NCSs with packet disordering, transmission delay and packet loss. First, a novel sampling rate control algorithm based on statistical characteristics of disordering entropy is proposed; secondly, an augmented closed-loop NCS that consists of a plant, a sampler and a state-feedback controller is transformed into an uncertain and stochastic system, which facilitates the controller design. Then, a sufficient condition for stochastic stability in terms of Linear Matrix Inequalities (LMIs) is given. Moreover, an adaptive tracking controller is designed such that the sampling period tracks a desired sampling period, which represents a significant contribution. Finally, experimental results are given to illustrate the effectiveness and advantages of the proposed scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  6. Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis

    Energy Technology Data Exchange (ETDEWEB)

    Vrbik, Jan [Department of Mathematics, Brock University, St. Catharines, Ontario L2S 3A1 (Canada); Ospadov, Egor; Rothstein, Stuart M., E-mail: srothstein@brocku.ca [Department of Physics, Brock University, St. Catharines, Ontario L2S 3A1 (Canada)

    2016-07-14

    Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x′; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.

  7. Note: A pure-sampling quantum Monte Carlo algorithm with independent Metropolis

    International Nuclear Information System (INIS)

    Vrbik, Jan; Ospadov, Egor; Rothstein, Stuart M.

    2016-01-01

    Recently, Ospadov and Rothstein published a pure-sampling quantum Monte Carlo algorithm (PSQMC) that features an auxiliary Path Z that connects the midpoints of the current and proposed Paths X and Y, respectively. When sufficiently long, Path Z provides statistical independence of Paths X and Y. Under those conditions, the Metropolis decision used in PSQMC is done without any approximation, i.e., not requiring microscopic reversibility and without having to introduce any G(x → x′; τ) factors into its decision function. This is a unique feature that contrasts with all competing reptation algorithms in the literature. An example illustrates that dependence of Paths X and Y has adverse consequences for pure sampling.

  8. DSMC multicomponent aerosol dynamics: Sampling algorithms and aerosol processes

    Science.gov (United States)

    Palaniswaamy, Geethpriya

    The post-accident nuclear reactor primary and containment environments can be characterized by high temperatures and pressures, and fission products and nuclear aerosols. These aerosols evolve via natural transport processes as well as under the influence of engineered safety features. These aerosols can be hazardous and may pose risk to the public if released into the environment. Computations of their evolution, movement and distribution involve the study of various processes such as coagulation, deposition, condensation, etc., and are influenced by factors such as particle shape, charge, radioactivity and spatial inhomogeneity. These many factors make the numerical study of nuclear aerosol evolution computationally very complicated. The focus of this research is on the use of the Direct Simulation Monte Carlo (DSMC) technique to elucidate the role of various phenomena that influence the nuclear aerosol evolution. In this research, several aerosol processes such as coagulation, deposition, condensation, and source reinforcement are explored for a multi-component, aerosol dynamics problem in a spatially homogeneous medium. Among the various sampling algorithms explored the Metropolis sampling algorithm was found to be effective and fast. Several test problems and test cases are simulated using the DSMC technique. The DSMC results obtained are verified against the analytical and sectional results for appropriate test problems. Results show that the assumption of a single mean density is not appropriate due to the complicated effect of component densities on the aerosol processes. The methods developed and the insights gained will also be helpful in future research on the challenges associated with the description of fission product and aerosol releases.

  9. Improved Sampling Algorithms in the Risk-Informed Safety Margin Characterization Toolkit

    Energy Technology Data Exchange (ETDEWEB)

    Mandelli, Diego [Idaho National Lab. (INL), Idaho Falls, ID (United States); Smith, Curtis Lee [Idaho National Lab. (INL), Idaho Falls, ID (United States); Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States); Cogliati, Joshua Joseph [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2015-09-01

    The RISMC approach is developing an advanced set of methodologies and algorithms to perform Probabilistic Risk Analyses (PRAs). In contrast to classical PRA methods, which are based on event-tree and fault-tree methods, the RISMC approach largely employs system simulator codes coupled with stochastic analysis tools. The basic idea is to randomly perturb (by employing sampling algorithms) the timing and sequencing of events and the internal parameters of the system codes (i.e., uncertain parameters) in order to estimate stochastic parameters such as the core damage probability. Applied to complex systems such as nuclear power plants, this approach requires a series of computationally expensive simulation runs over a large set of uncertain parameters. These types of analysis are affected by two issues. First, the space of possible solutions (a.k.a., the issue space or the response surface) can be sampled only very sparsely, which precludes fully analyzing the impact of uncertainties on the system dynamics. Second, large amounts of data are generated, and tools to generate knowledge from such data sets are not yet available. This report focuses on the first issue and in particular employs novel methods that optimize the information generated by the sampling process by sampling unexplored and risk-significant regions of the issue space: adaptive (smart) sampling algorithms. They infer the system response from surrogate models constructed from existing samples and predict the most relevant location of the next sample. It is therefore possible to understand features of the issue space with a small number of carefully selected samples. In this report, we present how adaptive sampling can be performed using the RISMC toolkit and highlight its advantages compared to more classical sampling approaches such as Monte Carlo. We employ RAVEN to perform such statistical analyses using both analytical cases and another RISMC code, RELAP-7.

  10. Improved Sampling Algorithms in the Risk-Informed Safety Margin Characterization Toolkit

    International Nuclear Information System (INIS)

    Mandelli, Diego; Smith, Curtis Lee; Alfonsi, Andrea; Rabiti, Cristian; Cogliati, Joshua Joseph

    2015-01-01

    The RISMC approach is developing an advanced set of methodologies and algorithms to perform Probabilistic Risk Analyses (PRAs). In contrast to classical PRA methods, which are based on event-tree and fault-tree methods, the RISMC approach largely employs system simulator codes coupled with stochastic analysis tools. The basic idea is to randomly perturb (by employing sampling algorithms) the timing and sequencing of events and the internal parameters of the system codes (i.e., uncertain parameters) in order to estimate stochastic parameters such as the core damage probability. Applied to complex systems such as nuclear power plants, this approach requires a series of computationally expensive simulation runs over a large set of uncertain parameters. These types of analysis are affected by two issues. First, the space of possible solutions (a.k.a., the issue space or the response surface) can be sampled only very sparsely, which precludes fully analyzing the impact of uncertainties on the system dynamics. Second, large amounts of data are generated, and tools to generate knowledge from such data sets are not yet available. This report focuses on the first issue and in particular employs novel methods that optimize the information generated by the sampling process by sampling unexplored and risk-significant regions of the issue space: adaptive (smart) sampling algorithms. They infer the system response from surrogate models constructed from existing samples and predict the most relevant location of the next sample. It is therefore possible to understand features of the issue space with a small number of carefully selected samples. In this report, we present how adaptive sampling can be performed using the RISMC toolkit and highlight its advantages compared to more classical sampling approaches such as Monte Carlo. We employ RAVEN to perform such statistical analyses using both analytical cases and another RISMC code, RELAP-7.
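
    A minimal illustration of adaptive (smart) sampling in the spirit described above: a Gaussian-process surrogate is fit to the existing runs, and the next sample is placed where the predictive uncertainty is largest. The "simulator" is a cheap stand-in function and the acquisition rule is deliberately the simplest possible; this is not the RISMC toolkit's implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def expensive_code(x):
    # stand-in for a system simulator run; returns a "failure margin"
    return np.sin(3 * x) + 0.5 * x

# start from a few space-filling samples, then add points where the
# surrogate is least certain (maximum predictive standard deviation)
X = rng.uniform(0, 3, (5, 1))
y = expensive_code(X).ravel()
candidates = np.linspace(0, 3, 200).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
    mu, sd = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(sd)]           # most informative next run
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_code(x_next[0]))

print(f"adaptively selected {len(X)} runs")
```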

  11. Statistical sampling plan for the TRU waste assay facility

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Wright, T.; Schultz, F.J.; Haff, K.; Monroe, R.J.

    1983-08-01

    Due to limited space, there is a need to dispose appropriately of the Oak Ridge National Laboratory transuranic waste which is presently stored below ground in 55-gal (208-l) drums within weather-resistant structures. Waste containing less than 100 nCi/g of transuranics can be removed from the present storage and buried, while waste containing greater than 100 nCi/g of transuranics must continue to be retrievably stored. To make the measurements needed to determine which drums can be buried, a transuranic Neutron Interrogation Assay System (NIAS) has been developed at Los Alamos National Laboratory; it can make the needed measurements much faster than previous techniques, which involved γ-ray spectroscopy and were reliable but time consuming. Therefore, a validation study has been planned to determine the ability of the NIAS to make adequate measurements. The validation of the NIAS will be based on a paired comparison of a sample of measurements made by the previous techniques and the NIAS. The purpose of this report is to describe the proposed sampling plan and the statistical analyses needed to validate the NIAS. 5 references, 4 figures, 5 tables

  12. A Separation Algorithm for Sources with Temporal Structure Only Using Second-order Statistics

    Directory of Open Access Journals (Sweden)

    J.G. Wang

    2013-09-01

    Unlike conventional blind source separation (BSS), which deals with independent identically distributed (i.i.d.) sources, this paper addresses separation from mixtures of sources with temporal structure, such as linear autocorrelation. Many sequential extraction algorithms have been reported, but the deflation scheme they use inevitably accumulates errors. We propose a robust separation algorithm that recovers the original sources simultaneously, through a joint diagonalizer of several averaged delayed covariance matrices at the optimal time delay and its integer multiples. The proposed algorithm is computationally simple and efficient, since it is based on second-order statistics only. Extensive simulation results confirm the validity and high performance of the algorithm. Compared with related extraction algorithms, its separation signal-to-noise ratio for a desired source can be up to 20 dB higher, and it appears rather insensitive to estimation error in the time delay.
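
    The idea can be sketched with a simplified two-matrix variant (AMUSE-like): whiten with the zero-lag covariance, then diagonalize a single symmetrized delayed covariance instead of jointly diagonalizing several. This is a stand-in for the paper's full joint-diagonalization algorithm:

```python
import numpy as np

def amuse_like(X, tau=1):
    """Separate sources with temporal structure using only second-order
    statistics: whiten, then diagonalize one time-delayed covariance
    (a simplified two-matrix stand-in for full joint diagonalization)."""
    X = X - X.mean(axis=1, keepdims=True)
    # whitening matrix from the zero-lag covariance
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    W = E @ np.diag(d ** -0.5) @ E.T
    Z = W @ X
    # symmetrized covariance at delay tau
    C = Z[:, :-tau] @ Z[:, tau:].T / (Z.shape[1] - tau)
    C = (C + C.T) / 2
    _, U = np.linalg.eigh(C)
    return U.T @ Z                               # recovered sources

t = np.arange(10000)
S = np.vstack([np.sin(0.02 * t), np.sign(np.sin(0.05 * t))])  # correlated-in-time sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                        # mixing matrix
Y = amuse_like(A @ S)
print("recovered shape:", Y.shape)
```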

  13. An Efficient Forward-Reverse EM Algorithm for Statistical Inference in Stochastic Reaction Networks

    KAUST Repository

    Bayer, Christian

    2016-01-06

    In this work [1], we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [2] to the context of stochastic reaction networks (SRNs). We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce an efficient two-phase algorithm in which the first phase is deterministic and intended to provide a starting point for the second phase, the Monte Carlo EM algorithm.

  14. Statistical trajectory of an approximate EM algorithm for probabilistic image processing

    International Nuclear Information System (INIS)

    Tanaka, Kazuyuki; Titterington, D M

    2007-01-01

    We calculate analytically a statistical average of trajectories of an approximate expectation-maximization (EM) algorithm with generalized belief propagation (GBP) and a Gaussian graphical model for the estimation of hyperparameters from observable data in probabilistic image processing. A statistical average with respect to observed data corresponds to a configuration average for the random-field Ising model in spin glass theory. In the present paper, hyperparameters which correspond to interactions and external fields of spin systems are estimated by an approximate EM algorithm. A practical algorithm is described for gray-level image restoration based on a Gaussian graphical model and GBP. The GBP approach corresponds to the cluster variation method in statistical mechanics. Our main result in the present paper is to obtain the statistical average of the trajectory in the approximate EM algorithm by using loopy belief propagation and GBP with respect to degraded images generated from a probability density function with true values of hyperparameters. The statistical average of the trajectory can be expressed in terms of recursion formulas derived from some analytical calculations

  15. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Gregory D. [University of California, Physics and Astronomy Department, Los Angeles, CA (United States); McKay, James; Scott, Pat [Imperial College London, Department of Physics, Blackett Laboratory, London (United Kingdom); Farmer, Ben; Conrad, Jan [AlbaNova University Centre, Oskar Klein Centre for Cosmoparticle Physics, Stockholm (Sweden); Stockholm University, Department of Physics, Stockholm (Sweden); Roebber, Elinore [McGill University, Department of Physics, Montreal, QC (Canada); Putze, Antje [LAPTh, Universite de Savoie, CNRS, Annecy-le-Vieux (France); Collaboration: The GAMBIT Scanner Workgroup

    2017-11-15

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics. (orig.)

  16. Phase Transitions in Combinatorial Optimization Problems Basics, Algorithms and Statistical Mechanics

    CERN Document Server

    Hartmann, Alexander K

    2005-01-01

    A concise, comprehensive introduction to the topic of statistical physics of combinatorial optimization, bringing together theoretical concepts and algorithms from computer science with analytical methods from physics. The result bridges the gap between statistical physics and combinatorial optimization, investigating problems taken from theoretical computing, such as the vertex-cover problem, with the concepts and methods of theoretical physics. The authors cover rapid developments and analytical methods that are both extremely complex and spread by word-of-mouth, providing all the necessary

  17. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

    OpenAIRE

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, based on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with different statistical distributions of gait variables between the left and right lower limbs; that is, the discrimination of...

  18. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    Science.gov (United States)

    Cantu-Paz, Erick [Oakland, CA; Kamath, Chandrika [Tracy, CA

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  19. Some algorithms for reordering a sequence of objects, with application to E. Sparre Andersen's principle of equivalence in mathematical statistics

    NARCIS (Netherlands)

    Bruijn, de N.G.

    1972-01-01

    Recently A. W. Joseph described an algorithm providing combinatorial insight into E. Sparre Andersen's so-called Principle of Equivalence in mathematical statistics. In the present paper such algorithms are discussed systematically.

  20. Application of image recognition algorithms for statistical description of nano- and microstructured surfaces

    Energy Technology Data Exchange (ETDEWEB)

    Mărăscu, V.; Dinescu, G. [National Institute for Lasers, Plasma and Radiation Physics, 409 Atomistilor Street, Bucharest– Magurele (Romania); Faculty of Physics, University of Bucharest, 405 Atomistilor Street, Bucharest-Magurele (Romania); Chiţescu, I. [Faculty of Mathematics and Computer Science, University of Bucharest, 14 Academiei Street, Bucharest (Romania); Barna, V. [Faculty of Physics, University of Bucharest, 405 Atomistilor Street, Bucharest-Magurele (Romania); Ioniţă, M. D.; Lazea-Stoyanova, A.; Mitu, B., E-mail: mitub@infim.ro [National Institute for Lasers, Plasma and Radiation Physics, 409 Atomistilor Street, Bucharest– Magurele (Romania)

    2016-03-25

    In this paper we propose a statistical approach for describing the self-assembling of sub-micronic polystyrene beads on silicon surfaces, as well as the evolution of surface topography due to plasma treatments. Algorithms for image recognition are used in conjunction with Scanning Electron Microscopy (SEM) imaging of surfaces. In a first step, greyscale images of the surface covered by the polystyrene beads are obtained. Next, an adaptive thresholding method is applied to obtain binary images. The following step consists in the automatic identification of polystyrene bead dimensions, using the Hough transform algorithm according to bead radius. In order to analyze the uniformity of the self-assembled polystyrene beads, the squared modulus of the 2-dimensional Fast Fourier Transform (2-D FFT) is applied. By combining these algorithms we obtain a powerful and fast statistical tool for the analysis of micro- and nanomaterials whose surface features are regularly distributed, upon SEM examination.
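
    The pipeline above (adaptive thresholding, Hough-transform bead detection, squared modulus of the 2-D FFT) maps directly onto standard OpenCV/NumPy calls. A sketch in which the file name and every parameter value are illustrative placeholders, not the paper's settings:

    ```python
    import cv2
    import numpy as np

    img = cv2.imread("sem_beads.png", cv2.IMREAD_GRAYSCALE)  # hypothetical SEM image

    # Adaptive thresholding: greyscale image -> binary image
    binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 51, 2)

    # Hough transform: identify beads and their radii (radius bounds are guesses)
    circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=100, param2=20, minRadius=5, maxRadius=30)
    if circles is not None:
        radii = circles[0, :, 2]
        print(f"{len(radii)} beads detected, mean radius {radii.mean():.1f} px")

    # Squared modulus of the 2-D FFT to assess the uniformity of the arrangement
    power = np.abs(np.fft.fftshift(np.fft.fft2(binary.astype(float)))) ** 2
    ```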

  1. Application of image recognition algorithms for statistical description of nano- and microstructured surfaces

    International Nuclear Information System (INIS)

    Mărăscu, V.; Dinescu, G.; Chiţescu, I.; Barna, V.; Ioniţă, M. D.; Lazea-Stoyanova, A.; Mitu, B.

    2016-01-01

    In this paper we propose a statistical approach for describing the self-assembling of sub-micronic polystyrene beads on silicon surfaces, as well as the evolution of surface topography due to plasma treatments. Algorithms for image recognition are used in conjunction with Scanning Electron Microscopy (SEM) imaging of surfaces. In a first step, greyscale images of the surface covered by the polystyrene beads are obtained. Next, an adaptive thresholding method is applied to obtain binary images. The following step consists in the automatic identification of polystyrene bead dimensions, using the Hough transform algorithm according to bead radius. In order to analyze the uniformity of the self-assembled polystyrene beads, the squared modulus of the 2-dimensional Fast Fourier Transform (2-D FFT) is applied. By combining these algorithms we obtain a powerful and fast statistical tool for the analysis of micro- and nanomaterials whose surface features are regularly distributed, upon SEM examination.

  2. An algorithm to improve sampling efficiency for uncertainty propagation using sampling based method

    International Nuclear Information System (INIS)

    Campolina, Daniel; Lima, Paulo Rubens I.; Pereira, Claubia; Veloso, Maria Auxiliadora F.

    2015-01-01

    Sample size and computational uncertainty were varied in order to investigate the sampling efficiency and convergence of the sampling-based method for uncertainty propagation. The transport code MCNPX was used to simulate a LWR model and allow the mapping from uncertain inputs of the benchmark experiment to uncertain outputs. Random sampling efficiency was improved through the use of an algorithm for selecting distributions. The mean range, standard deviation range and skewness were verified in order to obtain a better representation of the uncertainty figures. A standard deviation of 5 pcm in the propagated uncertainties over 10 replicates of n samples was adopted as the convergence criterion for the method. An uncertainty of 75 pcm on the reactor k_eff was estimated by using a sample of size 93 and a computational uncertainty of 28 pcm to propagate the 1σ uncertainty of the burnable poison radius. For a fixed computational time, in order to reduce the variance of the propagated uncertainty, it was found, for the example under investigation, that it is preferable to double the sample size rather than to double the number of particles followed by the Monte Carlo process in the MCNPX code. (author)
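
    A toy illustration of the replicate-based convergence criterion described above: the spread of the propagated-uncertainty estimate over replicates shrinks as the sample size grows. The 75 pcm figure below is a synthetic stand-in, not MCNPX output:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def propagate(n):
        """Stand-in for n transport-code runs with sampled inputs; returns the
        standard deviation of the outputs (pcm)."""
        keff_out = 75.0 * rng.standard_normal(n)   # pretend true 1-sigma is 75 pcm
        return keff_out.std(ddof=1)

    for n in (20, 50, 93, 200):
        reps = [propagate(n) for _ in range(10)]   # 10 replicates of n samples
        print(f"n={n:3d}: mean {np.mean(reps):5.1f} pcm, "
              f"sd over replicates {np.std(reps, ddof=1):4.1f} pcm")
    ```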

  3. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    Science.gov (United States)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

    Synthetic Aperture Radar (SAR) is particularly important for polar remote sensing, since it provides continuous observations regardless of daylight and weather. SAR can be used to extract surface roughness information characterized by the variance of dielectric properties and different polarization channels, which makes it possible to observe different ice types and surface structure for deformation analysis. In November 2016, the 33rd cruise of the Chinese National Antarctic Research Expedition (CHINARE) set sail into the Antarctic sea ice zone. Accurate knowledge of the spatial distribution of leads in the sea ice zone is essential for routine planning of ship navigation. In this study, the semantic relationship between leads and sea ice categories is described by a Conditional Random Field (CRF) model, and leads characteristics are modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture statistical distribution based CRF is developed by considering the contextual information and the statistical characteristics of sea ice, improving leads detection in Sentinel-1A dual polarization SAR imagery. The unary and pairwise potentials in the CRF model are constructed by integrating the posterior probabilities estimated from the statistical distributions. For mixture statistical distribution parameter estimation, the Method of Logarithmic Cumulants (MoLC) is exploited for single statistical distribution parameter estimation, and an iterative Expectation Maximization (EM) algorithm is used to calculate the parameters of the mixture statistical distribution based CRF model. In the posterior probability inference, a graph-cut energy minimization method is adopted for the initial leads detection. Post-processing procedures, including an aspect ratio constraint and spatial smoothing approaches, are utilized to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a
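
    The record fits its mixture parameters with MoLC and an EM iteration for SAR-specific distributions. As a simplified stand-in, a minimal EM loop for a two-component Gaussian mixture shows the same E-step/M-step structure on synthetic backscatter values:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-1, 0.5, 400), rng.normal(2, 1.0, 600)])

    w, mu, sig = np.array([0.5, 0.5]), np.array([0.0, 1.0]), np.array([1.0, 1.0])
    for _ in range(100):
        # E-step: responsibility of each component for each observation
        pdf = w * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        r = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and standard deviations
        nk = r.sum(axis=0)
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    print(w, mu, sig)
    ```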

  4. An Intrinsic Algorithm for Parallel Poisson Disk Sampling on Arbitrary Surfaces.

    Science.gov (United States)

    Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying

    2013-03-08

    Poisson disk sampling plays an important role in a variety of visual computing applications, due to its useful statistical distribution properties and the absence of aliasing artifacts. While many effective techniques have been proposed to generate Poisson disk distributions in Euclidean space, relatively little work has addressed the surface counterpart. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. We propose a new technique for parallelizing dart throwing. Rather than the conventional approaches that explicitly partition the spatial domain to generate the samples in parallel, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. It is worth noting that our algorithm is accurate, as the generated Poisson disks are uniformly and randomly distributed without bias. Our method is intrinsic in that all the computations are based on the intrinsic metric and are independent of the embedding space. This intrinsic feature allows us to generate Poisson disk distributions on arbitrary surfaces. Furthermore, by manipulating the spatially varying density function, we can easily obtain adaptive sampling.
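
    The priority-based conflict resolution can be emulated serially: processing candidates in priority order produces the sample set that the parallel rounds converge to, since a candidate is accepted precisely when it beats every still-alive conflicting neighbor. A sketch in the Euclidean plane (the paper works with intrinsic geodesic distances on surfaces):

    ```python
    import numpy as np

    def poisson_disk_by_priority(n_candidates=4000, r=0.05, seed=0):
        rng = np.random.default_rng(seed)
        pts = rng.random((n_candidates, 2))       # candidates in the unit square
        order = rng.permutation(n_candidates)     # unique random priorities,
                                                  # earlier = higher, unbiased in space
        accepted = []
        for i in order:
            if all(np.linalg.norm(pts[i] - pts[j]) >= r for j in accepted):
                accepted.append(i)
        return pts[accepted]

    samples = poisson_disk_by_priority()
    print(len(samples), "accepted samples")
    ```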

  5. Empirical and Statistical Evaluation of the Effectiveness of Four Lossless Data Compression Algorithms

    Directory of Open Access Journals (Sweden)

    N. A. Azeez

    2017-04-01

    Full Text Available Data compression is the process of reducing the size of a file to effectively reduce storage space and communication cost. Technological progress in the digital age has led to an unparalleled usage of digital files in the current decade. This growth in data usage has increased the amount of data being transmitted via various channels of data communication, prompting the need to examine the current lossless data compression algorithms and check their level of effectiveness, so as to maximally reduce the bandwidth required for communication and transfer of data. Four lossless data compression algorithms were selected for implementation: the Lempel-Ziv-Welch algorithm, the Shannon-Fano algorithm, the Adaptive Huffman algorithm and Run-Length encoding. The choice of these algorithms was based on their similarities, particularly in application areas. Their levels of efficiency and effectiveness were evaluated using a set of predefined performance evaluation metrics, namely compression ratio, compression factor, compression time, saving percentage, entropy and code efficiency. The algorithms were implemented in the NetBeans Integrated Development Environment using Java as the programming language. Through the statistical analysis performed using Boxplot and ANOVA, comparisons were made among the four algorithms.
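
    The size-based metrics named above are one-liners. A sketch using zlib as a stand-in codec (the study implemented LZW, Shannon-Fano, adaptive Huffman and RLE in Java):

    ```python
    import zlib

    def compression_metrics(data: bytes) -> dict:
        compressed = zlib.compress(data, level=9)
        n, c = len(data), len(compressed)
        return {
            "compression_ratio": c / n,              # compressed / original size
            "compression_factor": n / c,             # inverse of the ratio
            "saving_percentage": 100 * (n - c) / n,  # % of space saved
        }

    print(compression_metrics(b"statistical sampling algorithm " * 500))
    ```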

  6. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.

    2017-08-21

    Molecular dynamics (MD) simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules, but are limited by the timescale barrier, i.e., we may be unable to efficiently obtain properties because we need to run microsecond or longer simulations using femtosecond time steps. While there are several existing methods to overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain in regard to being able to sample unknown systems, deal with high-dimensional spaces of collective variables, and focus the computational effort on slow timescales. Hence, a new sampling method, called the “Concurrent Adaptive Sampling (CAS) algorithm,” has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and uses macrostates (a partition of the collective variable space) to enhance the sampling. The exploration is done by running a large number of short simulations, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and bio-molecules, such as penta-alanine and a triazine polymer.

  7. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    International Nuclear Information System (INIS)

    Elsheikh, Ahmed H.; Wheeler, Mary F.; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and for estimating the Bayesian evidence for prior model selection, and it has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs; the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.

  8. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    Energy Technology Data Exchange (ETDEWEB)

    Elsheikh, Ahmed H., E-mail: aelsheikh@ices.utexas.edu [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom); Wheeler, Mary F. [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Hoteit, Ibrahim [Department of Earth Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal (Saudi Arabia)

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and for estimating the Bayesian evidence for prior model selection, and it has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs; the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.

  9. Recent advances in importance sampling for statistical model checking

    NARCIS (Netherlands)

    Reijsbergen, D.P.; de Boer, Pieter-Tjerk; Scheinhardt, Willem R.W.; Haverkort, Boudewijn R.H.M.

    2013-01-01

    In the following work we present an overview of recent advances in rare event simulation for model checking made at the University of Twente. The overview is divided into the several model classes for which we propose algorithms, namely multicomponent systems, Markov chains and stochastic Petri nets.
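
    A generic textbook illustration of the importance sampling idea behind such rare-event methods (not one of the Twente algorithms): tilt the sampling distribution toward the rare region and correct with the likelihood ratio.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    t, n = 4.0, 100_000                     # rare event: X > 4 for X ~ N(0, 1)

    naive = (rng.standard_normal(n) > t).mean()   # almost never hits the event

    # Sample from N(t, 1) instead; weight = phi(x) / phi(x - t) = exp(-t*x + t^2/2)
    x = rng.normal(t, 1.0, n)
    est = np.mean((x > t) * np.exp(-t * x + t * t / 2))

    print(f"naive {naive:.2e}, importance sampling {est:.2e}, exact ~3.17e-05")
    ```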

  10. Quantitative Imaging Biomarkers: A Review of Statistical Methods for Computer Algorithm Comparisons

    Science.gov (United States)

    2014-01-01

    Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research. PMID:24919829

  11. Quantitative imaging biomarkers: a review of statistical methods for computer algorithm comparisons.

    Science.gov (United States)

    Obuchowski, Nancy A; Reeves, Anthony P; Huang, Erich P; Wang, Xiao-Feng; Buckler, Andrew J; Kim, Hyun J Grace; Barnhart, Huiman X; Jackson, Edward F; Giger, Maryellen L; Pennello, Gene; Toledano, Alicia Y; Kalpathy-Cramer, Jayashree; Apanasovich, Tatiyana V; Kinahan, Paul E; Myers, Kyle J; Goldgof, Dmitry B; Barboriak, Daniel P; Gillies, Robert J; Schwartz, Lawrence H; Sullivan, Daniel C

    2015-02-01

    Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research. © The Author(s) 2014.

  12. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Jianning Wu

    2015-01-01

    Full Text Available The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, based on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with different statistical distributions of gait variables from the left and right lower limbs; that is, the discrimination of small differences in similarity between the lower limbs is considered a reorganization of their differing probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on an advanced statistical learning algorithm, the support vector machine algorithm for binary classification, and is adopted to quantitatively evaluate gait symmetry. The experimental results showed that the proposed method could capture more of the intrinsic dynamic information hidden in the gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, the proposed technique could identify small but significant differences between the lower limbs, compared to the traditional symmetry index method for gait. The proposed algorithm could become an effective tool for the early identification of gait asymmetry in the elderly in clinical diagnosis.

  13. The novel quantitative technique for assessment of gait symmetry using advanced statistical learning algorithm.

    Science.gov (United States)

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, based on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with different statistical distributions of gait variables from the left and right lower limbs; that is, the discrimination of small differences in similarity between the lower limbs is considered a reorganization of their differing probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on an advanced statistical learning algorithm, the support vector machine algorithm for binary classification, and is adopted to quantitatively evaluate gait symmetry. The experimental results showed that the proposed method could capture more of the intrinsic dynamic information hidden in the gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, the proposed technique could identify small but significant differences between the lower limbs, compared to the traditional symmetry index method for gait. The proposed algorithm could become an effective tool for the early identification of gait asymmetry in the elderly in clinical diagnosis.
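
    A minimal sketch of the paper's idea: if a binary SVM can separate left-limb from right-limb feature vectors better than chance, their distributions differ and the gait is asymmetric. Synthetic features stand in for the force platform data:

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    left = rng.normal(0.00, 1.0, size=(300, 8))    # synthetic kinetic features
    right = rng.normal(0.15, 1.0, size=(300, 8))   # small left-right asymmetry
    X = np.vstack([left, right])
    y = np.array([0] * 300 + [1] * 300)            # 0 = left limb, 1 = right limb

    acc = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5).mean()
    print(f"cross-validated accuracy {acc:.2f} (0.50 would indicate symmetry)")
    ```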

  14. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    Science.gov (United States)

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  15. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration

    KAUST Repository

    Elsheikh, A. H.

    2013-12-01

    Calibration of subsurface flow models is an essential step for managing groundwater aquifers, designing contaminant remediation plans, and maximizing recovery from hydrocarbon reservoirs. We investigate an efficient sampling algorithm known as nested sampling (NS), which can simultaneously sample the posterior distribution for uncertainty quantification and estimate the Bayesian evidence for model selection. Model selection statistics, such as the Bayesian evidence, are needed to choose or assign different weights to models of different levels of complexity. In this work, we report the first successful application of nested sampling for calibration of several nonlinear subsurface flow problems. The Bayesian evidence estimated by the NS algorithm is used to weight different parameterizations of the subsurface flow models (prior model selection). The results of the numerical evaluation implicitly enforced Occam's razor, where simpler models with fewer parameters are favored over complex models. The proper level of model complexity was automatically determined based on the information content of the calibration data and the data mismatch of the calibrated model.
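
    A minimal nested sampling loop on a toy two-dimensional problem: the evidence Z = Σ L_i w_i is accumulated as the prior volume shrinks by a factor e^(-1/N) per iteration. Production codes replace the naive rejection step with cleverer constrained sampling:

    ```python
    import numpy as np
    from scipy.special import logsumexp

    rng = np.random.default_rng(0)
    loglike = lambda p: -0.5 * np.sum(((p - 0.5) / 0.1) ** 2)  # prior = U(0,1)^2

    n_live, n_iter = 200, 1000
    live = rng.random((n_live, 2))
    live_ll = np.apply_along_axis(loglike, 1, live)
    logZ = -np.inf

    for i in range(1, n_iter + 1):
        worst = live_ll.argmin()
        # prior-mass shell w_i = exp(-(i-1)/N) - exp(-i/N)
        log_w = np.log(np.exp(-(i - 1) / n_live) - np.exp(-i / n_live))
        logZ = np.logaddexp(logZ, log_w + live_ll[worst])
        while True:                 # replace worst point by a prior draw with L > L_worst
            new = rng.random(2)
            if loglike(new) > live_ll[worst]:
                break
        live[worst], live_ll[worst] = new, loglike(new)

    # remaining prior volume, spread over the surviving live points
    logZ = np.logaddexp(logZ, -n_iter / n_live + logsumexp(live_ll) - np.log(n_live))
    print("log-evidence:", logZ)    # analytic value ~ ln(2*pi*0.1^2) ~ -2.77
    ```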

  16. Improvement of characteristic statistic algorithm and its application on equilibrium cycle reloading optimization

    International Nuclear Information System (INIS)

    Hu, Y.; Liu, Z.; Shi, X.; Wang, B.

    2006-01-01

    A brief introduction to the characteristic statistic algorithm (CSA) is given in the paper; CSA is a new global optimization algorithm for solving the problem of PWR in-core fuel management optimization. CSA is modified by the adoption of a back propagation neural network and fast local adjustment. The modified CSA is then applied to PWR equilibrium cycle reloading optimization, and the corresponding optimization code CSA-DYW is developed. CSA-DYW is used to optimize the 18-month equilibrium reloading cycle of the Daya Bay nuclear plant Unit 1 reactor. The results show that CSA-DYW has high efficiency and good global performance on PWR equilibrium cycle reloading optimization. (authors)

  17. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Scheid Anika

    2012-07-01

    Full Text Available Background: Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results: In this work, we consider the SCFG based approach in order to analyze how the quality of generated sample sets and the corresponding prediction accuracy change when different degrees of disturbance are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly; it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst

  18. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  19. A rank-based algorithm of differential expression analysis for small cell line data with statistical control.

    Science.gov (United States)

    Li, Xiangyu; Cai, Hao; Wang, Xianlong; Ao, Lu; Guo, You; He, Jun; Gu, Yunyan; Qi, Lishuang; Guan, Qingzhou; Lin, Xu; Guo, Zheng

    2017-10-13

    To detect differentially expressed genes (DEGs) in small-scale cell line experiments, usually with only two or three technical replicates for each state, the commonly used statistical methods such as significance analysis of microarrays (SAM), limma and RankProd (RP) lack statistical power, while the fold change method lacks any statistical control. In this study, we demonstrated that the within-sample relative expression orderings (REOs) of gene pairs were highly stable among technical replicates of a cell line but often widely disrupted after certain treatments such as gene knockdown, gene transfection and drug treatment. Based on this finding, we customized the RankComp algorithm, previously designed for individualized differential expression analysis through REO comparison, to identify DEGs with certain statistical control for small-scale cell line data. In both simulated and real data, the new algorithm, named CellComp, exhibited high precision with much higher sensitivity than the original RankComp, SAM, limma and RP methods. Therefore, CellComp provides an efficient tool for analyzing small-scale cell line data. © The Author 2017. Published by Oxford University Press.
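
    The REO idea is easy to state in code: count how many within-sample gene-pair orderings flip between two profiles. A toy sketch with synthetic expression values:

    ```python
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    n_genes = 50
    ctrl = rng.lognormal(3, 1, n_genes)                  # baseline profile
    rep = ctrl * rng.lognormal(0, 0.05, n_genes)         # technical replicate
    trt = ctrl.copy()
    trt[:10] *= rng.lognormal(1, 0.3, 10)                # treatment perturbs 10 genes

    def reversed_pair_fraction(a, b):
        """Fraction of gene pairs whose within-sample ordering differs in a and b."""
        pairs = list(combinations(range(len(a)), 2))
        flips = sum((a[i] > a[j]) != (b[i] > b[j]) for i, j in pairs)
        return flips / len(pairs)

    print("replicate vs control:", reversed_pair_fraction(ctrl, rep))  # stable REOs
    print("treatment vs control:", reversed_pair_fraction(ctrl, trt))  # disrupted
    ```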

  20. Multiscale Monte Carlo algorithms in statistical mechanics and quantum field theory

    Energy Technology Data Exchange (ETDEWEB)

    Lauwers, P G

    1990-12-01

    Conventional Monte Carlo simulation algorithms for models in statistical mechanics and quantum field theory are afflicted by problems caused by their locality. They become highly inefficient if investigations of critical or nearly-critical systems, i.e., systems with important large-scale phenomena, are undertaken. We present two types of multiscale approaches that alleviate problems of this kind: stochastic cluster algorithms and multigrid Monte Carlo simulation algorithms. Another formidable computational problem in simulations of phenomenologically relevant field theories with fermions is the need to frequently invert the Dirac operator. This inversion can be accelerated considerably by means of deterministic multigrid methods, very similar to the ones used for the numerical solution of differential equations. (orig.)
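
    A standard example of the stochastic cluster algorithms referred to above is the Wolff single-cluster update for the 2D Ising model, which flips whole correlated clusters at once and thereby evades the critical slowing down of local updates:

    ```python
    import numpy as np

    def wolff_update(spins, beta, rng):
        """One Wolff cluster flip for the 2D Ising model (J = 1, periodic bounds)."""
        L = spins.shape[0]
        p_add = 1.0 - np.exp(-2.0 * beta)          # bond-activation probability
        i, j = rng.integers(L, size=2)
        seed_spin = spins[i, j]
        cluster, stack = {(i, j)}, [(i, j)]
        while stack:
            x, y = stack.pop()
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                nx, ny = nx % L, ny % L
                if (nx, ny) not in cluster and spins[nx, ny] == seed_spin \
                        and rng.random() < p_add:
                    cluster.add((nx, ny))
                    stack.append((nx, ny))
        for x, y in cluster:                       # flip the entire cluster
            spins[x, y] *= -1

    rng = np.random.default_rng(0)
    L, beta = 32, 0.4407                           # near the critical temperature
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(500):
        wolff_update(spins, beta, rng)
    print("magnetization per spin:", spins.mean())
    ```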

  1. Phase Transitions in Combinatorial Optimization Problems: Basics, Algorithms and Statistical Mechanics

    Science.gov (United States)

    Hartmann, Alexander K.; Weigt, Martin

    2005-10-01

    A concise, comprehensive introduction to the topic of statistical physics of combinatorial optimization, bringing together theoretical concepts and algorithms from computer science with analytical methods from physics. The result bridges the gap between statistical physics and combinatorial optimization, investigating problems taken from theoretical computing, such as the vertex-cover problem, with the concepts and methods of theoretical physics. The authors cover rapid developments and analytical methods that are both extremely complex and spread by word-of-mouth, providing all the necessary basics in required detail. Throughout, the algorithms are shown with examples and calculations, while the proofs are given in a way suitable for graduate students, post-docs, and researchers. Ideal for newcomers to this young, multidisciplinary field.

  2. Grasp Algorithms For Optotactile Robotic Sample Acquisition, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Robotic sample acquisition is basically grasping. Multi-finger robot sample grasping devices are controlled to securely pick up samples. While optimal grasps for...

  3. Exact distributions of two-sample rank statistics and block rank statistics using computer algebra

    NARCIS (Netherlands)

    Wiel, van de M.A.

    1998-01-01

    We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for
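
    A standard instance of this generating-function technique (a generic sketch, not the paper's Mathematica notebooks): the exact null distribution of the Wilcoxon rank-sum statistic follows from expanding the product of (1 + q·z^k) over the ranks k and reading off the coefficients of q^m by dynamic programming.

    ```python
    from math import comb

    def rank_sum_null(m, n):
        """Exact null distribution of the rank-sum W of the first sample (size m)
        out of N = m + n ranks, via the generating function prod_k (1 + q z^k)."""
        N = m + n
        max_w = sum(range(n + 1, N + 1))           # sum of the m largest ranks
        # table[j][s]: number of j-subsets of the ranks seen so far with sum s
        table = [[0] * (max_w + 1) for _ in range(m + 1)]
        table[0][0] = 1
        for k in range(1, N + 1):                  # multiply in the factor (1 + q z^k)
            for j in range(min(k, m), 0, -1):
                for s in range(max_w, k - 1, -1):
                    table[j][s] += table[j - 1][s - k]
        total = comb(N, m)
        return {w: c / total for w, c in enumerate(table[m]) if c}

    dist = rank_sum_null(4, 5)
    print(sum(dist.values()))                      # 1.0: a proper distribution
    print(dist[10])                                # P(W = 10) = 1/126, the minimum W
    ```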

  4. Statistical techniques for sampling and monitoring natural resources

    Science.gov (United States)

    Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado

    2004-01-01

    We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....

  5. Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond

    OpenAIRE

    Fischer, Bernd; Knuth, Kevin; Hajian, Arsen; Schumann, Johann

    2004-01-01

    AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable buildin...

  6. Statistical image reconstruction for transmission tomography using relaxed ordered subset algorithms

    International Nuclear Information System (INIS)

    Kole, J S

    2005-01-01

    Statistical reconstruction methods offer possibilities for improving image quality compared to analytical methods, but current reconstruction times prohibit routine clinical application in x-ray computed tomography (CT). To reduce reconstruction times, we have applied (under-)relaxation to ordered subset algorithms. This enables us to use subsets consisting of only a single projection angle, effectively increasing the number of image updates within an entire iteration. A second advantage of applying relaxation is that it can help improve convergence by removing the limit cycle behaviour of ordered subset algorithms, which normally do not converge to an optimal solution but rather to a suboptimal limit cycle consisting of as many points as there are subsets. Relaxation suppresses the limit cycle behaviour by decreasing the step size for approaching the solution. A simulation study for a 2D mathematical phantom and three different ordered subset algorithms shows that all three algorithms benefit from relaxation: an equal noise-to-resolution trade-off can be achieved using fewer iterations than with the conventional algorithms, while a lower minimal normalized mean square error (NMSE) clearly indicates better convergence. Two different schemes for setting the relaxation parameter are studied, and both schemes yield approximately the same minimal NMSE

  7. Comparison of Statistical Algorithms for the Detection of Infectious Disease Outbreaks in Large Multiple Surveillance Systems

    Science.gov (United States)

    Farrington, C. Paddy; Noufaily, Angela; Andrews, Nick J.; Charlett, Andre

    2016-01-01

    A large-scale multiple surveillance system for infectious disease outbreaks has been in operation in England and Wales since the early 1990s. Changes to the statistical algorithm at the heart of the system were proposed and the purpose of this paper is to compare two new algorithms with the original algorithm. Test data to evaluate performance are created from weekly counts of the number of cases of each of more than 2000 diseases over a twenty-year period. The time series of each disease is separated into one series giving the baseline (background) disease incidence and a second series giving disease outbreaks. One series is shifted forward by twelve months and the two are then recombined, giving a realistic series in which it is known where outbreaks have been added. The metrics used to evaluate performance include a scoring rule that appropriately balances sensitivity against specificity and is sensitive to variation in probabilities near 1. In the context of disease surveillance, a scoring rule can be adapted to reflect the size of outbreaks and this was done. Results indicate that the two new algorithms are comparable to each other and better than the algorithm they were designed to replace. PMID:27513749

  8. Audit sampling: A qualitative study on the role of statistical and non-statistical sampling approaches on audit practices in Sweden

    OpenAIRE

    Ayam, Rufus Tekoh

    2011-01-01

    PURPOSE: The two approaches to audit sampling, statistical and non-statistical, have been examined in this study. The overall purpose of the study is to explore the extent to which statistical and non-statistical sampling approaches are currently utilized by independent auditors during auditing practices. Moreover, the study also seeks to achieve two additional purposes; the first is to find out whether auditors utilize different sampling techniques when auditing SMEs (Small and Medium-Sized Ente...

  9. A Markov chain Monte Carlo Expectation Maximization Algorithm for Statistical Analysis of DNA Sequence Evolution with Neighbor-Dependent Substitution Rates

    DEFF Research Database (Denmark)

    Hobolth, Asger

    2008-01-01

    The evolution of DNA sequences can be described by discrete state continuous time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models where the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor-dependent substitution models are analytically intractable and must be analyzed using either approximate or simulation-based methods. We describe statistical inference of neighbor-dependent models using a Markov chain Monte Carlo expectation maximization (MCMC-EM) algorithm. In the MCMC-EM algorithm, the high-dimensional integrals required in the EM algorithm are estimated using MCMC sampling. The MCMC sampler requires simulation of sample paths from a continuous time Markov process, conditional on the beginning and ending states and the paths of the neighboring sites. An exact path sampling algorithm is developed.

  10. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    Science.gov (United States)

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  11. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    Science.gov (United States)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

    An algorithm is developed to automatically screen outliers from massive training samples for the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP is to produce a global 30 m spatial resolution impervious cover data set for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high resolution impervious cover data set is not only significant for urbanization studies but is also desired by global carbon, hydrology, and energy balance research. A supervised classification method, the regression tree, is applied in this project, and a set of accurate training samples is key to such supervised classifications. Here we developed global-scale training samples from fine resolution (about 1 m) satellite data (Quickbird and Worldview2), and then aggregated the fine resolution impervious cover map to 30 m resolution. In order to improve the classification accuracy, the training samples should be screened before being used to train the regression tree. It is impossible to manually screen 30 m resolution training samples collected globally. For example, in Europe alone there are 174 training sites, with site sizes ranging from 4.5 km by 4.5 km to 8.1 km by 3.6 km and more than six million training samples in total. Therefore, we developed this automated statistics-based algorithm to screen the training samples at two levels: the site level and the scene level. At the site level, all the training samples are divided into 10 groups according to the percentage of impervious surface within a sample pixel; the samples falling in each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. The screening process then escalates to the scene level, where a similar screening process with a looser threshold is applied, considering the possible variance due to site differences. We do not perform the screening process across scenes because the scenes might vary due to
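
    A compressed sketch of the site-level screening step described above, with the ten impervious-percentage groups and per-group univariate outlier removal; the threshold and features are synthetic placeholders, and the multivariate and scene-level passes are omitted:

    ```python
    import numpy as np

    def screen_training_samples(impervious_pct, features, z_max=3.0):
        """Keep samples whose features lie within z_max standard deviations of
        their impervious-percentage group's mean (univariate screen only)."""
        keep = np.ones(len(impervious_pct), bool)
        groups = np.clip((impervious_pct // 10).astype(int), 0, 9)  # ten 10% bins
        for g in range(10):
            idx = np.flatnonzero(groups == g)
            if len(idx) < 3:
                continue
            z = (features[idx] - features[idx].mean(0)) / features[idx].std(0)
            keep[idx[(np.abs(z) > z_max).any(axis=1)]] = False
        return keep

    rng = np.random.default_rng(0)
    pct = rng.uniform(0, 100, 10_000)          # synthetic impervious percentages
    feat = rng.normal(size=(10_000, 6))        # synthetic spectral features
    feat[:25] += 8                             # inject gross outliers
    print("kept", screen_training_samples(pct, feat).sum(), "of", len(pct))
    ```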

  12. Evaluation of dynamically dimensioned search algorithm for optimizing SWAT by altering sampling distributions and searching range

    Science.gov (United States)

    The primary advantage of the Dynamically Dimensioned Search algorithm (DDS) is that it outperforms many other optimization techniques in both convergence speed and the ability to search for parameter sets that satisfy statistical guidelines, while requiring only one algorithm parameter (perturbation f...
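
    For reference, the DDS step as commonly described in the literature (Tolson and Shoemaker, 2007): greedily perturb a randomly chosen subset of dimensions whose expected size shrinks as iterations proceed. A sketch under that description:

    ```python
    import numpy as np

    def dds(objective, lo, hi, m=1000, r=0.2, seed=0):
        rng = np.random.default_rng(seed)
        x_best = rng.uniform(lo, hi)
        f_best = objective(x_best)
        for i in range(1, m):
            p = 1.0 - np.log(i) / np.log(m)         # inclusion probability decays
            mask = rng.random(len(lo)) < p
            if not mask.any():
                mask[rng.integers(len(lo))] = True  # perturb at least one dimension
            x = x_best.copy()
            x[mask] += r * (hi[mask] - lo[mask]) * rng.standard_normal(mask.sum())
            x = np.where(x < lo, 2 * lo - x, x)     # reflect at the bounds
            x = np.where(x > hi, 2 * hi - x, x)
            x = np.clip(x, lo, hi)
            f = objective(x)
            if f < f_best:                          # greedy acceptance
                x_best, f_best = x, f
        return x_best, f_best

    sphere = lambda v: float(np.sum(v ** 2))
    best, val = dds(sphere, lo=np.full(10, -5.0), hi=np.full(10, 5.0))
    ```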

  13. Statistical sampling strategies for survey of soil contamination

    NARCIS (Netherlands)

    Brus, D.J.

    2011-01-01

    This chapter reviews methods for selecting sampling locations in contaminated soils for three situations. In the first situation a global estimate of the soil contamination in an area is required. The result of the survey is a number or a series of numbers per contaminant, e.g. the estimated mean

  14. Small sample approach, and statistical and epidemiological aspects

    NARCIS (Netherlands)

    Offringa, Martin; van der Lee, Hanneke

    2011-01-01

    In this chapter, the design of pharmacokinetic studies and phase III trials in children is discussed. Classical approaches and relatively novel approaches, which may be more useful in the context of drug research in children, are discussed. The burden of repeated blood sampling in pediatric

  15. Statistical evaluations of current sampling procedures and incomplete core recovery

    International Nuclear Information System (INIS)

    Heasler, P.G.; Jensen, L.

    1994-03-01

    This document develops two formulas that describe the effects of incomplete recovery on core sampling results for the Hanford waste tanks. The formulas evaluate incomplete core recovery from a worst-case (i.e., biased) and a best-case (i.e., unbiased) perspective. A core sampler is unbiased if the sample material recovered is a random sample of the material in the tank, while any sampler that preferentially recovers a particular type of waste over others is a biased sampler. There is strong evidence to indicate that the push-mode sampler presently used at the Hanford site is a biased one. The formulas presented here show the effects of incomplete core recovery on the accuracy of composition measurements, as functions of the vertical variability in the waste. These equations are evaluated using vertical variability estimates from previously sampled tanks (B110, U110, C109). Assuming that the values of vertical variability used in this study adequately describe the Hanford tank farm, one can use the formulas to compute the effect of incomplete recovery on the accuracy of an average constituent estimate. To determine acceptable recovery limits, we have assumed that the relative error of such an estimate should be no more than 20%

  16. Determination of Optimal Double Sampling Plan using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Sampath Sundaram

    2012-03-01

    Full Text Available Designing a double sampling plan requires the identification of sample sizes and acceptance numbers. In this paper a genetic algorithm has been designed for the selection of optimal acceptance numbers and sample sizes for specified producer's and consumer's risks. Implementation of the algorithm has been illustrated numerically for different choices of the quantities involved in a double sampling plan.
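
    The fitness such a genetic algorithm evaluates is built from the plan's operating characteristic. A sketch of the acceptance probability of a double sampling plan, with hypothetical plan parameters (n1, c1, r1, n2, c2):

    ```python
    from scipy.stats import binom

    def accept_prob(p, n1, c1, r1, n2, c2):
        """P(accept lot | defect rate p): accept if d1 <= c1, reject if d1 >= r1,
        otherwise draw n2 more items and accept if d1 + d2 <= c2."""
        pa = binom.cdf(c1, n1, p)                   # accepted on the first sample
        for d1 in range(c1 + 1, r1):                # second sample needed
            pa += binom.pmf(d1, n1, p) * binom.cdf(c2 - d1, n2, p)
        return pa

    plan = dict(n1=50, c1=1, r1=4, n2=100, c2=4)    # hypothetical plan
    print("P(accept | p = 1%):", accept_prob(0.01, **plan))  # producer's side
    print("P(accept | p = 6%):", accept_prob(0.06, **plan))  # consumer's side
    ```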

  1. A Matlab user interface for the statistically assisted fluid registration algorithm and tensor-based morphometry

    Science.gov (United States)

    Yepes-Calderon, Fernando; Brun, Caroline; Sant, Nishita; Thompson, Paul; Lepore, Natasha

    2015-01-01

    Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis consist of a nonlinear registration to align each individual scan to a common space, and a subsequent statistical analysis to determine morphometric differences, or differences in fiber structure, between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm (SAFIRA) [1], which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows the inclusion of statistical priors extracted from the populations being studied as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, even more so considering that SAFIRA was initially implemented in command line mode. Here, we introduce a new, intuitive, easy to use, Matlab-based graphical user interface for SAFIRA's multivariate TBM. The interface also generates different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix, and comparison of the full deformation tensors [2]. This software will be freely disseminated to the neuroimaging research community.

  2. Algorithm for statistical noise reduction in three-dimensional ion implant simulations

    International Nuclear Information System (INIS)

    Hernandez-Mangas, J.M.; Arias, J.; Jaraiz, M.; Bailon, L.; Barbolla, J.

    2001-01-01

    As integrated circuit devices scale into the deep sub-micron regime, ion implantation will continue to be the primary means of introducing dopant atoms into silicon. Different types of impurity profiles such as ultra-shallow profiles and retrograde profiles are necessary for deep submicron devices in order to realize the desired device performance. A new algorithm to reduce the statistical noise in three-dimensional ion implant simulations both in the lateral and shallow/deep regions of the profile is presented. The computational effort in BCA Monte Carlo ion implant simulation is also reduced

  3. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    International Nuclear Information System (INIS)

    Enqvist, Andreas

    2008-03-01

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown materials, to prevent accidental and clandestine transport and use of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguards area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived and compared to Monte Carlo simulations. An experimental approach is also addressed, in which cross correlation measurements were made using liquid scintillation detectors. First, the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible sample

  4. The Statistics of Emission and Detection of Neutrons and Photons from Fissile Samples for Safeguard Applications

    Energy Technology Data Exchange (ETDEWEB)

    Enqvist, Andreas

    2008-03-15

    One particular purpose of nuclear safeguards, in addition to accounting for known materials, is the detection, identification and quantification of unknown materials, to prevent accidental and clandestine transport and use of nuclear materials. This can be achieved in a non-destructive way through the various physical and statistical properties of particle emission and detection from such materials. This thesis addresses some fundamental aspects of nuclear materials and the way they can be detected and quantified by such methods. Factorial moments or multiplicities have long been used within the safeguards area. These are low order moments of the underlying number distributions of emission and detection. One objective of the present work was to determine the full probability distribution and its dependence on the sample mass and the detection process. Derivation and analysis of the full probability distribution and its dependence on the above factors constitutes the first part of the thesis. Another possibility of identifying unknown samples lies in the information in the 'fingerprints' (pulse shape distribution) left by a detected neutron or photon. A study of the statistical properties of the interaction of the incoming radiation (neutrons and photons) with the detectors constitutes the second part of the thesis. The interaction between fast neutrons and organic scintillation detectors is derived and compared to Monte Carlo simulations. An experimental approach is also addressed, in which cross correlation measurements were made using liquid scintillation detectors. First, the dependence of the pulse height distribution on the energy and collision number of an incoming neutron was derived analytically and compared to numerical simulations. Then an algorithm was elaborated which can discriminate neutron pulses from photon pulses. The resulting cross correlation graphs are analyzed, and it is discussed whether they can be used in applications to distinguish possible

  5. Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm.

    Science.gov (United States)

    Kamiura, Moto; Sano, Kohei

    2017-10-01

    The principle of optimism in the face of uncertainty is known as a heuristic in sequential decision-making problems. The Overtaking method, based on this principle, is an effective algorithm for solving multi-armed bandit problems; in a previous study it was defined by a set of heuristic patterns of formulation. The objective of the present paper is to redefine the value functions of the Overtaking method and to unify their formulation. The unified Overtaking method is associated with statistical upper bounds of the confidence intervals of expected rewards. The unification of the formulation enhances the universality of the Overtaking method. Consequently, we obtain a new Overtaking method for exponentially distributed rewards, analyze it numerically, and show that it outperforms the UCB algorithm on average. The present study suggests that, in the context of multi-armed bandit problems, the principle of optimism in the face of uncertainty should be regarded not as a heuristic but as the statistics-based consequence of the law of large numbers for the sample mean of rewards and the estimation of upper bounds of expected rewards. Copyright © 2017 Elsevier B.V. All rights reserved.
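
    For comparison, the baseline UCB algorithm mentioned above adds a confidence bonus to each arm's sample mean, so that rarely played arms look optimistic:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    means = [0.2, 0.5, 0.7]                  # hidden Bernoulli reward means
    counts, sums = np.zeros(3), np.zeros(3)

    for t in range(1, 10_001):
        if t <= 3:
            arm = t - 1                      # play each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))        # optimism in the face of uncertainty
        reward = float(rng.random() < means[arm])
        counts[arm] += 1
        sums[arm] += reward

    print("pulls per arm:", counts)          # the best arm dominates
    ```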

  6. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    Energy Technology Data Exchange (ETDEWEB)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.; Amidan, Brett G.

    2013-04-27

    This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, for two common statistically based environmental sampling approaches, the report includes the methods and formulas for calculating (1) the number of samples required to achieve a specified confidence in characterization and clearance decisions, and (2) the confidence in making characterization and clearance decisions for a specified number of samples. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are (1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and (2) combined judgment and random (CJR) sampling during the clearance phase. Typically, if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: (1) qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account
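
    A simplified version of this kind of calculation: the textbook formula n = ln(1 - C) / ln(1 - p(1 - FNR)) gives the number of random samples needed so that, with confidence C, at least one true positive occurs when a fraction p of the decision area is contaminated. This is not necessarily the report's exact hotspot/CJR formulas:

    ```python
    import math

    def n_required(confidence, contaminated_fraction, fnr=0.0):
        p_detect = contaminated_fraction * (1.0 - fnr)
        return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))

    print(n_required(0.95, 0.01))            # 299 samples: 95% confidence, 1% contamination
    print(n_required(0.95, 0.01, fnr=0.1))   # 332: false negatives raise the burden
    ```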

  7. Automatic Derivation of Statistical Data Analysis Algorithms: Planetary Nebulae and Beyond

    Science.gov (United States)

    Fischer, Bernd; Hajian, Arsen; Knuth, Kevin; Schumann, Johann

    2004-04-01

    AUTOBAYES is a fully automatic program synthesis system for the data analysis domain. Its input is a declarative problem description in form of a statistical model; its output is documented and optimized C/C++ code. The synthesis process relies on the combination of three key techniques. Bayesian networks are used as a compact internal representation mechanism which enables problem decompositions and guides the algorithm derivation. Program schemas are used as independently composable building blocks for the algorithm construction; they can encapsulate advanced algorithms and data structures. A symbolic-algebraic system is used to find closed-form solutions for problems and emerging subproblems. In this paper, we describe the application of AUTOBAYES to the analysis of planetary nebulae images taken by the Hubble Space Telescope. We explain the system architecture, and present in detail the automatic derivation of the scientists' original analysis as well as a refined analysis using clustering models. This study demonstrates that AUTOBAYES is now mature enough so that it can be applied to realistic scientific data analysis tasks.

  8. A Statistical Algorithm for Estimating Chlorophyll Concentration in the New Caledonian Lagoon

    Directory of Open Access Journals (Sweden)

    Guillaume Wattelez

    2016-01-01

    Full Text Available Spatial and temporal dynamics of phytoplankton biomass and water turbidity can provide crucial information about the function, health and vulnerability of lagoon ecosystems (coral reefs, sea grasses, etc.). A statistical algorithm is proposed to estimate the chlorophyll-a concentration ([chl-a]) in the optically complex waters of the New Caledonian lagoon from MODIS-derived "remote-sensing" reflectance (Rrs). The algorithm is developed via supervised learning on match-ups gathered from 2002 to 2010. The best performance is obtained by combining two models, selected according to the ratio of Rrs in spectral bands centered on 488 and 555 nm: a log-linear model for low [chl-a] (AFLC) and a support vector machine (SVM) model or a classic model (OC3) for high [chl-a]. The log-linear model is developed based on SVM regression analysis. This approach outperforms the classical OC3 approach, especially in shallow waters, with a root mean squared error 30% lower. The proposed algorithm enables more accurate assessments of [chl-a] and its variability in this typical oligo- to meso-trophic tropical lagoon, from shallow coastal waters and nearby reefs to deeper waters and the open ocean.
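
    A sketch of the two-branch, band-ratio structure described above; every coefficient below is an illustrative placeholder, not the published AFLC/SVM/OC3 fits:

    ```python
    import numpy as np

    def chl_estimate(rrs488, rrs555, low=(-0.4, -1.8), high=(0.3, -2.9), switch=0.9):
        """Two-branch [chl-a] estimator switched on the Rrs(488)/Rrs(555) ratio:
        a log-linear model in clear water, a second model elsewhere (placeholder
        coefficients standing in for the AFLC and SVM/OC3 branches)."""
        ratio = np.asarray(rrs488, float) / np.asarray(rrs555, float)
        logr = np.log10(ratio)
        chl_low = 10.0 ** (low[0] + low[1] * logr)     # high ratio -> low [chl-a]
        chl_high = 10.0 ** (high[0] + high[1] * logr)  # low ratio -> high [chl-a]
        return np.where(ratio >= switch, chl_low, chl_high)

    print(chl_estimate([0.010, 0.004], [0.008, 0.006]))
    ```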

  9. NETWORKS OF NANOPARTICLES IN ORGANIC – INORGANIC COMPOSITES: ALGORITHMIC EXTRACTION AND STATISTICAL ANALYSIS

    Directory of Open Access Journals (Sweden)

    Ralf Thiedmann

    2012-03-01

    Full Text Available The rising global demand for energy and the limited resources of fossil fuels require new technologies in renewable energies such as solar cells. Silicon solar cells offer good efficiency but suffer from high production costs. A promising alternative is polymer solar cells, due to their potentially low production costs and the high flexibility of the panels. In this paper, the nanostructure of organic-inorganic composites is investigated, which can be used as photoactive layers in hybrid-polymer solar cells. These materials consist of a polymeric (OC1C10-PPV) phase with CdSe nanoparticles embedded therein. On the basis of 3D image data with high spatial resolution, gained by electron tomography, an algorithm is developed to automatically extract the CdSe nanoparticles from grayscale images, where we assume them to be spheres. The algorithm is based on a modified version of the Hough transform, where a watershed algorithm is used to separate the image data into basins such that each basin contains exactly one nanoparticle. After their extraction, neighboring nanoparticles are connected to form a 3D network that is related to the transport of electrons in polymer solar cells. A detailed statistical analysis of the CdSe network morphology is accomplished, which allows deeper insight into the hopping percolation pathways of electrons.

  10. Understand your Algorithm: Drill Down to Sample Visualizations in Jupyter Notebooks

    Science.gov (United States)

    Mapes, B. E.; Ho, Y.; Cheedela, S. K.; McWhirter, J.

    2017-12-01

    Statistics are the currency of climate dynamics, but the space of all possible algorithms is fathomless - especially for 4-dimensional weather-resolving data that many "impact" variables depend on. Algorithms are designed on data samples, but how do you know if they measure what you expect when turned loose on Big Data? We will introduce the year-1 prototype of a 3-year scientist-led, NSF-supported, Unidata-quality software stack called DRILSDOWN (https://brianmapes.github.io/EarthCube-DRILSDOWN/) for automatically extracting, integrating, and visualizing multivariate 4D data samples. Based on a customizable "IDV bundle" of data sources, fields and displays supplied by the user, the system will teleport its space-time coordinates to fetch Cases of Interest (edge cases, typical cases, etc.) from large aggregated repositories. These standard displays can serve as backdrops to overlay with your value-added fields (such as derived quantities stored on a user's local disk). Fields can be readily pulled out of the visualization object for further processing in Python. The hope is that algorithms successfully tested in this visualization space will then be lifted out and added to automatic processing toolchains, lending confidence in the next round of processing, to seek the next Cases of Interest, in light of a user's statistical measures of "Interest". To log the scientific work done in this vein, the visualizations are wrapped in IPython-based Jupyter notebooks for rich, human-readable documentation (indeed, quasi-publication with formatted text, LaTeX math, etc.). Such notebooks are readable and executable, with digital replicability and provenance built in. The entire digital object of a case study can be stored in a repository, where libraries of these Case Study Notebooks can be examined in a browser. Model data (the session topic) are of course especially convenient for this system, but observations of all sorts can also be brought in, overlain, and differenced or

  11. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines the Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling, and gradient estimation using the Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM requires only forward model runs, so the simulator is used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems. © 2013 Elsevier Inc.
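
    To make the core loop concrete, here is a minimal sketch of nested sampling on a toy problem (a uniform prior on [-10, 10] and a standard-normal likelihood, both illustrative assumptions, not from the paper). A short random walk stands in for the HMC constrained step that HNS uses, and the shrinkage and weighting follow the standard NS estimator:

      import numpy as np

      rng = np.random.default_rng(1)

      def log_like(x):
          # Standard-normal log-likelihood (toy choice, not the paper's model).
          return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

      n_live, n_iter = 100, 1200
      live = rng.uniform(-10.0, 10.0, n_live)   # live points drawn from the prior
      live_ll = log_like(live)
      log_Z = -np.inf                           # running log-evidence

      for i in range(n_iter):
          worst = np.argmin(live_ll)
          # Prior-mass shell between successive shrinkage levels exp(-i/n).
          w = np.exp(-i / n_live) - np.exp(-(i + 1) / n_live)
          log_Z = np.logaddexp(log_Z, np.log(w) + live_ll[worst])
          threshold = live_ll[worst]
          # Constrained replacement: random walk from a random survivor,
          # accepting only moves above the likelihood threshold (the paper
          # uses HMC with SEM gradient estimates here instead).
          j = rng.integers(n_live - 1)
          j = j if j < worst else j + 1
          x = live[j]
          for _ in range(50):
              prop = x + rng.normal(0.0, 0.5)
              if -10.0 <= prop <= 10.0 and log_like(prop) > threshold:
                  x = prop
          live[worst], live_ll[worst] = x, log_like(x)

      # Add the remaining prior mass carried by the final live points.
      log_Z = np.logaddexp(log_Z, -n_iter / n_live + np.log(np.mean(np.exp(live_ll))))
      print("log-evidence estimate:", log_Z)    # true value: log(1/20) ~ -3.0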

  12. SeaWiFS technical report series. Volume 4: An analysis of GAC sampling algorithms. A case study

    Science.gov (United States)

    Yeh, Eueng-Nan (Editor); Hooker, Stanford B. (Editor); Mccain, Charles R. (Editor); Fu, Gary (Editor)

    1992-01-01

    The Sea-viewing Wide Field-of-view Sensor (SeaWiFS) instrument will sample at approximately 1 km resolution at nadir, and these data will be broadcast for reception by real-time ground stations. However, the global data set will comprise coarser four kilometer data which will be recorded and broadcast to the SeaWiFS Project for processing. Several algorithms for degrading the one kilometer data to four kilometer data are examined using imagery from the Coastal Zone Color Scanner (CZCS) in an effort to determine which algorithm would best preserve the statistical characteristics of the derived products generated from the one kilometer data. Of the algorithms tested, subsampling based on a fixed pixel within a 4 x 4 pixel array is judged to yield the most consistent results when compared to the one kilometer data products.
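
    The winning scheme is easy to state: keep the same fixed pixel out of every 4 x 4 block. A minimal NumPy sketch (the array shape and data are illustrative, not CZCS imagery):

      import numpy as np

      def subsample_fixed_pixel(img, row=0, col=0, block=4):
          # Degrade data by keeping one fixed pixel per block x block tile;
          # row/col select which pixel of each tile survives.
          h, w = img.shape
          return img[row:h - h % block:block, col:w - w % block:block]

      # Synthetic 1 km scene; the result mimics a 4 km GAC-style product.
      one_km = np.random.default_rng(0).random((1024, 1024))
      four_km = subsample_fixed_pixel(one_km)
      print(one_km.shape, "->", four_km.shape)   # (1024, 1024) -> (256, 256)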

  13. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, P; Kleibergen, F

    2003-01-01

    We consider the K-statistic, Kleibergen's (2002, Econometrica 70, 1781-1803) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Whereas Kleibergen (2002) especially analyzes the asymptotic behavior of the statistic, we focus on its finite-sample properties.

  14. Statistical methods applied to gamma-ray spectroscopy algorithms in nuclear security missions.

    Science.gov (United States)

    Fagan, Deborah K; Robinson, Sean M; Runkle, Robert C

    2012-10-01

    Gamma-ray spectroscopy is a critical research and development priority for a range of nuclear security missions, specifically the interdiction of special nuclear material involving the detection and identification of gamma-ray sources. We categorize existing methods by the statistical methods on which they rely and identify methods that have yet to be considered. Current methods estimate the effect of counting uncertainty but in many cases do not address larger sources of decision uncertainty, which may be significantly more complex. Thus, significantly improving algorithm performance may require greater coupling between the problem physics that drives data acquisition and the statistical methods that analyze such data. Untapped statistical methods, such as Bayesian Model Averaging and hierarchical and empirical Bayes methods, could reduce decision uncertainty by rigorously and comprehensively incorporating all sources of uncertainty. Application of such methods should further meet the needs of nuclear security missions by improving upon the existing numerical infrastructure for which these analyses have not been conducted. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Medical Image Retrieval Based On the Parallelization of the Cluster Sampling Algorithm

    OpenAIRE

    Ali, Hesham Arafat; Attiya, Salah; El-henawy, Ibrahim

    2017-01-01

    In this paper we develop parallel cluster sampling algorithms and show that a multi-chain version is embarrassingly parallel and can be used efficiently for medical image retrieval among other applications.

  16. Nested sampling algorithm for subsurface flow model selection, uncertainty quantification, and nonlinear calibration

    KAUST Repository

    Elsheikh, A. H.; Wheeler, M. F.; Hoteit, Ibrahim

    2013-01-01

    Calibration of subsurface flow models is an essential step for managing ground water aquifers, designing contaminant remediation plans, and maximizing recovery from hydrocarbon reservoirs. We investigate an efficient sampling algorithm known as nested sampling.

  17. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    Science.gov (United States)

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  18. Criteria and algorithms for constructing reliable databases for statistical analysis of disruptions at ASDEX Upgrade

    International Nuclear Information System (INIS)

    Cannas, B.; Fanni, A.; Pautasso, G.; Sias, G.; Sonato, P.

    2009-01-01

    The present understanding of disruption physics has not gone so far as to provide a mathematical model describing the onset of this instability. A disruption prediction system, based on a statistical analysis of the diagnostic signals recorded during the experiments, would allow estimating the probability of a disruption taking place. A crucial point for a good design of such a prediction system is the appropriateness of the data set. This paper reports the details of the database built to train a disruption predictor based on neural networks for ASDEX Upgrade. The criteria for pulse selection, the analyses performed on plasma parameters, and the implemented pre-processing algorithms are described. As an example of application, a short description of the disruption predictor is reported.

  19. Final Report: Sampling-Based Algorithms for Estimating Structure in Big Data.

    Energy Technology Data Exchange (ETDEWEB)

    Matulef, Kevin Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-02-01

    The purpose of this project was to develop sampling-based algorithms to discover hidden structure in massive data sets. Inferring structure in large data sets is an increasingly common task in many critical national security applications. These data sets come from myriad sources, such as network traffic, sensor data, and data generated by large-scale simulations. They are often so large that traditional data mining techniques are time consuming or even infeasible. To address this problem, we focus on a class of algorithms that do not compute an exact answer, but instead use sampling to compute an approximate answer using fewer resources. The particular class of algorithms that we focus on are streaming algorithms, so called because they are designed to handle high-throughput streams of data. Streaming algorithms have only a small amount of working storage - much less than the size of the full data stream - so they must necessarily use sampling to approximate the correct answer. We present two results: * A streaming algorithm called HyperHeadTail, which estimates the degree distribution of a graph (i.e., the distribution of the number of connections for each node in a network). The degree distribution is a fundamental graph property, but prior work on estimating the degree distribution in a streaming setting was impractical for many real-world applications. We improve upon prior work by developing an algorithm that can handle streams with repeated edges, and graph structures that evolve over time. * An algorithm for the task of maintaining a weighted subsample of items in a stream, when the items must be sampled according to their weight, and the weights are dynamically changing. To our knowledge, this is the first such algorithm designed for dynamically evolving weights. We expect it may be useful as a building block for other streaming algorithms on dynamic data sets.
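
    The report's dynamic-weight sampler is not spelled out in this summary, but the classic static-weight scheme of Efraimidis and Spirakis conveys the flavor of streaming weighted subsampling: each item gets the key u^(1/w) for uniform u, and the k largest keys are kept. A minimal sketch, with item names and weights invented for illustration:

      import heapq
      import random

      def weighted_reservoir(stream, k, seed=0):
          # One pass over (item, weight) pairs; keeps k items, favoring
          # larger weights (weighted sampling without replacement, A-Res).
          rnd = random.Random(seed)
          heap = []  # min-heap of (key, item); the smallest key is evicted
          for item, w in stream:
              key = rnd.random() ** (1.0 / w)   # larger weight -> larger key
              if len(heap) < k:
                  heapq.heappush(heap, (key, item))
              elif key > heap[0][0]:
                  heapq.heapreplace(heap, (key, item))
          return [item for _, item in heap]

      # Heavier items ('d', 'e') dominate the retained subsample over many runs.
      data = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
      print(weighted_reservoir(data, k=2))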

  20. Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions

    Science.gov (United States)

    Chen, Nan; Majda, Andrew J.

    2018-02-01

    Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat-tailed, highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions.

  1. Inverse problems with Poisson data: statistical regularization theory, applications and algorithms

    International Nuclear Information System (INIS)

    Hohage, Thorsten; Werner, Frank

    2016-01-01

    Inverse problems with Poisson data arise in many photonic imaging modalities in medicine, engineering and astronomy. The design of regularization methods and estimators for such problems has been studied intensively over the last two decades. In this review we give an overview of statistical regularization theory for such problems, the most important applications, and the most widely used algorithms. The focus is on variational regularization methods in the form of penalized maximum likelihood estimators, which can be analyzed in a general setup. Complementing a number of recent convergence rate results we will establish consistency results. Moreover, we discuss estimators based on a wavelet-vaguelette decomposition of the (necessarily linear) forward operator. As most prominent applications we briefly introduce Positron emission tomography, inverse problems in fluorescence microscopy, and phase retrieval problems. The computation of a penalized maximum likelihood estimator involves the solution of a (typically convex) minimization problem. We also review several efficient algorithms which have been proposed for such problems over the last five years. (topical review)

  2. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA) – A statistical learning approach

    Directory of Open Access Journals (Sweden)

    R. Jegadeeshwaran

    2015-03-01

    Full Text Available In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system affects the vehicle's motion and can have catastrophic effects on vehicle and passenger safety. Thus the brake system plays a vital role in an automobile, and hence condition monitoring of the brake system is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum. This study is one such attempt to perform the condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis has been reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of a brake system, the vibration signals were acquired using a piezoelectric transducer. The statistical parameters were extracted from the vibration signal. The best feature set was identified for classification using an attribute evaluator. The selected features were then classified using CSCA. The classification accuracy of this artificial intelligence technique has been compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performed better and gave the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.

  3. Sensitivity of Marine Warm Cloud Retrieval Statistics to Algorithm Choices: Examples from MODIS Collection 6

    Science.gov (United States)

    Platnick, Steven; Wind, Galina; Zhang, Zhibo; Ackerman, Steven A.; Maddux, Brent

    2012-01-01

    The optical and microphysical structure of warm boundary layer marine clouds is of fundamental importance for understanding a variety of cloud radiation and precipitation processes. With the advent of MODIS (Moderate Resolution Imaging Spectroradiometer) on the NASA EOS Terra and Aqua platforms, simultaneous global/daily 1 km retrievals of cloud optical thickness and effective particle size are provided, as well as the derived water path. In addition, the cloud product (MOD06/MYD06 for MODIS Terra and Aqua, respectively) provides separate effective radii results using the 1.6, 2.1, and 3.7 μm spectral channels. Cloud retrieval statistics are highly sensitive to how a pixel identified as being "not clear" by a cloud mask (e.g., the MOD35/MYD35 product) is determined to be useful for an optical retrieval based on a 1-D cloud model. The Collection 5 MODIS retrieval algorithm removed pixels associated with cloud edges as well as ocean pixels with partly cloudy elements in the 250 m MODIS cloud mask - part of the so-called Clear Sky Restoral (CSR) algorithm. Collection 6 attempts retrievals for those two pixel populations, but allows a user to isolate or filter out the populations via CSR pixel-level Quality Assessment (QA) assignments. In this paper, using the preliminary Collection 6 MOD06 product, we present global and regional statistical results of marine warm cloud retrieval sensitivities to the cloud edge and 250 m partly cloudy pixel populations. As expected, retrievals for these pixels are generally consistent with a breakdown of the 1-D cloud model. While optical thickness for these suspect pixel populations may have some utility for radiative studies, the retrievals should be used with extreme caution for process and microphysical studies.

  4. A novel statistical algorithm for gene expression analysis helps differentiate pregnane X receptor-dependent and independent mechanisms of toxicity.

    Directory of Open Access Journals (Sweden)

    M Ann Mongan

    Full Text Available Genome-wide gene expression profiling has become standard for assessing potential liabilities as well as for elucidating mechanisms of toxicity of drug candidates under development. Analysis of microarray data is often challenging due to the lack of a statistical model that is amenable to biological variation in a small number of samples. Here we present a novel non-parametric algorithm that requires minimal assumptions about the data distribution. Our method for determining differential expression consists of two steps: (1) we apply a nominal threshold on fold change and platform p-value to designate whether a gene is differentially expressed in each treated and control sample relative to the averaged control pool, and (2) we compare the number of samples satisfying the criteria in step 1 between the treated and control groups to estimate the statistical significance based on a null distribution established by sample permutations. The method captures the group effect without being too sensitive to anomalies, as it allows tolerance for potential non-responders in the treatment group and outliers in the control group. Performance and results of this method were compared with the Significance Analysis of Microarrays (SAM) method. These two methods were applied to investigate hepatic transcriptional responses of wild-type (PXR+/+) and pregnane X receptor-knockout (PXR-/-) mice after 96 h exposure to CMP013, an inhibitor of β-secretase (β-site of amyloid precursor protein cleaving enzyme 1, or BACE1). Our results showed that CMP013 led to transcriptional changes in hallmark PXR-regulated genes and induced a cascade of gene expression changes that explained the hepatomegaly observed only in PXR+/+ animals. Comparison of concordant expression changes between PXR+/+ and PXR-/- mice also suggested a PXR-independent association between CMP013 and perturbations to cellular stress, lipid metabolism, and biliary transport.
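
    Step 2 of this scheme is a straightforward permutation test on per-sample responder counts. A hedged sketch, with the responder flags and group sizes invented for illustration:

      import numpy as np

      rng = np.random.default_rng(0)

      def count_permutation_test(treated_flags, control_flags, n_perm=10000):
          # Compare the number of 'responding' samples between groups against
          # a null distribution built by permuting group labels.
          flags = np.concatenate([treated_flags, control_flags])
          n_t = len(treated_flags)
          observed = int(treated_flags.sum()) - int(control_flags.sum())
          null = np.empty(n_perm)
          for i in range(n_perm):
              rng.shuffle(flags)
              null[i] = flags[:n_t].sum() - flags[n_t:].sum()
          return float((np.abs(null) >= abs(observed)).mean())  # two-sided p

      # One gene: 5/6 treated samples pass the step-1 thresholds vs 1/6 controls.
      treated = np.array([1, 1, 1, 1, 1, 0], dtype=bool)
      control = np.array([0, 0, 1, 0, 0, 0], dtype=bool)
      print("permutation p-value:", count_permutation_test(treated, control))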

  5. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    Science.gov (United States)

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    Beta-turns play an important role from a structural and functional point of view. Beta-turns are the most common type of non-repetitive structure in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-turns in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows the user to predict different types of beta-turns, e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-turns in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  6. Compressing an Ensemble with Statistical Models: An Algorithm for Global 3D Spatio-Temporal Temperature

    KAUST Repository

    Castruccio, Stefano

    2015-04-02

    One of the main challenges when working with modern climate model ensembles is the increasingly larger size of the data produced, and the consequent difficulty in storing large amounts of spatio-temporally resolved information. Many compression algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific data sets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a different, statistics-based approach that explicitly accounts for the space-time dependence of the data for annual global three-dimensional temperature fields in an initial condition ensemble. The set of estimated parameters is small (compared to the data size) and can be regarded as a summary of the essential structure of the ensemble output; therefore, it can be used to instantaneously reproduce the temperature fields in an ensemble with a substantial saving in storage and time. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and makes it possible to fit a non-trivial model to a data set of one billion data points with a covariance matrix comprising 10^18 entries.

  8. Accelerating statistical image reconstruction algorithms for fan-beam x-ray CT using cloud computing

    Science.gov (United States)

    Srivastava, Somesh; Rao, A. Ravishankar; Sheinin, Vadim

    2011-03-01

    Statistical image reconstruction algorithms potentially offer many advantages to x-ray computed tomography (CT), e.g., lower radiation dose. But their adoption in practical CT scanners requires extra computation power, which is traditionally provided by incorporating additional computing hardware (e.g., CPU clusters, GPUs, FPGAs, etc.) into a scanner. An alternative solution is to access the required computation power over the internet from a cloud computing service, which is orders of magnitude more cost-effective. This is because users only pay a small pay-as-you-go fee for the computation resources used (i.e., CPU time, storage, etc.), and completely avoid purchase, maintenance and upgrade costs. In this paper, we investigate the benefits and shortcomings of using cloud computing for statistical image reconstruction. We parallelized the most time-consuming parts of our application, the forward and back projectors, using MapReduce, the standard parallelization library on clouds. From preliminary investigations, we found that a large speedup is possible at a very low cost. But communication overheads inside MapReduce can limit the maximum speedup, and a better MapReduce implementation might become necessary in the future. All the experiments for this paper, including development and testing, were completed on the Amazon Elastic Compute Cloud (EC2) for less than $20.

  9. Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images

    Science.gov (United States)

    Fischer, Bernd

    2004-01-01

    Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time with the development and refinement of the data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. The AutoBayes schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems.

  10. Speeding Up Non-Parametric Bootstrap Computations for Statistics Based on Sample Moments in Small/Moderate Sample Size Applications.

    Directory of Open Access Journals (Sweden)

    Elias Chaibub Neto

    Full Text Available In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson's sample correlation coefficient, and compare its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling.
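
    The multinomial-weights trick is compact enough to sketch in a few lines of NumPy (a Python analogue of the paper's R approach; the data below are simulated for illustration):

      import numpy as np

      rng = np.random.default_rng(0)

      def boot_corr_vectorized(x, y, n_boot=10000):
          # Bootstrap Pearson's r via multinomial weights: each row of W
          # counts how often every observation appears in one resample, and
          # all replications reduce to a few matrix products.
          n = len(x)
          W = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)  # (n_boot, n)
          mx, my = W @ x / n, W @ y / n                 # weighted first moments
          mxx, myy = W @ (x * x) / n, W @ (y * y) / n   # weighted second moments
          mxy = W @ (x * y) / n
          return (mxy - mx * my) / np.sqrt((mxx - mx**2) * (myy - my**2))

      # Illustrative data: a noisy linear relationship.
      x = rng.normal(size=50)
      y = 0.6 * x + rng.normal(scale=0.8, size=50)
      reps = boot_corr_vectorized(x, y)
      print("bootstrap 95% CI for r:", np.percentile(reps, [2.5, 97.5]))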

  11. Performance Comparison of Reconstruction Algorithms in Discrete Blind Multi-Coset Sampling

    DEFF Research Database (Denmark)

    Grigoryan, Ruben; Arildsen, Thomas; Tandur, Deepaknath

    2012-01-01

    This paper investigates the performance of different reconstruction algorithms in discrete blind multi-coset sampling. The multi-coset scheme is a promising compressed sensing architecture that can replace traditional Nyquist-rate sampling in applications with multi-band frequency sparse signals...

  12. Algorithms

    Indian Academy of Sciences (India)

    polynomial) division have been found in Vedic Mathematics, which is dated much before Euclid's algorithm. A programming language is used to describe an algorithm for execution on a computer. An algorithm expressed using a programming.

  13. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size

    Directory of Open Access Journals (Sweden)

    R. Eric Heidel

    2016-01-01

    Full Text Available Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
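
    As a concrete illustration of how the five components interact, a two-group design with a continuous outcome reduces to solving for n given an effect size, alpha, and target power. The effect size of 0.5 below is an assumed value, not one from the article:

      from statsmodels.stats.power import TTestIndPower

      # Scale of measurement (continuous outcome) and design (two independent
      # groups) imply an independent-samples t-test; the remaining components
      # pin down the required sample size.
      n_per_group = TTestIndPower().solve_power(
          effect_size=0.5,   # Cohen's d: effect magnitude relative to variance
          alpha=0.05,        # Type I error rate
          power=0.80,        # 1 - Type II error rate
      )
      print(f"required n per group: {n_per_group:.1f}")   # ~64 per group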

  14. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    Science.gov (United States)

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.

  15. Effect of model choice and sample size on statistical tolerance limits

    International Nuclear Information System (INIS)

    Duran, B.S.; Campbell, K.

    1980-03-01

    Statistical tolerance limits are estimates of large (or small) quantiles of a distribution, quantities which are very sensitive to the shape of the tail of the distribution. The exact nature of this tail behavior cannot be ascertained from small samples, so statistical tolerance limits are frequently computed using a statistical model chosen on the basis of theoretical considerations or prior experience with similar populations. This report illustrates the effects of such choices on the computations.
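
    A useful nonparametric baseline for such limits (Wilks' method) sidesteps the model-choice problem: the largest of n observations serves as an upper tolerance limit, and the smallest n achieving 95% confidence for the 95th percentile can be found directly. The calculation below is the standard textbook result, not from the report:

      # Confidence that the max of n samples exceeds the p-th quantile: 1 - p**n.
      p = 0.95
      n = next(n for n in range(1, 1000) if 1 - p**n >= 0.95)
      print(n, "samples give", round(1 - p**n, 3), "confidence")   # 59 samples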

  16. Improving Statistics Education through Simulations: The Case of the Sampling Distribution.

    Science.gov (United States)

    Earley, Mark A.

    This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…

  17. An Energy Efficient Adaptive Sampling Algorithm in a Sensor Network for Automated Water Quality Monitoring.

    Science.gov (United States)

    Shu, Tongxin; Xia, Min; Chen, Jiahong; Silva, Clarence de

    2017-11-05

    Power management is crucial in the monitoring of a remote environment, especially when long-term monitoring is needed. Renewable energy sources such as solar and wind may be harvested to sustain a monitoring system. However, without proper power management, equipment within the monitoring system may become nonfunctional and, as a consequence, the data or events captured during the monitoring process will become inaccurate as well. This paper develops and applies a novel adaptive sampling algorithm for power management in the automated monitoring of the quality of water in an extensive and remote aquatic environment. Based on the data collected online using sensor nodes, a data-driven adaptive sampling algorithm (DDASA) is developed for improving the power efficiency while ensuring the accuracy of sampled data. The developed algorithm is evaluated using two distinct key parameters, which are dissolved oxygen (DO) and turbidity. It is found that by dynamically changing the sampling frequency, the battery lifetime can be effectively prolonged while maintaining a required level of sampling accuracy. According to the simulation results, compared to a fixed sampling rate, approximately 30.66% of the battery energy can be saved for three months of continuous water quality monitoring. Using the same dataset to compare with a traditional adaptive sampling algorithm (ASA), while achieving around the same Normalized Mean Error (NME), DDASA is superior in saving 5.31% more battery energy.
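
    The abstract does not give DDASA's update rule, so the following is only a hedged sketch of the general idea behind data-driven adaptive sampling: tighten the sampling interval when the monitored parameter changes quickly, relax it to save power when readings are stable. All thresholds and bounds are invented for illustration:

      def next_interval(prev_value, value, interval,
                        min_s=60, max_s=3600, rel_change=0.05):
          # Halve the sampling interval on a large relative change; otherwise
          # back off toward the maximum interval to conserve battery energy.
          change = abs(value - prev_value) / max(abs(prev_value), 1e-9)
          if change > rel_change:
              return max(min_s, interval // 2)
          return min(max_s, interval * 2)

      # Example trace of dissolved oxygen readings (mg/L).
      readings = [8.0, 8.1, 6.9, 6.8, 6.8, 6.8]
      interval = 600
      for prev, cur in zip(readings, readings[1:]):
          interval = next_interval(prev, cur, interval)
          print(f"DO {cur:.1f} mg/L -> next sample in {interval} s")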

  18. An intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces.

    Science.gov (United States)

    Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying

    2013-09-01

    Poisson disk sampling has excellent spatial and spectral properties, and plays an important role in a variety of visual computing applications. Although many promising algorithms have been proposed for multidimensional sampling in Euclidean space, very few studies have been reported with regard to the problem of generating Poisson disks on surfaces due to the complicated nature of the surface. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. In sharp contrast to the conventional parallel approaches, our method neither partitions the given surface into small patches nor uses any spatial data structure to maintain the voids in the sampling domain. Instead, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. Our algorithm guarantees that the generated Poisson disks are uniformly and randomly distributed without bias. It is worth noting that our method is intrinsic and independent of the embedding space. This intrinsic feature allows us to generate Poisson disk patterns on arbitrary surfaces in R^n. To our knowledge, this is the first intrinsic, parallel, and accurate algorithm for surface Poisson disk sampling. Furthermore, by manipulating the spatially varying density function, we can obtain adaptive sampling easily.
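
    The priority idea is easy to demonstrate in the plane: give every candidate a random, unique priority and keep a candidate only if no higher-priority survivor lies within the disk radius. The serial loop below computes the same set that priority-based parallel conflict resolution would converge to, in a flat 2D domain rather than on a surface (all parameters illustrative):

      import numpy as np

      rng = np.random.default_rng(0)

      def poisson_disk_by_priority(n_candidates=4000, r=0.05):
          pts = rng.random((n_candidates, 2))     # candidates in the unit square
          order = rng.permutation(n_candidates)   # random unique priorities
          accepted = []
          for i in order:                         # highest priority first
              p = pts[i]
              if all(np.sum((p - q) ** 2) >= r * r for q in accepted):
                  accepted.append(p)
          return np.array(accepted)

      samples = poisson_disk_by_priority()
      print(len(samples), "samples with pairwise distance >= 0.05")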

  20. Hybrid algorithm of ensemble transform and importance sampling for assimilation of non-Gaussian observations

    Directory of Open Access Journals (Sweden)

    Shin'ya Nakano

    2014-05-01

    Full Text Available A hybrid algorithm that combines the ensemble transform Kalman filter (ETKF and the importance sampling approach is proposed. Since the ETKF assumes a linear Gaussian observation model, the estimate obtained by the ETKF can be biased in cases with nonlinear or non-Gaussian observations. The particle filter (PF is based on the importance sampling technique, and is applicable to problems with nonlinear or non-Gaussian observations. However, the PF usually requires an unrealistically large sample size in order to achieve a good estimation, and thus it is computationally prohibitive. In the proposed hybrid algorithm, we obtain a proposal distribution similar to the posterior distribution by using the ETKF. A large number of samples are then drawn from the proposal distribution, and these samples are weighted to approximate the posterior distribution according to the importance sampling principle. Since the importance sampling provides an estimate of the probability density function (PDF without assuming linearity or Gaussianity, we can resolve the bias due to the nonlinear or non-Gaussian observations. Finally, in the next forecast step, we reduce the sample size to achieve computational efficiency based on the Gaussian assumption, while we use a relatively large number of samples in the importance sampling in order to consider the non-Gaussian features of the posterior PDF. The use of the ETKF is also beneficial in terms of the computational simplicity of generating a number of random samples from the proposal distribution and in weighting each of the samples. The proposed algorithm is not necessarily effective in cases where the ensemble is located far from the true state. However, monitoring the effective sample size and tuning the factor for covariance inflation could resolve this problem. In this paper, the proposed hybrid algorithm is introduced and its performance is evaluated through experiments with non-Gaussian observations.
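
    A hedged one-dimensional sketch of the hybrid idea: summarize a small forecast ensemble by a Gaussian proposal (standing in for the ETKF analysis, which would also shift the ensemble toward the observation), draw a large importance sample from it, and weight by likelihood times prior over proposal. The model choices below (standard-normal prior, cubic observation operator, noise scale) are illustrative assumptions:

      import numpy as np

      rng = np.random.default_rng(0)

      h = lambda x: x ** 3                # nonlinear observation operator (toy)
      y_obs, obs_std = 1.5, 0.5

      prior_mu, prior_sd = 0.0, 1.0
      forecast = rng.normal(prior_mu, prior_sd, size=50)   # small forecast ensemble

      # Proposal = the ensemble's Gaussian summary (moment matching keeps the
      # sketch short; a real ETKF update would condition on y_obs here).
      mu_q, sd_q = forecast.mean(), forecast.std()

      M = 100_000                                          # large importance sample
      x = rng.normal(mu_q, sd_q, size=M)
      log_w = (-0.5 * ((y_obs - h(x)) / obs_std) ** 2      # log-likelihood
               - 0.5 * ((x - prior_mu) / prior_sd) ** 2    # log-prior
               + 0.5 * ((x - mu_q) / sd_q) ** 2)           # minus log-proposal
      w = np.exp(log_w - log_w.max())
      w /= w.sum()
      print("posterior mean:", np.sum(w * x))
      print("effective sample size:", 1.0 / np.sum(w ** 2))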

  1. A hybrid reliability algorithm using PSO-optimized Kriging model and adaptive importance sampling

    Science.gov (United States)

    Tong, Cao; Gong, Haili

    2018-03-01

    This paper aims to reduce the computational cost of reliability analysis. A new hybrid algorithm is proposed based on a PSO-optimized Kriging model and an adaptive importance sampling method. Firstly, the particle swarm optimization (PSO) algorithm is used to optimize the parameters of the Kriging model. A typical function is fitted to validate the improvement by comparing results of the PSO-optimized Kriging model with those of the original Kriging model. Secondly, a hybrid algorithm for reliability analysis combining the optimized Kriging model and adaptive importance sampling is proposed. Two cases from the literature are given to validate its efficiency and correctness. The proposed method is shown to be more efficient due to its use of a small number of sample points, according to the comparison results.

  2. Use of a Radon Stripping Algorithm for Retrospective Assessment of Air Filter Samples

    International Nuclear Information System (INIS)

    Hayes, Robert

    2009-01-01

    An evaluation of a large number of air sample filters was undertaken using a commercial alpha and beta spectroscopy system employing a passive implanted planar silicon (PIPS) detector. Samples were only measured after air flow through the filters had ceased. Use of a commercial radon stripping algorithm was implemented to discriminate anthropogenic alpha and beta activity on the filters from the radon progeny. When uncontaminated air filters were evaluated, the results showed that there was a time-dependent bias in both average estimates and measurement dispersion, with the relative bias being small compared to the dispersion. By also measuring environmental air sample filters simultaneously with electroplated alpha and beta sources, use of the radon stripping algorithm demonstrated a number of substantial unexpected deviations. Use of the current algorithm is therefore not recommended for assay applications, and the PIPS detector should be used only for gross counting without appropriate modifications to the curve-fitting algorithm. As a screening method, the radon stripping algorithm might be expected to see elevated alpha and beta activities on air sample filters (not due to radon progeny) around the 200 dpm level

  3. Influence of adaptive statistical iterative reconstruction algorithm on image quality in coronary computed tomography angiography.

    Science.gov (United States)

    Precht, Helle; Thygesen, Jesper; Gerke, Oke; Egstrup, Kenneth; Waaler, Dag; Lambrechtsen, Jess

    2016-12-01

    Coronary computed tomography angiography (CCTA) requires high spatial and temporal resolution, increased low contrast resolution for the assessment of coronary artery stenosis, plaque detection, and/or non-coronary pathology. Therefore, new reconstruction algorithms, particularly iterative reconstruction (IR) techniques, have been developed in an attempt to improve image quality with no cost in radiation exposure. To evaluate whether adaptive statistical iterative reconstruction (ASIR) enhances perceived image quality in CCTA compared to filtered back projection (FBP). Thirty patients underwent CCTA due to suspected coronary artery disease. Images were reconstructed using FBP, 30% ASIR, and 60% ASIR. Ninety image sets were evaluated by five observers using the subjective visual grading analysis (VGA) and assessed by proportional odds modeling. Objective quality assessment (contrast, noise, and the contrast-to-noise ratio [CNR]) was analyzed with linear mixed effects modeling on log-transformed data. The need for ethical approval was waived by the local ethics committee as the study only involved anonymously collected clinical data. VGA showed significant improvements in sharpness by comparing FBP with ASIR, resulting in odds ratios of 1.54 for 30% ASIR and 1.89 for 60% ASIR (P = 0.004). The objective measures showed significant differences between FBP and 60% ASIR (P < 0.0001) for noise, with an estimated ratio of 0.82, and for CNR, with an estimated ratio of 1.26. ASIR improved the subjective image quality of parameter sharpness and, objectively, reduced noise and increased CNR.

  4. Accelerating simulation for the multiple-point statistics algorithm using vector quantization

    Science.gov (United States)

    Zuo, Chen; Pan, Zhibin; Liang, Hao

    2018-03-01

    Multiple-point statistics (MPS) is a prominent algorithm to simulate categorical variables based on a sequential simulation procedure. Assuming training images (TIs) as prior conceptual models, MPS extracts patterns from TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerated MPS simulation method using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables applicable for vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproduction, and spatial uncertainty. Further demonstrations consist of a 2D four-facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of handling multifacies, nonstationarity, and 3D simulations based on 2D TIs.

  5. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Bayer, Christian

    2016-02-20

    © 2016 Taylor & Francis Group, LLC. In this work, we present an extension of the forward–reverse representation introduced by Bayer and Schoenmakers (Annals of Applied Probability, 24(5):1994–2032, 2014) to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, that is, SRNs conditional on their values in the extremes of given time intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the expectation-maximization algorithm to the phase I output. By selecting a set of overdispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.

  6. An efficient forward-reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Vilanova, Pedro

    2016-01-07

    In this work, we present an extension of the forward-reverse representation introduced in "Simulation of forward-reverse stochastic representations for conditional diffusions," a 2014 paper by Bayer and Schoenmakers, to the context of stochastic reaction networks (SRNs). We apply this stochastic representation to the computation of efficient approximations of expected values of functionals of SRN bridges, i.e., SRNs conditional on their values in the extremes of given time-intervals. We then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ordinary differential equations approximation; then, during phase II, we apply the Monte Carlo version of the Expectation-Maximization algorithm to the phase I output. By selecting a set of over-dispersed seeds as initial points in phase I, the output of parallel runs from our two-phase method is a cluster of approximate maximum likelihood estimates. Our results are supported by numerical examples.

  7. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    Science.gov (United States)

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra-Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of the Satorra-Bentler statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  8. STATISTICAL LANDMARKS AND PRACTICAL ISSUES REGARDING THE USE OF SIMPLE RANDOM SAMPLING IN MARKET RESEARCHES

    Directory of Open Access Journals (Sweden)

    CODRUŢA DURA

    2010-01-01

    Full Text Available The sample represents a particular segment of the statistical population chosen to represent it as a whole. The representativeness of the sample determines the accuracy for estimations made on the basis of calculating the research indicators and the inferential statistics. The method of random sampling is part of probabilistic methods which can be used within marketing research and it is characterized by the fact that it imposes the requirement that each unit belonging to the statistical population should have an equal chance of being selected for the sampling process. When the simple random sampling is meant to be rigorously put into practice, it is recommended to use the technique of random number tables in order to configure the sample which will provide information that the marketer needs. The paper also details the practical procedure implemented in order to create a sample for a marketing research by generating random numbers using the facilities offered by Microsoft Excel.
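
    The same drawing procedure, without a random-number table or a spreadsheet, takes two lines in Python (the frame size and sample size below are illustrative):

      import random

      population = list(range(1, 5001))          # sampling frame of 5,000 units
      sample = random.sample(population, k=200)  # SRS without replacement:
                                                 # every unit equally likely
      print(sorted(sample)[:10])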

  9. Plane-Based Sampling for Ray Casting Algorithm in Sequential Medical Images

    Science.gov (United States)

    Lin, Lili; Chen, Shengyong; Shao, Yan; Gu, Zichun

    2013-01-01

    This paper proposes a plane-based sampling method to improve the traditional Ray Casting Algorithm (RCA) for the fast reconstruction of a three-dimensional biomedical model from sequential images. In the novel method, the optical properties of all sampling points depend on the intersection points when a ray travels through an equidistant parallel plane cluster of the volume dataset. The results show that the method improves the rendering speed by over three times compared with the conventional algorithm, while image quality is well preserved. PMID:23424608

  10. A novel directional asymmetric sampling search algorithm for fast block-matching motion estimation

    Science.gov (United States)

    Li, Yue-e.; Wang, Qiang

    2011-11-01

    This paper proposes a novel directional asymmetric sampling search (DASS) algorithm for video compression. Making full use of the error information (block distortions) of the search patterns, eight search patterns with different directions are designed for various situations. The strategy of local sampling search is employed for the search of big-motion vectors. In order to further speed up the search, an early termination strategy is adopted in the procedure of DASS. Compared to conventional fast algorithms, the proposed method has the most satisfactory PSNR values for all test sequences.

  11. MUSIC ALGORITHM FOR LOCATING POINT-LIKE SCATTERERS CONTAINED IN A SAMPLE ON FLAT SUBSTRATE

    Institute of Scientific and Technical Information of China (English)

    Dong Heping; Ma Fuming; Zhang Deyue

    2012-01-01

    In this paper, we consider a MUSIC algorithm for locating point-like scatterers contained in a sample on a flat substrate. Based on an asymptotic expansion of the scattering amplitude proposed by Ammari et al., the reconstruction problem can be reduced to the calculation of the Green function corresponding to the background medium. In addition, we use an explicit formulation of the Green function in the MUSIC algorithm to simplify the calculation when the cross-section of the sample is a half-disc. Numerical experiments are included to demonstrate the feasibility of this method.
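
    For readers unfamiliar with MUSIC, the following hedged free-space sketch shows the mechanics: form the multistatic response matrix under the Born approximation, take its SVD, and scan a grid for steering vectors orthogonal to the noise subspace. The geometry, wavenumber, and exp(ikr)/r kernel are invented for illustration; the paper's setting uses the half-space Green function of the substrate instead:

      import numpy as np

      k = 2 * np.pi                                     # wavenumber (wavelength 1)
      array = np.stack([np.linspace(-2, 2, 16), np.full(16, 3.0)], axis=1)
      scatterers = np.array([[0.5, 0.0], [-0.8, 0.4]])  # true point scatterers

      def steering(points):
          # g[i, m] = exp(ikr)/r between array element i and point m.
          r = np.linalg.norm(array[:, None, :] - points[None, :, :], axis=2)
          return np.exp(1j * k * r) / r

      G = steering(scatterers)
      K = G @ np.diag([1.0, 0.7]) @ G.T                 # multistatic response matrix
      U, s, _ = np.linalg.svd(K)
      noise = U[:, 2:]                                  # noise subspace (2 scatterers)

      # MUSIC pseudospectrum: blows up where the steering vector is orthogonal
      # to the noise subspace, i.e., at the scatterer locations.
      xs = ys = np.linspace(-1.5, 1.5, 61)
      grid = np.array([[x, y] for y in ys for x in xs])
      g = steering(grid)
      g /= np.linalg.norm(g, axis=0)
      pseudo = 1.0 / np.linalg.norm(noise.conj().T @ g, axis=0) ** 2
      print("strongest peak at:", grid[np.argmax(pseudo)])  # one of the scatterers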

  12. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two-sample case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  13. Effects of (α,n) contaminants and sample multiplication on statistical neutron correlation measurements

    International Nuclear Information System (INIS)

    Dowdy, E.J.; Hansen, G.E.; Robba, A.A.; Pratt, J.C.

    1980-01-01

    The complete formalism for the use of statistical neutron fluctuation measurements for the nondestructive assay of fissionable materials has been developed. This formalism includes the effect of detector deadtime, neutron multiplicity, random neutron pulse contributions from (α,n) contaminants in the sample, and the sample multiplication of both fission-related and background neutrons

  14. Statistical evaluation of the data obtained from the K East Basin Sandfilter Backwash Pit samples

    International Nuclear Information System (INIS)

    Welsh, T.L.

    1994-01-01

    Samples were obtained from different locations in the K East Basin Sandfilter Backwash Pit to characterize the sludge material. These samples were analyzed chemically for elements, radionuclides, and residual compounds. The analytical results were statistically analyzed to determine the mean analyte content and the associated variability for each mean value.

  15. Supporting Students to Develop Concepts Underlying Sampling and to Shuttle Between Contextual and Statistical Spheres

    NARCIS (Netherlands)

    Bakker, A.; Dierdorp, A.; Maanen, J.A. van; Eijkelhof, H.M.C.

    2012-01-01

    To stimulate students’ shuttling between contextual and statistical spheres, we based tasks on professional practices. This article focuses on two tasks to support reasoning about sampling by students aged 16-17. The purpose of the tasks was to find out which smaller sample size would have been

  16. Sampling methods to the statistical control of the production of blood components.

    Science.gov (United States)

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control depends on the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply only with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model can be argued not to amount to a consistent sampling technique. This could be a severe limitation in detecting abnormal patterns and in assuring that production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions. Copyright © 2017 Elsevier Ltd. All rights reserved.
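
    As a concrete illustration of the second proposed methodology (sampling based on the proportion of a finite population), the following minimal sketch computes a sample size using the standard normal-approximation formula with finite-population correction; the lot size, target proportion, margin of error, and confidence level are hypothetical, not taken from the paper.

        import math

        def finite_population_sample_size(N, p=0.01, e=0.01, z=1.96):
            """Sample size for estimating a proportion p in a finite lot of N
            units, with margin of error e at the confidence implied by z."""
            n0 = z**2 * p * (1 - p) / e**2        # infinite-population size
            n = n0 / (1 + (n0 - 1) / N)           # finite-population correction
            return math.ceil(n)

        # e.g., a lot of 2000 blood components, 1% nonconforming target
        print(finite_population_sample_size(2000, p=0.01, e=0.01))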

  17. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, W.H.M.; Meijer, R.R.; Sijtsma, K.

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier's person-.t statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  18. Comparing simulated and theoretical sampling distributions of the U3 person-fit statistic

    NARCIS (Netherlands)

    Emons, Wilco H.M.; Meijer, R.R.; Sijtsma, Klaas

    2002-01-01

    The accuracy with which the theoretical sampling distribution of van der Flier’s person-fit statistic U3 approaches the empirical U3 sampling distribution is affected by the item discrimination. A simulation study showed that for tests with a moderate or a strong mean item discrimination, the Type I

  19. Generalized Likelihood Uncertainty Estimation (GLUE) Using Multi-Optimization Algorithm as Sampling Method

    Science.gov (United States)

    Wang, Z.

    2015-12-01

    For decades, distributed and lumped hydrological models have furthered our understanding of hydrological systems. The development of large-scale, high-precision hydrological simulation has elaborated spatial descriptions of hydrological behavior. Meanwhile, this trend has been accompanied by increases in model complexity and in the number of parameters, which bring new challenges for uncertainty quantification. Generalized Likelihood Uncertainty Estimation (GLUE), a Monte Carlo method coupled with Bayesian estimation, has been widely used in uncertainty analysis for hydrological models. However, the stochastic sampling of prior parameters adopted by GLUE is inefficient, especially in high-dimensional parameter spaces. Heuristic optimization algorithms based on iterative evolution show better convergence speed and optimum-searching performance. In light of these features, this study adopts a genetic algorithm, differential evolution, and the shuffled complex evolution algorithm to search the parameter space and obtain parameter sets of large likelihood. Based on this multi-algorithm sampling, hydrological model uncertainty analysis is conducted within the typical GLUE framework. To demonstrate the superiority of the new method, two hydrological models of different complexity are examined. The results show that the adaptive method tends to be efficient in sampling and effective in uncertainty analysis, providing an alternative path for uncertainty quantification.
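
    For orientation, a minimal sketch of the baseline GLUE loop that the multi-algorithm sampling is meant to accelerate; the two-parameter toy model, the Nash-Sutcliffe informal likelihood, and the behavioural threshold of 0.7 are illustrative assumptions. The heuristic optimizers (GA, DE, SCE) would replace the uniform prior sampling in step 1.

        import numpy as np

        rng = np.random.default_rng(1)

        def model(theta, t):
            # hypothetical two-parameter toy "hydrological" model
            return theta[0] * np.exp(-theta[1] * t)

        t_obs = np.linspace(0, 10, 50)
        y_obs = model([2.0, 0.3], t_obs) + rng.normal(0, 0.05, t_obs.size)

        # 1. Monte Carlo sampling of the prior parameter space (GLUE's usual step)
        thetas = rng.uniform([0.5, 0.05], [5.0, 1.0], size=(20000, 2))

        # 2. Informal likelihood: Nash-Sutcliffe efficiency
        def nse(theta):
            y = model(theta, t_obs)
            return 1 - np.sum((y - y_obs)**2) / np.sum((y_obs - y_obs.mean())**2)

        L = np.array([nse(th) for th in thetas])

        # 3. Keep "behavioural" sets above the threshold; weight by likelihood
        behavioural = thetas[L > 0.7]
        weights = L[L > 0.7] / L[L > 0.7].sum()
        print(behavioural.shape[0], "behavioural parameter sets")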

  20. Sampling-Based Motion Planning Algorithms for Replanning and Spatial Load Balancing

    Energy Technology Data Exchange (ETDEWEB)

    Boardman, Beth Leigh [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-10-12

    The common theme of this dissertation is sampling-based motion planning, with the two key contributions being in the areas of replanning and spatial load balancing for robotic systems. Here, we begin by recalling two sampling-based motion planners: the asymptotically optimal rapidly-exploring random tree (RRT*), and the asymptotically optimal probabilistic roadmap (PRM*). We also provide a brief background on collision cones and the Distributed Reactive Collision Avoidance (DRCA) algorithm. The next four chapters detail novel contributions for motion replanning in environments with unexpected static obstacles, for multi-agent collision avoidance, and for spatial load balancing. First, we show improved performance of the RRT* when using the proposed Grandparent-Connection (GP) or Focused-Refinement (FR) algorithms. Next, the Goal Tree algorithm for replanning with unexpected static obstacles is detailed and proven to be asymptotically optimal. A multi-agent collision avoidance problem in obstacle environments is approached via the RRT*, leading to the novel Sampling-Based Collision Avoidance (SBCA) algorithm. The SBCA algorithm is proven to guarantee collision-free trajectories for all of the agents, even when subject to uncertainties in the knowledge of the other agents’ positions and velocities. Given that a solution exists, we prove that livelocks and deadlocks will lead to the cost to the goal being decreased. We introduce a new deconfliction maneuver that decreases the cost-to-come at each step. This new maneuver removes the possibility of livelocks and allows a result to be formed that proves convergence to the goal configurations. Finally, we present a limited-range Graph-based Spatial Load Balancing (GSLB) algorithm which fairly divides a non-convex space among multiple agents that are subject to differential constraints and have a limited travel distance. The GSLB is proven to converge to a solution when maximizing the area covered by the agents. The analysis

  1. An Overview of a Class of Clock Synchronization Algorithms for Wireless Sensor Networks: A Statistical Signal Processing Perspective

    Directory of Open Access Journals (Sweden)

    Xu Wang

    2015-08-01

    Full Text Available Recently, wireless sensor networks (WSNs) have drawn great interest due to their outstanding monitoring and management potential in medical, environmental and industrial applications. Most of the applications that employ WSNs demand all of the sensor nodes to run on a common time scale, a requirement that highlights the importance of clock synchronization. The clock synchronization problem in WSNs is inherently related to parameter estimation. The accuracy of clock synchronization algorithms depends essentially on the statistical properties of the parameter estimation algorithms. Recently, studies dedicated to the estimation of synchronization parameters, such as clock offset and skew, have begun to emerge in the literature. The aim of this article is to provide an overview of the state-of-the-art clock synchronization algorithms for WSNs from a statistical signal processing point of view. This article focuses on describing the key features of the class of clock synchronization algorithms that exploit the traditional two-way message (signal) exchange mechanism. Upon introducing the two-way message exchange mechanism, the main clock offset estimation algorithms for pairwise synchronization of sensor nodes are first reviewed, and their performance is compared. The class of fully-distributed clock offset estimation algorithms for network-wide synchronization is then surveyed. The paper concludes with a list of open research problems pertaining to clock synchronization of WSNs.
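
    For reference, the classical clock offset estimator from a single two-way message exchange, on which the surveyed algorithms build; the timestamps below are hypothetical, and the sketch assumes a symmetric link delay and negligible skew over one exchange.

        def offset_delay_estimate(T1, T2, T3, T4):
            """Two-way message exchange estimator.
            T1: node A sends, T2: node B receives, T3: B replies, T4: A receives."""
            offset = ((T2 - T1) - (T4 - T3)) / 2.0   # B's clock minus A's clock
            delay = ((T2 - T1) + (T4 - T3)) / 2.0    # one-way delay estimate
            return offset, delay

        # averaging per-exchange offsets is the usual estimator; under Gaussian
        # delay noise the sample mean is also the maximum likelihood estimate
        exchanges = [(0.0, 5.2, 5.3, 10.1), (20.0, 25.1, 25.2, 30.3)]
        offsets = [offset_delay_estimate(*e)[0] for e in exchanges]
        print(sum(offsets) / len(offsets))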

  2. An Automated Energy Detection Algorithm Based on Morphological and Statistical Processing Techniques

    Science.gov (United States)

    2018-01-09

    [No abstract available; the record excerpt consists of fragmentary report snippets on statistical analysis, its use for alarm thresholds in commercial prognostic and diagnostic vibrational monitoring applications, and a citation to Balakrishnan N, editors, Handbook of Statistics, Elsevier Science, 1998, p. 555-602 (order statistics and their applications).]

  3. Comparison of pure and 'Latinized' centroidal Voronoi tessellation against various other statistical sampling methods

    International Nuclear Information System (INIS)

    Romero, Vicente J.; Burkardt, John V.; Gunzburger, Max D.; Peterson, Janet S.

    2006-01-01

    A recently developed centroidal Voronoi tessellation (CVT) sampling method is investigated here to assess its suitability for use in statistical sampling applications. CVT efficiently generates a highly uniform distribution of sample points over arbitrarily shaped M-dimensional parameter spaces. On several 2-D test problems CVT has recently been found to provide exceedingly effective and efficient point distributions for response surface generation. Additionally, for statistical function integration and estimation of response statistics associated with uniformly distributed random-variable inputs (uncorrelated), CVT has been found in initial investigations to provide superior point sets when compared against Latin hypercube and simple random Monte Carlo methods and Halton and Hammersley quasi-random sequence methods. In this paper, the performance of all these sampling methods and a new variant ('Latinized' CVT) are further compared for non-uniform input distributions. Specifically, given uncorrelated normal inputs in a 2-D test problem, statistical sampling efficiencies are compared for resolving various statistics of response: mean, variance, and exceedance probabilities.
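
    A minimal sketch of CVT point generation via a Lloyd-type iteration with Monte Carlo centroid estimates, for the unit hypercube; the generator count, Monte Carlo sample size, and iteration count are arbitrary choices, and production CVT codes use more careful integration schemes.

        import numpy as np

        rng = np.random.default_rng(0)

        def cvt(n_gen=16, dim=2, n_mc=20000, iters=30):
            """Lloyd-type iteration toward a centroidal Voronoi tessellation
            of the unit hypercube, with Monte Carlo centroid estimates."""
            gens = rng.random((n_gen, dim))              # initial generators
            for _ in range(iters):
                pts = rng.random((n_mc, dim))            # MC samples of the region
                d = ((pts[:, None, :] - gens[None, :, :])**2).sum(-1)
                nearest = d.argmin(axis=1)               # Voronoi assignment
                for k in range(n_gen):
                    cell = pts[nearest == k]
                    if cell.size:
                        gens[k] = cell.mean(axis=0)      # move generator to centroid
            return gens

        print(cvt()[:3])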

  4. Sample Data Synchronization and Harmonic Analysis Algorithm Based on Radial Basis Function Interpolation

    Directory of Open Access Journals (Sweden)

    Huaiqing Zhang

    2014-01-01

    Full Text Available Spectral leakage has a harmful effect on the accuracy of harmonic analysis under asynchronous sampling. This paper proposes a time quasi-synchronous sampling algorithm based on radial basis function (RBF) interpolation. First, the fundamental period is evaluated by a zero-crossing technique with fourth-order Newton interpolation; then the sampling sequence is reproduced by RBF interpolation. Finally, the harmonic parameters can be calculated by applying the FFT to the synchronized sample data. Simulation results show that the proposed algorithm has high accuracy in measuring distorted and noisy signals. Compared to local approximation schemes such as linear, quadratic, and fourth-order Newton interpolation, the RBF is a global approximation method that can acquire more accurate results while its computation time is about the same as Newton's.
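
    A sketch of the core resampling step using a Gaussian RBF (the record does not specify the basis; the shape parameter heuristic and the small regularization term are assumptions): asynchronous samples are interpolated onto a synchronous grid so that FFT-based harmonic analysis sees an integer number of periods.

        import numpy as np

        def rbf_resample(t_async, x_async, t_sync, eps=None):
            """Reconstruct a signal sampled at asynchronous instants t_async
            on a synchronous grid t_sync, via Gaussian RBF interpolation."""
            t_async = np.asarray(t_async, float)
            if eps is None:
                eps = 1.0 / np.mean(np.diff(np.sort(t_async)))  # shape heuristic
            K = np.exp(-(eps * (t_async[:, None] - t_async[None, :]))**2)
            w = np.linalg.solve(K + 1e-10 * np.eye(len(t_async)), x_async)
            Ks = np.exp(-(eps * (np.asarray(t_sync)[:, None] - t_async[None, :]))**2)
            return Ks @ w

        # asynchronous samples of a 50 Hz tone, resampled to an exact period
        rng = np.random.default_rng(3)
        t = np.sort(rng.uniform(0, 0.04, 64))
        x = np.sin(2 * np.pi * 50 * t)
        t_grid = np.linspace(0, 0.04, 64, endpoint=False)
        x_sync = rbf_resample(t, x, t_grid)   # ready for FFT harmonic analysis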

  5. REMAINING LIFE TIME PREDICTION OF BEARINGS USING K-STAR ALGORITHM – A STATISTICAL APPROACH

    Directory of Open Access Journals (Sweden)

    R. SATISHKUMAR

    2017-01-01

    Full Text Available The role of bearings is significant in reducing the downtime of all rotating machinery. The increasing trend of bearing failures in recent times has underlined the need for condition monitoring. Multiple factors are associated with a bearing failure while it is in operation; hence, a predictive strategy is required to evaluate the current state of bearings in operation. In the past, predictive models based on regression techniques were widely used for bearing lifetime estimation. The objective of this paper is to estimate the remaining useful life of bearings through a machine learning approach, with the ultimate aim of strengthening predictive maintenance. The present study follows a classification approach: a predictive model was built to calculate the residual lifetime of bearings in operation. Vibration signals were acquired continuously from an experiment in which new bearings were run at pre-defined load and speed conditions until each bearing failed naturally. Statistical features were extracted, feature selection was carried out using a J48 decision tree, and the selected features were used to develop the prognostic model. The K-Star classification algorithm, a supervised machine learning technique, was used to build a predictive model that estimates the lifetime of bearings. The performance of the classifier was cross-validated with distinct data. The results show that the K-Star classification model gives 98.56% classification accuracy with the selected features.

  6. Influence of adaptive statistical iterative reconstruction algorithm on image quality in coronary computed tomography angiography

    Directory of Open Access Journals (Sweden)

    Helle Precht

    2016-12-01

    Full Text Available Background Coronary computed tomography angiography (CCTA) requires high spatial and temporal resolution, and increased low contrast resolution, for the assessment of coronary artery stenosis, plaque detection, and/or non-coronary pathology. Therefore, new reconstruction algorithms, particularly iterative reconstruction (IR) techniques, have been developed in an attempt to improve image quality at no cost in radiation exposure. Purpose To evaluate whether adaptive statistical iterative reconstruction (ASIR) enhances perceived image quality in CCTA compared to filtered back projection (FBP). Material and Methods Thirty patients underwent CCTA due to suspected coronary artery disease. Images were reconstructed using FBP, 30% ASIR, and 60% ASIR. Ninety image sets were evaluated by five observers using subjective visual grading analysis (VGA) and assessed by proportional odds modeling. Objective quality assessment (contrast, noise, and the contrast-to-noise ratio [CNR]) was analyzed with linear mixed effects modeling on log-transformed data. The need for ethical approval was waived by the local ethics committee as the study only involved anonymously collected clinical data. Results VGA showed significant improvements in sharpness by comparing FBP with ASIR, resulting in odds ratios of 1.54 for 30% ASIR and 1.89 for 60% ASIR (P = 0.004). The objective measures showed significant differences between FBP and 60% ASIR (P < 0.0001) for noise, with an estimated ratio of 0.82, and for CNR, with an estimated ratio of 1.26. Conclusion ASIR improved the subjective image quality of parameter sharpness and, objectively, reduced noise and increased CNR.

  7. A Monte Carlo algorithm for sampling rare events: application to a search for the Griffiths singularity

    International Nuclear Information System (INIS)

    Hukushima, K; Iba, Y

    2008-01-01

    We develop a recently proposed importance-sampling Monte Carlo algorithm for sampling rare events and quenched variables in random disordered systems. We apply it to a two-dimensional bond-diluted Ising model and study the Griffiths singularity, which is considered to be due to the existence of rare large clusters. It is found that the distribution of the inverse susceptibility has an exponential tail down to the origin, which is considered to be a consequence of the Griffiths singularity.

  8. Algorithms

    Indian Academy of Sciences (India)

    to as 'divide-and-conquer'. Although there has been a large effort in realizing efficient algorithms, there are not many universally accepted algorithm design paradigms. In this article, we illustrate algorithm design techniques such as balancing, greedy strategy, dynamic programming strategy, and backtracking or traversal of ...

  9. Discrimination of handlebar grip samples by fourier transform infrared microspectroscopy analysis and statistics

    Directory of Open Access Journals (Sweden)

    Zeyu Lin

    2017-01-01

    Full Text Available In this paper, the authors present a study on the discrimination of handlebar grip samples, to provide an effective forensic science service for hit-and-run traffic cases. 50 bicycle handlebar grip samples, 49 electric bike handlebar grip samples, and 96 motorcycle handlebar grip samples were randomly collected by the local police in Beijing (China). Fourier transform infrared microspectroscopy (FTIR) was utilized as the analytical technique. Target absorption selection, data pretreatment, and discrimination of linked and unlinked samples were then adopted as three steps to improve the discrimination of FTIR spectra collected from different handlebar grip samples. Principal component analysis and the receiver operating characteristic curve were utilized to evaluate different data selection methods and different data pretreatment methods, respectively. It is possible to explore the evidential value of handlebar grip residue evidence through instrumental analysis and statistical treatments. This will provide a universal discrimination method for other forensic science samples as well.

  10. Angle Statistics Reconstruction: a robust reconstruction algorithm for Muon Scattering Tomography

    Science.gov (United States)

    Stapleton, M.; Burns, J.; Quillin, S.; Steer, C.

    2014-11-01

    Muon Scattering Tomography (MST) is a technique for using the scattering of cosmic ray muons to probe the contents of enclosed volumes. As a muon passes through material it undergoes multiple Coulomb scattering, where the amount of scattering is dependent on the density and atomic number of the material as well as the path length. Hence, MST has been proposed as a means of imaging dense materials, for instance to detect special nuclear material in cargo containers. Algorithms are required to generate an accurate reconstruction of the material density inside the volume from the muon scattering information and some have already been proposed, most notably the Point of Closest Approach (PoCA) and Maximum Likelihood/Expectation Maximisation (MLEM) algorithms. However, whilst PoCA-based algorithms are easy to implement, they perform rather poorly in practice. Conversely, MLEM is a complicated algorithm to implement and computationally intensive and there is currently no published, fast and easily-implementable algorithm that performs well in practice. In this paper, we first provide a detailed analysis of the source of inaccuracy in PoCA-based algorithms. We then motivate an alternative method, based on ideas first laid out by Morris et al, presenting and fully specifying an algorithm that performs well against simulations of realistic scenarios. We argue this new algorithm should be adopted by developers of Muon Scattering Tomography as an alternative to PoCA.
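
    For context, the PoCA computation that the paper critiques reduces to finding the closest point between two skew lines (the incoming and outgoing muon tracks); a standard least-squares sketch follows, with hypothetical track parameters.

        import numpy as np

        def poca(p1, d1, p2, d2):
            """Point of closest approach between the incoming ray (p1, d1)
            and the outgoing ray (p2, d2) of a muon track."""
            p1, d1, p2, d2 = map(np.asarray, (p1, d1, p2, d2))
            w0 = p1 - p2
            a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
            d, e = d1 @ w0, d2 @ w0
            denom = a * c - b * b
            if abs(denom) < 1e-12:        # near-parallel tracks: no usable PoCA
                return None
            s = (b * e - c * d) / denom
            t = (a * e - b * d) / denom
            return (p1 + s * d1 + p2 + t * d2) / 2.0   # midpoint of closest points

        print(poca([0, 0, 1], [0, 0.1, -1], [0.05, 0.1, -1], [0.1, 0, -1]))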

  11. Sampling microcanonical measures of the 2D Euler equations through Creutz’s algorithm: a phase transition from disorder to order when energy is increased

    International Nuclear Information System (INIS)

    Potters, Max; Vaillant, Timothee; Bouchet, Freddy

    2013-01-01

    The 2D Euler equations are basic examples of fluid models for which a microcanonical measure can be constructed from first principles. This measure is defined through finite-dimensional approximations and a limiting procedure. Creutz’s algorithm is a microcanonical generalization of the Metropolis–Hastings algorithm (to sample Gibbs measures, in the canonical ensemble). We prove that Creutz’s algorithm can sample finite-dimensional approximations of the 2D Euler microcanonical measures (incorporating fixed energy and other invariants). This is essential as microcanonical and canonical measures are known to be inequivalent at some values of energy and vorticity distribution. Creutz’s algorithm is used to check predictions from the mean-field statistical mechanics theory of the 2D Euler equations (the Robert–Sommeria–Miller theory). We find full agreement with theory. Three different ways to compute the temperature give consistent results. Using Creutz’s algorithm, a first-order phase transition never observed previously and a situation of statistical ensemble inequivalence are found and studied. Strikingly, and in contrast to the usual statistical mechanics interpretations, this phase transition appears from a disordered phase to an ordered phase (with fewer symmetries) when the energy is increased. We explain this paradox. (paper)
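
    A minimal sketch of Creutz's demon dynamics, shown here on the 2D Ising model for concreteness (the paper applies the idea to finite-dimensional approximations of the 2D Euler equations); the lattice size, sweep count, initial configuration, and initial demon energy are arbitrary.

        import numpy as np

        rng = np.random.default_rng(7)
        L = 32
        spins = np.ones((L, L), dtype=int)
        spins[rng.random((L, L)) < 0.1] = -1     # start at a moderate energy
        demon = 8                                # demon reservoir, E_d >= 0

        def delta_E(s, i, j):
            nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
            return 2 * s[i, j] * nb              # energy change of flipping (J = 1)

        demon_trace = []
        for sweep in range(2000):
            for _ in range(L * L):
                i, j = rng.integers(L, size=2)
                dE = delta_E(spins, i, j)
                if demon - dE >= 0:              # total energy is conserved
                    spins[i, j] *= -1
                    demon -= dE
            demon_trace.append(demon)

        # The demon's energy is Boltzmann distributed, P(E_d) ~ exp(-E_d/T); since
        # dE comes in steps of 4 here, T can be read off as 4 / ln(1 + 4/<E_d>).
        Ed = np.mean(demon_trace[500:])
        print("temperature estimate:", 4 / np.log(1 + 4 / Ed))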

  12. Study on the Method of Association Rules Mining Based on Genetic Algorithm and Application in Analysis of Seawater Samples

    Directory of Open Access Journals (Sweden)

    Qiuhong Sun

    2014-04-01

    Full Text Available Based on data mining research, this paper briefly introduces genetic-algorithm-based data mining and discusses two important theoretical underpinnings of genetic algorithms, the schema theorem and the principle of implicit parallelism. Focusing on the application of genetic algorithms to association rule mining, the paper proposes improvements to the structure of the fitness function and the data encoding and, in particular, applies adaptive crossover and mutation probabilities (Pc, Pm) to the genetic algorithm, thereby improving its efficiency. Finally, a genetic-algorithm-based association rule mining algorithm is presented and applied to data mining in a seawater sample database, demonstrating its effectiveness.

  13. Sampling algorithms for validation of supervised learning models for Ising-like systems

    Science.gov (United States)

    Portman, Nataliya; Tamblyn, Isaac

    2017-12-01

    In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called 'ID-MH' that uses the Metropolis-Hastings algorithm, creating a Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validation of a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new 'block-ID' sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).

  14. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    Science.gov (United States)

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and a goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. The sample size formulae using a simple proportion of agreement instead of a kappa statistic, and nomograms to eliminate the inconvenience of using a mathematical formula, were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
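
    A sketch of the kind of calculation such a nomogram encodes, assuming the generic one-sample normal-approximation test on the simple proportion of agreement (the paper's exact formula, derived under the common correlation model, may differ); the null and alternative agreement levels below are hypothetical.

        from math import ceil, sqrt
        from scipy.stats import norm

        def n_for_agreement(p0, p1, alpha=0.05, power=0.80):
            """Pairs of ratings needed to test H0: agreement = p0 against
            H1: agreement = p1 with a one-sample proportion test."""
            za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
            num = za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1))
            return ceil((num / (p1 - p0))**2)

        print(n_for_agreement(0.80, 0.90))   # pairs needed to detect 0.80 -> 0.90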

  15. Technical Note: MRI only prostate radiotherapy planning using the statistical decomposition algorithm

    International Nuclear Information System (INIS)

    Siversson, Carl; Nordström, Fredrik; Nilsson, Terese; Nyholm, Tufve; Jonsson, Joakim; Gunnlaugsson, Adalsteinn; Olsson, Lars E.

    2015-01-01

    Purpose: In order to enable a magnetic resonance imaging (MRI) only workflow in radiotherapy treatment planning, methods are required for generating Hounsfield unit (HU) maps (i.e., synthetic computed tomography, sCT) for dose calculations, directly from MRI. The Statistical Decomposition Algorithm (SDA) is a method for automatically generating sCT images from a single MR image volume, based on automatic tissue classification in combination with a model trained using a multimodal template material. This study compares dose calculations between sCT generated by the SDA and conventional CT in the male pelvic region. Methods: The study comprised ten prostate cancer patients, for whom a 3D T2 weighted MRI and a conventional planning CT were acquired. For each patient, sCT images were generated from the acquired MRI using the SDA. In order to decouple the effect of variations in patient geometry between imaging modalities from the effect of uncertainties in the SDA, the conventional CT was nonrigidly registered to the MRI to assure that their geometries were well aligned. For each patient, a volumetric modulated arc therapy plan was created for the registered CT (rCT) and recalculated for both the sCT and the conventional CT. The results were evaluated using several methods, including the mean absolute error (MAE), a set of dose-volume histogram parameters, and a restrictive gamma criterion (2% local dose/1 mm). Results: The MAE within the body contour was 36.5 ± 4.1 (1 s.d.) HU between sCT and rCT. The average mean absorbed dose difference to target was 0.0% ± 0.2% (1 s.d.) between sCT and rCT, whereas it was −0.3% ± 0.3% (1 s.d.) between CT and rCT. The average gamma pass rate was 99.9% for sCT vs rCT, whereas it was 90.3% for CT vs rCT. Conclusions: The SDA enables a highly accurate MRI only workflow in prostate radiotherapy planning. The dosimetric uncertainties originating from the SDA appear negligible and are notably lower than the uncertainties

  16. The application of statistical and/or non-statistical sampling techniques by internal audit functions in the South African banking industry

    Directory of Open Access Journals (Sweden)

    D.P. van der Nest

    2015-03-01

    Full Text Available This article explores the use by internal audit functions of audit sampling techniques in order to test the effectiveness of controls in the banking sector. The article focuses specifically on the use of statistical and/or non-statistical sampling techniques by internal auditors. The focus of the research for this article was internal audit functions in the banking sector of South Africa. The results discussed in the article indicate that audit sampling is still used frequently as an audit evidence-gathering technique. Non-statistical sampling techniques are used more frequently than statistical sampling techniques for the evaluation of the sample. In addition, both techniques are regarded as important for the determination of the sample size and the selection of the sample items

  17. Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms

    KAUST Repository

    Fidel, Adam; Jacobs, Sam Ade; Sharma, Shishir; Amato, Nancy M.; Rauchwerger, Lawrence

    2014-01-01

    Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores. © 2014 IEEE.

  18. Using Load Balancing to Scalably Parallelize Sampling-Based Motion Planning Algorithms

    KAUST Repository

    Fidel, Adam

    2014-05-01

    Motion planning, which is the problem of computing feasible paths in an environment for a movable object, has applications in many domains ranging from robotics, to intelligent CAD, to protein folding. The best methods for solving this PSPACE-hard problem are so-called sampling-based planners. Recent work introduced uniform spatial subdivision techniques for parallelizing sampling-based motion planning algorithms that scaled well. However, such methods are prone to load imbalance, as planning time depends on region characteristics and, for most problems, the heterogeneity of the sub problems increases as the number of processors increases. In this work, we introduce two techniques to address load imbalance in the parallelization of sampling-based motion planning algorithms: an adaptive work stealing approach and bulk-synchronous redistribution. We show that applying these techniques to representatives of the two major classes of parallel sampling-based motion planning algorithms, probabilistic roadmaps and rapidly-exploring random trees, results in a more scalable and load-balanced computation on more than 3,000 cores. © 2014 IEEE.

  19. A clustering algorithm for sample data based on environmental pollution characteristics

    Science.gov (United States)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to similarities in pollution characteristics, such as pollution sources and concentrations, but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of the similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
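
    A minimal sketch of the one-pass, threshold-driven clustering flow described above, using Euclidean distance as the similarity measure (the paper's similarity function and outlier handling are more elaborate); the threshold is user-defined, as in the EPC description.

        import numpy as np

        def epc_like_clustering(X, threshold):
            """One-pass, centre-based clustering: the first unassigned point
            seeds a cluster; each point joins its most similar centre if within
            `threshold`, else it seeds a new cluster; centres are then refined."""
            X = np.asarray(X, float)
            centres, labels = [], -np.ones(len(X), dtype=int)
            for idx, x in enumerate(X):
                if centres:
                    d = np.linalg.norm(np.array(centres) - x, axis=1)
                    k = d.argmin()
                    if d[k] <= threshold:
                        labels[idx] = k
                        continue
                centres.append(x.copy())           # new cluster centre
                labels[idx] = len(centres) - 1
            for k in range(len(centres)):          # k-Means-style refinement
                centres[k] = X[labels == k].mean(axis=0)
            return np.array(centres), labels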

  20. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    Science.gov (United States)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates caused by changes in rain statistics arising from 1) evolution of the official algorithms used to process the data and 2) differences from other remote sensing systems, such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.

  1. Statistical Sampling For In-Service Inspection Of Liquid Waste Tanks At The Savannah River Site

    International Nuclear Information System (INIS)

    Harris, S.

    2011-01-01

    Savannah River Remediation, LLC (SRR) is implementing a statistical sampling strategy for In-Service Inspection (ISI) of Liquid Waste (LW) Tanks at the United States Department of Energy's Savannah River Site (SRS) in Aiken, South Carolina. As a component of SRS's corrosion control program, the ISI program assesses tank wall structural integrity through the use of ultrasonic testing (UT). The statistical strategy for ISI is based on the random sampling of a number of vertically oriented unit areas, called strips, within each tank. The number of strips to inspect was determined so as to attain, over time, a high probability of observing at least one of the worst 5% in terms of pitting and corrosion across all tanks. The probability estimation to determine the number of strips to inspect was performed using the hypergeometric distribution. Statistical tolerance limits for pit depth and corrosion rates were calculated by fitting the lognormal distribution to the data. In addition to the strip sampling strategy, a single strip within each tank was identified to serve as the baseline for a longitudinal assessment of the tank safe operational life. The statistical sampling strategy enables the ISI program to develop individual profiles of LW tank wall structural integrity that collectively provide a high confidence in their safety and integrity over operational lifetimes.
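
    The strip-count calculation reduces to a hypergeometric tail probability; the following sketch uses hypothetical tank parameters (the actual strip counts and probability targets are site-specific and are not given in the record).

        from math import comb

        def p_hit_worst(N, K, n):
            """P(a random sample of n strips contains at least one of the K
            worst strips out of N), from the hypergeometric distribution."""
            return 1 - comb(N - K, n) / comb(N, n)

        # e.g., 500 strips per tank, worst 5% (K = 25): strips for 90% detection
        N, K = 500, 25
        n = next(n for n in range(1, N + 1) if p_hit_worst(N, K, n) >= 0.90)
        print(n, p_hit_worst(N, K, n))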

  2. Sample Size Requirements for Assessing Statistical Moments of Simulated Crop Yield Distributions

    NARCIS (Netherlands)

    Lehmann, N.; Finger, R.; Klein, T.; Calanca, P.

    2013-01-01

    Mechanistic crop growth models are becoming increasingly important in agricultural research and are extensively used in climate change impact assessments. In such studies, statistics of crop yields are usually evaluated without the explicit consideration of sample size requirements. The purpose of

  3. Statistical surrogate model based sampling criterion for stochastic global optimization of problems with constraints

    Energy Technology Data Exchange (ETDEWEB)

    Cho, Su Gil; Jang, Jun Yong; Kim, Ji Hoon; Lee, Tae Hee [Hanyang University, Seoul (Korea, Republic of); Lee, Min Uk [Romax Technology Ltd., Seoul (Korea, Republic of); Choi, Jong Su; Hong, Sup [Korea Research Institute of Ships and Ocean Engineering, Daejeon (Korea, Republic of)

    2015-04-15

    Sequential surrogate model-based global optimization algorithms, such as super-EGO, have been developed to increase the efficiency of commonly used global optimization techniques as well as to ensure the accuracy of the optimization. However, earlier approaches have drawbacks: their optimization loops consist of three phases and depend on empirical parameters. We propose a unified sampling criterion that simplifies the algorithm and achieves the global optimum of constrained problems without any empirical parameters. It is able to select points located in the feasible region with high model uncertainty as well as points along the constraint boundary at the lowest objective value. The mean squared error determines which criterion is more dominant between the infill sampling criterion and the boundary sampling criterion. The method also guarantees the accuracy of the surrogate model because the sample points are not concentrated within extremely small regions, as in super-EGO. The performance of the proposed method, including the solvability of a problem, convergence properties, and efficiency, is validated through nonlinear numerical examples with disconnected feasible regions.

  4. Improved Noise Minimum Statistics Estimation Algorithm for Using in a Speech-Passing Noise-Rejecting Headset

    Directory of Open Access Journals (Sweden)

    Seyedtabaee Saeed

    2010-01-01

    Full Text Available This paper deals with the configuration of an algorithm to be used in a speech-passing angle grinder noise-canceling headset. Angle grinder noise is annoying and interrupts ordinary oral communication, which means a low-SNR noisy condition must be handled. Since variations in the angle grinder's working condition change the noise statistics, the noise is nonstationary, with possible jumps in its power. Studies were conducted to pick an appropriate algorithm. A modified version of the well-known spectral subtraction shows superior performance against alternative methods. The noise estimate is calculated through a multi-band, fast-adapting scheme. The algorithm adapts very quickly to the nonstationary noise environment while inflicting minimal musical noise and speech distortion on the processed signal. Objective and subjective measures illustrating the performance of the proposed method are presented.
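
    A compact sketch of spectral subtraction with a sliding-window minimum-statistics noise tracker, the combination described above; the frame size, smoothing constant, window length, and over-subtraction factor are illustrative, and the paper's multi-band fast-adapting scheme is more refined.

        import numpy as np

        def spectral_subtract(x, frame=256, hop=64, win_minstat=50, oversub=2.0):
            """Spectral subtraction; the noise power per bin is tracked as the
            minimum of smoothed frame powers over a sliding window (the
            'minimum statistics' idea), so it adapts to nonstationary noise."""
            w = np.hanning(frame)
            out = np.zeros(len(x))
            P_smooth, history = None, []
            for start in range(0, len(x) - frame + 1, hop):
                X = np.fft.rfft(x[start:start + frame] * w)
                P = np.abs(X)**2
                P_smooth = P if P_smooth is None else 0.8 * P_smooth + 0.2 * P
                history = (history + [P_smooth.copy()])[-win_minstat:]
                noise = np.min(history, axis=0)        # minimum statistics estimate
                gain = np.sqrt(np.maximum(1 - oversub * noise / np.maximum(P, 1e-12), 0.05))
                out[start:start + frame] += np.fft.irfft(gain * X, n=frame) * w
            return out / 1.5      # hann-squared overlap-add gain at 75% overlap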

  5. Statistics

    Science.gov (United States)

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  6. Potential-Decomposition Strategy in Markov Chain Monte Carlo Sampling Algorithms

    International Nuclear Information System (INIS)

    Shangguan Danhua; Bao Jingdong

    2010-01-01

    We introduce the potential-decomposition strategy (PDS), which can be used in Markov chain Monte Carlo sampling algorithms. PDS can be designed to make particles move in a modified potential that favors diffusion in phase space; then, by rejecting some trial samples, the target distributions can be sampled in an unbiased manner. Furthermore, if the accepted trial samples are insufficient, they can be recycled as initial states to form more unbiased samples. This strategy can greatly improve efficiency when the original potential has multiple metastable states separated by large barriers. We apply PDS to the 2D Ising model and a double-well potential model with a large barrier, demonstrating in these two representative examples that convergence is accelerated by orders of magnitude.

  7. Theoretical Aspects of the Patterns Recognition Statistical Theory Used for Developing the Diagnosis Algorithms for Complicated Technical Systems

    Science.gov (United States)

    Obozov, A. A.; Serpik, I. N.; Mihalchenko, G. S.; Fedyaeva, G. A.

    2017-01-01

    In the article, the problem of applying pattern recognition (a relatively young area of engineering cybernetics) to the analysis of complicated technical systems is examined. It is shown that a statistical approach can be the most effective for hard-to-distinguish situations. The different recognition algorithms are based on the Bayes approach, which estimates the posterior probabilities of a certain event and an assumed error. Application of the statistical approach to pattern recognition is possible for solving the problem of technical diagnosis of complicated systems, particularly high-powered marine diesel engines.

  8. Robust Multi-Frame Adaptive Optics Image Restoration Algorithm Using Maximum Likelihood Estimation with Poisson Statistics

    Directory of Open Access Journals (Sweden)

    Dongming Li

    2017-04-01

    Full Text Available An adaptive optics (AO) system provides real-time compensation for atmospheric turbulence. However, an AO image is usually of poor contrast because of the nature of the imaging process, meaning that the image contains information coming from both out-of-focus and in-focus planes of the object, which also brings about a loss in quality. In this paper, we present a robust multi-frame adaptive optics image restoration algorithm via maximum likelihood estimation. Our proposed algorithm uses a maximum likelihood method with image regularization as the basic principle, and constructs the joint log likelihood function for multi-frame AO images based on a Poisson distribution model. To begin with, a frame selection method based on image variance is applied to the observed multi-frame AO images to select images with better quality to improve the convergence of a blind deconvolution algorithm. Then, by combining the imaging conditions and the AO system properties, a point spread function estimation model is built. Finally, we develop our iterative solutions for AO image restoration addressing the joint deconvolution issue. We conduct a number of experiments to evaluate the performances of our proposed algorithm. Experimental results show that our algorithm produces accurate AO image restoration results and outperforms the current state-of-the-art blind deconvolution methods.
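
    The record does not spell out the update rule; the classical Richardson-Lucy iteration, sketched below for a single frame with a known PSF, is the standard maximum-likelihood fixed point under a Poisson noise model, and a multi-frame variant averages the corrections across frames.

        import numpy as np

        def richardson_lucy(image, psf, iters=30):
            """Richardson-Lucy deconvolution: the classical multiplicative
            fixed-point iteration for ML restoration under Poisson noise.
            `psf` is assumed to be the same shape as `image` and centred."""
            otf = np.fft.rfft2(np.fft.ifftshift(psf / psf.sum()))
            conv = lambda f, H: np.fft.irfft2(np.fft.rfft2(f) * H, s=image.shape)
            est = np.full(image.shape, image.mean())
            for _ in range(iters):
                ratio = image / np.maximum(conv(est, otf), 1e-12)  # data / model
                est = est * conv(ratio, np.conj(otf))              # update
            return est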

  9. The ‘39 steps’: an algorithm for performing statistical analysis of data on energy intake and expenditure

    Directory of Open Access Journals (Sweden)

    John R. Speakman

    2013-03-01

    Full Text Available The epidemics of obesity and diabetes have aroused great interest in the analysis of energy balance, with the use of organisms ranging from nematode worms to humans. Although generating energy-intake or -expenditure data is relatively straightforward, the most appropriate way to analyse the data has been an issue of contention for many decades. In the last few years, a consensus has been reached regarding the best methods for analysing such data. To facilitate using these best-practice methods, we present here an algorithm that provides a step-by-step guide for analysing energy-intake or -expenditure data. The algorithm can be used to analyse data from either humans or experimental animals, such as small mammals or invertebrates. It can be used in combination with any commercial statistics package; however, to assist with analysis, we have included detailed instructions for performing each step for three popular statistics packages (SPSS, MINITAB and R). We also provide interpretations of the results obtained at each step. We hope that this algorithm will assist in the statistically appropriate analysis of such data, a field in which there has been much confusion and some controversy.
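
    The record does not reproduce the steps themselves; one step widely advocated in this consensus is to analyse energy expenditure with body mass as a covariate (ANCOVA) rather than dividing by mass. A sketch using statsmodels follows; the data file name and column names (ee = energy expenditure, mass = body mass, group = treatment group) are hypothetical placeholders.

        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_csv("energy.csv")            # hypothetical data file
        model = smf.ols("ee ~ mass + C(group)", data=df).fit()  # ANCOVA
        print(model.summary())                    # group effect adjusted for mass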

  10. Characterization of adaptive statistical iterative reconstruction algorithm for dose reduction in CT: A pediatric oncology perspective

    International Nuclear Information System (INIS)

    Brady, S. L.; Yee, B. S.; Kaufman, R. A.

    2012-01-01

    Purpose: This study demonstrates a means of implementing an adaptive statistical iterative reconstruction (ASiR™) technique for dose reduction in computed tomography (CT) while maintaining similar noise levels in the reconstructed image. The effects of image quality and noise texture were assessed at all implementation levels of ASiR™. Empirically derived dose reduction limits were established for ASiR™ for imaging of the trunk for a pediatric oncology population ranging from 1 yr old through adolescence/adulthood. Methods: Image quality was assessed using metrics established by the American College of Radiology (ACR) CT accreditation program. Each image quality metric was tested using the ACR CT phantom with 0%–100% ASiR™ blended with filtered back projection (FBP) reconstructed images. Additionally, the noise power spectrum (NPS) was calculated for three common reconstruction filters of the trunk. The empirically derived limitations on ASiR™ implementation for dose reduction were assessed using (1, 5, 10) yr old and adolescent/adult anthropomorphic phantoms. To assess dose reduction limits, the phantoms were scanned in increments of increased noise index (decrementing mA using automatic tube current modulation) balanced with ASiR™ reconstruction to maintain noise equivalence of the 0% ASiR™ image. Results: The ASiR™ algorithm did not produce any unfavorable effects on image quality as assessed by ACR criteria. Conversely, low-contrast resolution was found to improve due to the reduction of noise in the reconstructed images. NPS calculations demonstrated that images with lower frequency noise had lower noise variance and coarser graininess at progressively higher percentages of ASiR™ reconstruction; and in spite of the similar magnitudes of noise, the image reconstructed with 50% or more ASiR™ presented a more smoothed appearance than the pre-ASiR™ 100% FBP image. Finally, relative to non-ASiR™ images with 100% of standard dose across the

  11. Characterization of adaptive statistical iterative reconstruction algorithm for dose reduction in CT: A pediatric oncology perspective

    Energy Technology Data Exchange (ETDEWEB)

    Brady, S. L.; Yee, B. S.; Kaufman, R. A. [Department of Radiological Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105 (United States)

    2012-09-15

    Purpose: This study demonstrates a means of implementing an adaptive statistical iterative reconstruction (ASiR™) technique for dose reduction in computed tomography (CT) while maintaining similar noise levels in the reconstructed image. The effects of image quality and noise texture were assessed at all implementation levels of ASiR™. Empirically derived dose reduction limits were established for ASiR™ for imaging of the trunk for a pediatric oncology population ranging from 1 yr old through adolescence/adulthood. Methods: Image quality was assessed using metrics established by the American College of Radiology (ACR) CT accreditation program. Each image quality metric was tested using the ACR CT phantom with 0%–100% ASiR™ blended with filtered back projection (FBP) reconstructed images. Additionally, the noise power spectrum (NPS) was calculated for three common reconstruction filters of the trunk. The empirically derived limitations on ASiR™ implementation for dose reduction were assessed using (1, 5, 10) yr old and adolescent/adult anthropomorphic phantoms. To assess dose reduction limits, the phantoms were scanned in increments of increased noise index (decrementing mA using automatic tube current modulation) balanced with ASiR™ reconstruction to maintain noise equivalence of the 0% ASiR™ image. Results: The ASiR™ algorithm did not produce any unfavorable effects on image quality as assessed by ACR criteria. Conversely, low-contrast resolution was found to improve due to the reduction of noise in the reconstructed images. NPS calculations demonstrated that images with lower frequency noise had lower noise variance and coarser graininess at progressively higher percentages of ASiR™ reconstruction; and in spite of the similar magnitudes of noise, the image reconstructed with 50% or more ASiR™ presented a more

  12. Embracing equifinality with efficiency: Limits of Acceptability sampling using the DREAM(LOA) algorithm

    Science.gov (United States)

    Vrugt, Jasper A.; Beven, Keith J.

    2018-04-01

    This essay illustrates some recent developments to the DiffeRential Evolution Adaptive Metropolis (DREAM) MATLAB toolbox of Vrugt (2016) to delineate and sample the behavioural solution space of set-theoretic likelihood functions used within the GLUE (Limits of Acceptability) framework (Beven and Binley, 1992, 2014; Beven and Freer, 2001; Beven, 2006). This work builds on the DREAM(ABC) algorithm of Sadegh and Vrugt (2014) and significantly enhances the accuracy and CPU-efficiency of Bayesian inference with GLUE. In particular, it is shown how a lack of adequate sampling in the model space might lead to unjustified model rejection.

  13. A neural algorithm for the non-uniform and adaptive sampling of biomedical data.

    Science.gov (United States)

    Mesin, Luca

    2016-04-01

    Body sensors are finding increasing applications in self-monitoring for health care and in the remote surveillance of sensitive people. The physiological data to be sampled can be non-stationary, with bursts of high amplitude and frequency content providing most of the information. Such data could be sampled efficiently with a non-uniform schedule that increases the sampling rate only during activity bursts. A real-time, adaptive algorithm is proposed to select the sampling rate, in order to reduce the number of measured samples while still recording the main information. The algorithm is based on a neural network which predicts the subsequent samples and their uncertainties, requiring a measurement only when the risk of the prediction is larger than a selectable threshold. Four examples of application to biomedical data are discussed: electromyogram, electrocardiogram, electroencephalogram, and body acceleration. Sampling rates are reduced below the Nyquist limit while still preserving an accurate representation of the data and of their power spectral densities (PSD). For example, sampling at 60% of the Nyquist frequency, the percentage average rectified errors in estimating the signals are on the order of 10% and the PSD is fairly well represented, up to the highest frequencies. The method outperforms both uniform sampling and compressive sensing applied to the same data. The discussed method allows going beyond the Nyquist limit while preserving the information content of non-stationary biomedical signals. It could find applications in body sensor networks to lower the number of wireless communications (saving sensor power) and to reduce memory usage. Copyright © 2016 Elsevier Ltd. All rights reserved.
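
    A toy version of the sampling rule, with a linear extrapolator standing in for the paper's neural predictor; the uncertainty model (a running error scale that grows with the number of consecutive skipped samples) and the threshold are assumptions for illustration.

        import numpy as np

        def adaptive_sample(x, thresh=0.05):
            recon = [x[0], x[1]]                  # first two samples always measured
            measured = [0, 1]
            sigma = abs(x[1] - x[0]) + 1e-3       # running prediction-error scale
            skips = 0
            for t in range(2, len(x)):
                pred = 2 * recon[-1] - recon[-2]          # linear extrapolation
                if sigma * (skips + 1) > thresh:          # risk too high: measure
                    measured.append(t)
                    sigma = 0.9 * sigma + 0.1 * abs(x[t] - pred)
                    recon.append(x[t])
                    skips = 0
                else:                                     # trust the prediction
                    recon.append(pred)
                    skips += 1
            return np.array(recon), measured

        t = np.linspace(0, 1, 1000)
        x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) * (t > 0.6)
        recon, measured = adaptive_sample(x)      # burst triggers denser sampling
        print(f"measured {len(measured)} of {x.size} samples")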

  14. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    Science.gov (United States)

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  15. Statistics

    International Nuclear Information System (INIS)

    2005-01-01

    For the years 2004 and 2005 the figures shown in the tables of Energy Review are partly preliminary. The annual statistics published in Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time-series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004.) The applied energy units and conversion coefficients are shown in the back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes, precautionary stock fees and oil pollution fees

  16. Algorithms

    Indian Academy of Sciences (India)

    ticians but also forms the foundation of computer science. Two ... with methods of developing algorithms for solving a variety of problems but ... applications of computers in science and engineering ... numerical calculus are as important. We will ...

  17. Statistical characterization of a large geochemical database and effect of sample size

    Science.gov (United States)

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reason relates to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively
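
    A sketch of the lognormal Q-Q construction used above to delineate subpopulations; the two-component copper mixture below is hypothetical, and a change in slope between the lower and upper portions of the curve (a kink) marks the second subpopulation.

        import numpy as np
        from scipy.stats import norm

        def lognormal_qq(values):
            """Coordinates for a lognormal Q-Q plot: sorted log10 values against
            standard normal quantiles at (i - 0.5)/n plotting positions."""
            v = np.sort(np.log10(values[values > 0]))
            q = norm.ppf((np.arange(1, v.size + 1) - 0.5) / v.size)
            return q, v

        # hypothetical mixture: background Cu plus a small 'ore-grade' subpopulation
        rng = np.random.default_rng(5)
        cu = np.concatenate([rng.lognormal(3.0, 0.5, 9500),
                             rng.lognormal(6.0, 0.8, 500)])
        q, logv = lognormal_qq(cu)
        # a steeper upper-tail slope relative to the bulk indicates the kink
        print(np.polyfit(q[-300:], logv[-300:], 1)[0],
              np.polyfit(q[:300], logv[:300], 1)[0])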

  18. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)

    1997-09-18

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351, which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for the evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two- to three-fold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials, which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (inserts) are analyzed by either the Cold Chem method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared for Inductively Coupled Plasma (ICP) analysis by a multi-step process of drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid owing to the decreased sample size and the fact that no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock

  19. Chemometric and Statistical Analyses of ToF-SIMS Spectra of Increasingly Complex Biological Samples

    Energy Technology Data Exchange (ETDEWEB)

    Berman, E S; Wu, L; Fortson, S L; Nelson, D O; Kulp, K S; Wu, K J

    2007-10-24

    Characterizing and classifying molecular variation within biological samples is critical for determining fundamental mechanisms of biological processes that will lead to new insights, including improved disease understanding. Towards these ends, time-of-flight secondary ion mass spectrometry (ToF-SIMS) was used to examine increasingly complex samples of biological relevance, including monosaccharide isomers, pure proteins, complex protein mixtures, and mouse embryo tissues. The complex mass spectral data sets produced were analyzed using five common statistical and chemometric multivariate analysis techniques: principal component analysis (PCA), linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), soft independent modeling of class analogy (SIMCA), and decision tree analysis by recursive partitioning. PCA was found to be a valuable first step in multivariate analysis, providing insight both into the relative groupings of samples and into the molecular basis for those groupings. For the monosaccharide, pure protein and protein mixture samples, LDA, PLSDA and SIMCA were all found to produce excellent classification, given a sufficient number of computed compound variables. For the mouse embryo tissues, however, SIMCA did not produce as accurate a classification. The decision tree analysis was found to be the least successful for all the data sets, providing neither as accurate a classification nor chemical insight for any of the tested samples. Based on these results we conclude that as the complexity of the sample increases, so must the sophistication of the multivariate technique used to classify the samples. PCA is a preferred first step for understanding ToF-SIMS data that can be followed by either LDA or PLSDA for effective classification analysis. This study demonstrates the strength of ToF-SIMS combined with multivariate statistical and chemometric techniques to classify increasingly complex biological samples.

  20. Statistics

    International Nuclear Information System (INIS)

    2001-01-01

    For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products.

  1. Statistics

    International Nuclear Information System (INIS)

    2000-01-01

    For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products.

  2. Statistics

    International Nuclear Information System (INIS)

    1999-01-01

    For the years 1998 and 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products.

  3. Statistically Optimized Inversion Algorithm for Enhanced Retrieval of Aerosol Properties from Spectral Multi-Angle Polarimetric Satellite Observations

    Science.gov (United States)

    Dubovik, O; Herman, M.; Holdak, A.; Lapyonok, T.; Taure, D.; Deuze, J. L.; Ducos, F.; Sinyuk, A.

    2011-01-01

    The proposed development is an attempt to enhance aerosol retrieval by emphasizing statistical optimization in the inversion of advanced satellite observations. This optimization concept improves retrieval accuracy by relying on knowledge of the measurement error distribution. Efficient application of such optimization requires pronounced data redundancy (an excess of the number of measurements over the number of unknowns), which is not common in satellite observations. The POLDER imager on board the PARASOL microsatellite registers spectral polarimetric characteristics of the reflected atmospheric radiation at up to 16 viewing directions over each observed pixel. The completeness of such observations is notably higher than for most currently operating passive satellite aerosol sensors. This provides an opportunity for profound utilization of statistical optimization principles in satellite data inversion. The proposed retrieval scheme is designed as a statistically optimized multi-variable fitting of all available angular observations obtained by the POLDER sensor in the window spectral channels where absorption by gas is minimal. The total number of such observations by PARASOL always exceeds a hundred over each pixel, and the statistical optimization concept promises to be efficient even if the algorithm retrieves several tens of aerosol parameters. Based on this idea, the proposed algorithm uses a large number of unknowns and is aimed at retrieval of an extended set of parameters affecting the measured radiation.

  4. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade; Manavi, Kasra; Burgos, Juan; Denny, Jory; Thomas, Shawna; Amato, Nancy M.

    2012-01-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.

  5. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade

    2012-05-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.
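
    As a rough illustration of the subdivision idea (not the authors' implementation), the sketch below splits a toy 2-D C-space into four regions and builds PRM-style regional roadmaps in parallel. The obstacle test, sample counts and k-nearest connection rule are placeholder assumptions, and the cross-region connection step is only indicated in a comment.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def is_free(q):
    """Hypothetical free-space test: one disc obstacle in the unit square."""
    return np.linalg.norm(q - 0.5) > 0.2

def build_regional_roadmap(bounds, n_samples=200, k=5, seed=0):
    """Sample one C-space region and connect k nearest neighbours locally."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pts = lo + (hi - lo) * rng.random((n_samples, 2))
    pts = pts[np.array([is_free(p) for p in pts])]
    edges = []
    for i, p in enumerate(pts):
        d = np.linalg.norm(pts - p, axis=1)
        for j in np.argsort(d)[1:k + 1]:          # skip self at index 0
            edges.append((i, int(j)))
    return pts, edges

if __name__ == "__main__":
    # Subdivide the unit square into 4 regions; build roadmaps in parallel.
    # A full planner would then connect roadmaps across region boundaries.
    regions = [(np.array([x, y]), np.array([x + 0.5, y + 0.5]))
               for x in (0.0, 0.5) for y in (0.0, 0.5)]
    with ProcessPoolExecutor() as ex:
        roadmaps = list(ex.map(build_regional_roadmap, regions))
    print([len(pts) for pts, _ in roadmaps])
```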

  6. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    Science.gov (United States)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described
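
    A minimal sketch of the parallel tempering technique mentioned above, for a 1-D bimodal target; the temperatures, step size and target are illustrative choices, not taken from the record.

```python
import numpy as np

def parallel_tempering(log_pdf, temps, n_steps=5000, step=0.5, seed=1):
    """One Metropolis walker per temperature; neighbouring walkers
    occasionally attempt a state swap with the Metropolis criterion."""
    rng = np.random.default_rng(seed)
    x = np.zeros(len(temps))
    samples = []
    for _ in range(n_steps):
        for i, T in enumerate(temps):             # within-temperature moves
            prop = x[i] + step * rng.normal()
            if np.log(rng.random()) < (log_pdf(prop) - log_pdf(x[i])) / T:
                x[i] = prop
        i = rng.integers(len(temps) - 1)          # swap attempt, neighbours i, i+1
        dlog = (log_pdf(x[i + 1]) - log_pdf(x[i])) * (1 / temps[i] - 1 / temps[i + 1])
        if np.log(rng.random()) < dlog:
            x[i], x[i + 1] = x[i + 1], x[i]
        samples.append(x[0])                      # the T=1 chain targets the true pdf
    return np.array(samples)

# Two well-separated modes: hard for plain Metropolis, manageable with tempering.
log_pdf = lambda v: np.logaddexp(-0.5 * (v - 4.0) ** 2, -0.5 * (v + 4.0) ** 2)
draws = parallel_tempering(log_pdf, temps=[1.0, 2.0, 4.0, 8.0])
print("fraction of samples near each mode:", (draws > 0).mean(), (draws < 0).mean())
```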

  7. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto; Cheung, Sai Hung

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.
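
    The paper casts the load balancing step as a binary integer program; as a lighter-weight stand-in that illustrates the task itself, the sketch below applies the classic longest-processing-time greedy heuristic to hypothetical chain workloads.

```python
import heapq

def balance(costs, n_proc):
    """Greedy longest-processing-time: assign each task, largest first,
    to the currently least-loaded processor."""
    heap = [(0.0, p, []) for p in range(n_proc)]      # (load, processor, tasks)
    heapq.heapify(heap)
    for i, c in sorted(enumerate(costs), key=lambda t: -t[1]):
        load, p, tasks = heapq.heappop(heap)
        tasks.append(i)
        heapq.heappush(heap, (load + c, p, tasks))
    return sorted(heap)

chain_costs = [9.5, 1.2, 7.3, 3.3, 6.1, 2.8, 4.4, 5.0]  # hypothetical work per chain
for load, p, tasks in balance(chain_costs, 3):
    print(f"processor {p}: load {load:.1f}, chains {tasks}")
```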

  8. Classification and authentication of unknown water samples using machine learning algorithms.

    Science.gov (United States)

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes the development of real-life water sample classification and authentication based on machine learning algorithms. The proposed techniques used experimental measurements from a pulse voltammetry method based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongues include arrays of solid state ion sensors, transducers even of different types, data collectors and data analysis tools, all oriented to the classification of liquid samples and the authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for six different ISI (Bureau of Indian Standards) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell), was the data source for developing two types of machine learning algorithms, classification and regression. A water data set consisting of six sample classes with 4402 features was considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A proposed partial least squares (PLS) based classifier, dedicated to authenticating a specific category of water sample, evolved as an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system delivered encouraging overall authentication accuracy, with excellent performance for the aforesaid categories of water samples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.
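
    A hedged sketch of a PCA-plus-PLS-DA pipeline of the kind described above, with synthetic stand-ins for the E-tongue measurements (six classes and 4402 features, as in the record; everything else is an arbitrary choice).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

# Synthetic stand-in for the voltammetric data: 6 water classes, 4402 features.
rng = np.random.default_rng(0)
n_per_class, n_feat, n_class = 30, 4402, 6
centers = rng.normal(size=(n_class, n_feat))
X = np.vstack([c + 0.3 * rng.normal(size=(n_per_class, n_feat)) for c in centers])
y = np.repeat(np.arange(n_class), n_per_class)

# PCA step: project onto a few scores for grouping and visualisation.
scores = PCA(n_components=3).fit_transform(X)

# PLS-DA step: regress one-hot class labels on the features, then assign
# each sample to the class with the largest predicted response.
Y = np.eye(n_class)[y]
pls = PLSRegression(n_components=5).fit(X, Y)
pred = pls.predict(X).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```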

  9. Statistics

    International Nuclear Information System (INIS)

    2003-01-01

    For the year 2002, part of the figures shown in the tables of the Energy Review are preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supply and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Excise taxes, precautionary stock fees and oil pollution fees on energy products.

  10. Statistics

    International Nuclear Information System (INIS)

    2004-01-01

    For the years 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Excise taxes, precautionary stock fees and oil pollution fees.

  11. Statistics

    International Nuclear Information System (INIS)

    2000-01-01

    For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources, and Energy taxes and precautionary stock fees on oil products.

  12. Data Fusion for a Vision-Radiological System: a Statistical Calibration Algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Enqvist, Andreas; Koppal, Sanjeev; Riley, Phillip [University of Florida, Gainesville, FL 32611 (United States)

    2015-07-01

    Presented here is a fusion system based on simple, low-cost computer vision and radiological sensors for tracking multiple objects and identifying potential radiological materials being transported or shipped. The main focus of this work is the development of calibration algorithms for characterizing the fused sensor system as a single entity. When evaluating system calibration algorithms, there is an apparent need to correct for scene deviations from the basic inverse distance-squared law governing the detection rates. In particular, the computer vision system enables a map of the distance dependence of the sources being tracked, into which the time-dependent radiological data can be incorporated by means of data fusion of the two sensors' output data. (authors)
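
    The inverse distance-squared correction at the heart of the fusion idea can be sketched in a few lines. The inputs below (a distance track from the vision system and raw count rates) are hypothetical, not data from the paper.

```python
import numpy as np

def distance_corrected_rate(counts, distances, d_ref=1.0):
    """Rescale count rates to a reference distance via the inverse-square law."""
    return np.asarray(counts) * (np.asarray(distances) / d_ref) ** 2

counts = np.array([400.0, 110.0, 45.0, 26.0])  # counts/s as the object recedes
distances = np.array([1.0, 2.0, 3.0, 4.0])     # metres, from the vision tracker
print(distance_corrected_rate(counts, distances))  # roughly constant source term
```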

  13. A statistical analysis of RNA folding algorithms through thermodynamic parameter perturbation.

    Science.gov (United States)

    Layton, D M; Bundschuh, R

    2005-01-01

    Computational RNA secondary structure prediction is rather well established. However, such prediction algorithms always depend on a large number of experimentally measured parameters. Here, we study how sensitive structure prediction algorithms are to changes in these parameters. We find that, for changes corresponding to the actual experimental error with which these parameters have been determined, 30% of the structure is falsely predicted, whereas the ground-state structure is preserved under parameter perturbation in only 5% of all cases. We establish that base-pairing probabilities calculated in a thermal ensemble are a viable, although not perfect, measure of the reliability of the prediction of individual structure elements. A new measure of stability using parameter perturbation is proposed, and its limitations are discussed.
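
    A minimal sketch of the perturbation protocol (not the authors' code): re-predict with parameters jittered within an assumed experimental error and count how often the predicted structure survives. The "folding engine" here is a trivial stand-in for illustration only.

```python
import numpy as np

def predict_structure(params):
    """Hypothetical stand-in for an RNA folding engine: maps a parameter
    vector to a discrete 'structure' label."""
    return tuple(np.sign(params).astype(int))

rng = np.random.default_rng(0)
nominal = rng.normal(size=20)      # stand-in free-energy parameters
sigma = 0.1 * np.abs(nominal)      # assumed experimental error per parameter

base = predict_structure(nominal)
n_trials = 1000
preserved = sum(
    predict_structure(nominal + sigma * rng.normal(size=nominal.size)) == base
    for _ in range(n_trials)
)
print("ground state preserved in", preserved / n_trials, "of perturbed trials")
```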

  14. Robust video watermarking via optimization algorithm for quantization of pseudo-random semi-global statistics

    Science.gov (United States)

    Kucukgoz, Mehmet; Harmanci, Oztan; Mihcak, Mehmet K.; Venkatesan, Ramarathnam

    2005-03-01

    In this paper, we propose a novel semi-blind video watermarking scheme, where we use pseudo-random robust semi-global features of video in the three dimensional wavelet transform domain. We design the watermark sequence via solving an optimization problem, such that the features of the mark-embedded video are the quantized versions of the features of the original video. The exact realizations of the algorithmic parameters are chosen pseudo-randomly via a secure pseudo-random number generator, whose seed is the secret key, that is known (resp. unknown) by the embedder and the receiver (resp. by the public). We experimentally show the robustness of our algorithm against several attacks, such as conventional signal processing modifications and adversarial estimation attacks.
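
    The paper designs the watermark by solving an optimization problem; as a simpler illustration of the underlying quantization idea, here is plain quantization index modulation (QIM) applied to a vector of stand-in features.

```python
import numpy as np

def qim_embed(features, bits, delta=1.0):
    """Quantize each feature onto the even (bit 0) or odd (bit 1) lattice."""
    offsets = np.where(np.asarray(bits) == 0, 0.0, delta / 2)
    return np.round((features - offsets) / delta) * delta + offsets

def qim_detect(features, delta=1.0):
    """Decide each bit from the nearer of the two lattices."""
    d0 = np.abs(features - np.round(features / delta) * delta)
    d1 = np.abs(features - (np.round((features - delta / 2) / delta) * delta + delta / 2))
    return (d1 < d0).astype(int)

rng = np.random.default_rng(0)
feats = rng.normal(scale=5.0, size=8)   # stand-in semi-global features
bits = rng.integers(0, 2, size=8)
marked = qim_embed(feats, bits)
print("embedded: ", bits)
print("recovered:", qim_detect(marked))
```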

  15. Data Fusion for a Vision-Radiological System: a Statistical Calibration Algorithm

    International Nuclear Information System (INIS)

    Enqvist, Andreas; Koppal, Sanjeev; Riley, Phillip

    2015-01-01

    Presented here is a fusion system based on simple, low-cost computer vision and radiological sensors for tracking multiple objects and identifying potential radiological materials being transported or shipped. The main focus of this work is the development of calibration algorithms for characterizing the fused sensor system as a single entity. When evaluating system calibration algorithms, there is an apparent need to correct for scene deviations from the basic inverse distance-squared law governing the detection rates. In particular, the computer vision system enables a map of the distance dependence of the sources being tracked, into which the time-dependent radiological data can be incorporated by means of data fusion of the two sensors' output data. (authors)

  16. Classic (Nonquantic) Algorithm for Observations and Measurements Based on Statistical Strategies of Particles Fields

    OpenAIRE

    Savastru, D.; Dontu, Simona; Savastru, Roxana; Sterian, Andreea Rodica

    2013-01-01

    Our knowledge of our surroundings is acquired through observations and measurements, but both are influenced by errors (noise). Therefore, one of the first tasks is to try to eliminate the noise by constructing instruments of high accuracy. However, any real observed and measured system is characterized by natural limits due to the deterministic nature of the measured information. The present work is dedicated to the identification of these limits. We have analyzed some algorithms for selection and ...

  17. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Science.gov (United States)

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical

  18. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Directory of Open Access Journals (Sweden)

    David C Pavlacky

    Full Text Available Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous

  19. An Algorithm for Detection of DVB-T Signals Based on Their Second-Order Statistics

    Directory of Open Access Journals (Sweden)

    Jallon Pierre

    2008-01-01

    Full Text Available We propose in this paper a detection algorithm based on a cost function that jointly tests the correlation induced by the cyclic prefix and the fact that this correlation is time-periodic. In the first part of the paper, the cost function is introduced and some analytical results are given. In particular, the noise and multipath channel impacts on its values are theoretically analysed. In the second part of the paper, some asymptotic results are derived. A first exploitation of these results is used to build a detection test based on the false alarm probability. These results are also used to evaluate the impact of the number of cycle frequencies taken into account in the cost function on the detection performance. Thanks to numerical estimations, we have been able to estimate that the proposed algorithm detects DVB-T signals with an SNR of −12 dB. As a comparison, and in the same context, the detection algorithm proposed by the 802.22 WG in 2006 is able to detect these signals with an SNR of −8 dB.

  20. An Algorithm for Detection of DVB-T Signals Based on Their Second-Order Statistics

    Directory of Open Access Journals (Sweden)

    Pierre Jallon

    2008-03-01

    Full Text Available We propose in this paper a detection algorithm based on a cost function that jointly tests the correlation induced by the cyclic prefix and the fact that this correlation is time-periodic. In the first part of the paper, the cost function is introduced and some analytical results are given. In particular, the noise and multipath channel impacts on its values are theoretically analysed. In the second part of the paper, some asymptotic results are derived. A first exploitation of these results is used to build a detection test based on the false alarm probability. These results are also used to evaluate the impact of the number of cycle frequencies taken into account in the cost function on the detection performance. Thanks to numerical estimations, we have been able to estimate that the proposed algorithm detects DVB-T signals with an SNR of −12 dB. As a comparison, and in the same context, the detection algorithm proposed by the 802.22 WG in 2006 is able to detect these signals with an SNR of −8 dB.
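
    The cyclic-prefix correlation on which the cost function builds can be demonstrated directly. The sketch below generates a toy OFDM-like signal with illustrative (non-DVB-T) parameters and compares the lag-N correlation of the noisy signal with that of pure noise.

```python
import numpy as np

N_FFT, N_CP, N_SYM = 64, 16, 50
rng = np.random.default_rng(0)

symbols = rng.normal(size=(N_SYM, N_FFT)) + 1j * rng.normal(size=(N_SYM, N_FFT))
blocks = np.fft.ifft(symbols, axis=1) * np.sqrt(N_FFT)   # unit-power OFDM blocks
tx = np.concatenate([np.hstack([b[-N_CP:], b]) for b in blocks])  # prepend CP
rx = tx + 0.5 * (rng.normal(size=tx.size) + 1j * rng.normal(size=tx.size))

# Correlation at lag N_FFT is large because each CP repeats the block tail.
cp_corr = (rx[:-N_FFT] * np.conj(rx[N_FFT:])).mean()
print("signal CP correlation:", abs(cp_corr))

noise = 0.5 * (rng.normal(size=tx.size) + 1j * rng.normal(size=tx.size))
print("noise-only correlation:", abs((noise[:-N_FFT] * np.conj(noise[N_FFT:])).mean()))
```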

  1. Statistical analyses to support guidelines for marine avian sampling. Final report

    Science.gov (United States)

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical
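
    A minimal sketch of the Monte Carlo power idea, assuming a negative binomial reference distribution (one member of the report's candidate set) and illustrative parameter values; the power to flag a 3x "hotspot" is estimated as a function of the number of surveys of a candidate block.

```python
import numpy as np

def hotspot_power(mu, effect=3.0, n=10, size=1.0, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo power of a one-sample, one-tailed test that a block's
    mean count exceeds a negative-binomial reference with mean mu."""
    rng = np.random.default_rng(seed)
    p_ref = size / (size + mu)                     # reference distribution
    null_means = rng.negative_binomial(size, p_ref, (n_sim, n)).mean(axis=1)
    crit = np.quantile(null_means, 1 - alpha)      # one-tailed critical value
    p_hot = size / (size + effect * mu)            # hotspot distribution (3x mean)
    hot_means = rng.negative_binomial(size, p_hot, (n_sim, n)).mean(axis=1)
    return (hot_means > crit).mean()

# Power rises with the number of surveys n of the candidate lease block.
for n in (5, 10, 20, 40):
    print(n, "surveys -> power", round(hotspot_power(mu=2.0, n=n), 3))
```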

  2. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    International Nuclear Information System (INIS)

    Harris, S.P.

    1997-01-01

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351, which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for the evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. Samples in small 3 ml containers (inserts) are analyzed by either the Cold Chem method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared for Inductively Coupled Plasma (ICP) analysis by a multi-step process of drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid owing to the decreased sample size and the fact that no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock-up Facility using 3 ml inserts and 15 ml peanut vials. A number of the insert samples were analyzed by Cold Chem and compared with full peanut vial samples analyzed by the current methods. The remaining inserts were analyzed by

  3. Density meter algorithm and system for estimating sampling/mixing uncertainty

    International Nuclear Information System (INIS)

    Shine, E.P.

    1986-01-01

    The Laboratories Department at the Savannah River Plant (SRP) has installed a six-place density meter with an automatic sampling device. This paper describes the statistical software developed to analyze the density of uranyl nitrate solutions using this automated system. The purpose of this software is twofold: to estimate the sampling/mixing and measurement uncertainties in the process and to provide a measurement control program for the density meter. Non-uniformities in density are analyzed both analytically and graphically. The mean density and its limit of error are estimated. Quality control standards are analyzed concurrently with process samples and used to control the density meter measurement error. The analyses are corrected for concentration changes due to evaporation while samples await analysis. The results of this program have been successful in identifying sampling/mixing problems and controlling the quality of analyses.
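
    The sampling/mixing-versus-measurement split is the classic one-way variance-components decomposition; below is a sketch with assumed synthetic densities (duplicate measurements nested within vials), not plant data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vials, n_reps = 8, 3
vial_means = 1.320 + rng.normal(0, 0.002, n_vials)   # sampling/mixing spread
data = vial_means[:, None] + rng.normal(0, 0.0005, (n_vials, n_reps))

ms_between = n_reps * data.mean(axis=1).var(ddof=1)  # one-way ANOVA mean squares
ms_within = data.var(axis=1, ddof=1).mean()
var_sampling = max(0.0, (ms_between - ms_within) / n_reps)
print("measurement sd:", ms_within ** 0.5)
print("sampling/mixing sd:", var_sampling ** 0.5)
```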

  4. Physics-Based Image Segmentation Using First Order Statistical Properties and Genetic Algorithm for Inductive Thermography Imaging.

    Science.gov (United States)

    Gao, Bin; Li, Xiaoqing; Woo, Wai Lok; Tian, Gui Yun

    2018-05-01

    Thermographic inspection has been widely applied to non-destructive testing and evaluation, with the capabilities of rapid, contactless, and large surface area detection. Image segmentation is considered essential for identifying and sizing defects. To attain high-level performance, specific physics-based models that describe defect generation and enable the precise extraction of the target region are of crucial importance. In this paper, an effective genetic first-order statistical image segmentation algorithm is proposed for quantitative crack detection. The proposed method automatically extracts valuable spatial-temporal patterns via an unsupervised feature extraction algorithm and avoids a range of issues associated with human intervention in the laborious manual selection of specific thermal video frames for processing. An internal genetic functionality is built into the proposed algorithm to automatically control the segmentation threshold and render enhanced accuracy in sizing the cracks. Eddy current pulsed thermography is implemented as a platform to demonstrate surface crack detection. Experimental tests and comparisons have been conducted to verify the efficacy of the proposed method. In addition, a global quantitative assessment index, the F-score, has been adopted to objectively evaluate the performance of different segmentation algorithms.
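
    A hedged sketch of a genetic threshold search, with Otsu's between-class variance standing in for the paper's fitness and a synthetic "thermal frame"; the GA operators are deliberately minimal (truncation selection, clone-and-mutate reproduction).

```python
import numpy as np

def between_class_variance(img, t):
    """Otsu-style fitness: large when threshold t separates two populations."""
    fg, bg = img[img >= t], img[img < t]
    if fg.size == 0 or bg.size == 0:
        return 0.0
    w1, w2 = fg.size / img.size, bg.size / img.size
    return w1 * w2 * (fg.mean() - bg.mean()) ** 2

def ga_threshold(img, pop=20, gens=40, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = img.min(), img.max()
    thresholds = rng.uniform(lo, hi, pop)
    for _ in range(gens):
        fit = np.array([between_class_variance(img, t) for t in thresholds])
        parents = thresholds[np.argsort(fit)[-pop // 2:]]           # selection
        children = rng.choice(parents, pop - parents.size)          # clone parents
        children = children + rng.normal(0, 0.02 * (hi - lo), children.size)  # mutate
        thresholds = np.concatenate([parents, children])
    fit = np.array([between_class_variance(img, t) for t in thresholds])
    return thresholds[fit.argmax()]

# Synthetic thermal frame: warm crack pixels on a cooler background.
rng = np.random.default_rng(1)
frame = rng.normal(20.0, 2.0, (64, 64))
frame[30:34, 10:54] += 15.0
print("segmentation threshold:", ga_threshold(frame.ravel()))
```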

  5. A preliminary study on identification of Thai rice samples by INAA and statistical analysis

    Science.gov (United States)

    Kongsri, S.; Kukusamude, C.

    2017-09-01

    This study aims to investigate the elemental composition of 93 Thai rice samples using instrumental neutron activation analysis (INAA) and to identify rice according to type and cultivar using statistical analysis. As, Mg, Cl, Al, Br, Mn, K, Rb and Zn in Thai jasmine rice and Sung Yod rice samples were successfully determined by INAA. The accuracy and precision of the INAA method were verified with SRM 1568a Rice Flour. All elements were found to be in good agreement with the certified values, with precisions in terms of %RSD lower than 7%. The LODs were in the range of 0.01 to 29 mg kg-1. The concentrations of the 9 elements distributed in the Thai rice samples were evaluated and used as chemical indicators to identify the type of rice sample. The results showed that Mg, Cl, As, Br, Mn, K, Rb, and Zn concentrations in Thai jasmine rice samples differ significantly from those in Sung Yod rice samples, but there was no evidence at the 95% confidence level that Al concentrations differ. Our results may provide preliminary information for the discrimination of rice samples and may form a useful database of Thai rice.

  6. Subclinical delusional ideation and appreciation of sample size and heterogeneity in statistical judgment.

    Science.gov (United States)

    Galbraith, Niall D; Manktelow, Ken I; Morris, Neil G

    2010-11-01

    Previous studies demonstrate that people high in delusional ideation exhibit a data-gathering bias on inductive reasoning tasks. The current study set out to investigate the factors that may underpin such a bias by examining healthy individuals, classified as either high or low scorers on the Peters et al. Delusions Inventory (PDI); more specifically, whether high PDI scorers have a relatively poor appreciation of sample size and heterogeneity when making statistical judgments. In Expt 1, high PDI scorers made higher probability estimates when generalizing from a sample of 1 with regard to the heterogeneous human property of obesity. In Expt 2, this effect was replicated and was also observed in relation to the heterogeneous property of aggression. The findings suggest that delusion-prone individuals are less appreciative of the importance of sample size when making statistical judgments about heterogeneous properties; this may underpin the data-gathering bias observed in previous studies. There was some support for the hypothesis that threatening material would exacerbate high PDI scorers' indifference to sample size.

  7. Finite sample performance of the E-M algorithm for ranks data modelling

    Directory of Open Access Journals (Sweden)

    Angela D'Elia

    2007-10-01

    Full Text Available We check the finite sample performance of the maximum likelihood estimators of the parameters of a mixture distribution recently introduced for modelling ranks/preference data. The estimates are derived by the E-M algorithm and the performance is evaluated from both univariate and bivariate points of view. While the results are generally acceptable as far as bias is concerned, the Monte Carlo experiment shows different behaviour of the estimators' efficiency for the two parameters of the mixture, depending mainly upon their location in the admissible parameter space. Some operative suggestions conclude the paper.
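
    A minimal E-M sketch for a uniform-plus-shifted-binomial ranks mixture of the kind studied here (the exact parameterization is assumed), together with the sort of finite-sample check the paper performs: simulate, fit, and compare to the true parameters.

```python
import numpy as np
from scipy.stats import binom

def em_ranks_mixture(r, m, n_iter=200):
    """E-M for ranks on {1..m}: with prob pi a shifted binomial component,
    with prob 1-pi a discrete uniform (parameterization assumed here)."""
    pi, xi = 0.5, 0.5
    for _ in range(n_iter):
        f_b = binom.pmf(r - 1, m - 1, 1 - xi)        # binomial component pmf
        w = pi * f_b / (pi * f_b + (1 - pi) / m)     # E-step: posterior weights
        pi = w.mean()                                 # M-step updates
        xi = 1 - (w * (r - 1)).sum() / ((m - 1) * w.sum())
    return pi, xi

# Finite-sample check: simulate from known (pi, xi), then recover them.
rng = np.random.default_rng(0)
m, n, pi0, xi0 = 9, 500, 0.7, 0.3
binom_part = 1 + rng.binomial(m - 1, 1 - xi0, n)
uniform_part = rng.integers(1, m + 1, n)
r = np.where(rng.random(n) < pi0, binom_part, uniform_part)
print(em_ranks_mixture(r, m))   # should land near (0.7, 0.3)
```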

  8. Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Application

    KAUST Repository

    Chambolle, Antonin; Ehrhardt, Matthias J.; Richtarik, Peter; Schö nlieb, Carola-Bibiane

    2017-01-01

    We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The analysis is carried out for general convex-concave saddle point problems and problems that are either partially smooth / strongly convex or fully smooth / strongly convex. We perform the analysis for arbitrary samplings of dual variables, and obtain known deterministic results as a special case. Several variants of our stochastic method significantly outperform the deterministic variant on a variety of imaging tasks.

  9. Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Application

    KAUST Repository

    Chambolle, Antonin

    2017-06-15

    We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual variable. The analysis is carried out for general convex-concave saddle point problems and problems that are either partially smooth / strongly convex or fully smooth / strongly convex. We perform the analysis for arbitrary samplings of dual variables, and obtain known deterministic results as a special case. Several variants of our stochastic method significantly outperform the deterministic variant on a variety of imaging tasks.

  10. New Hybrid Monte Carlo methods for efficient sampling. From physics to biology and statistics

    International Nuclear Information System (INIS)

    Akhmatskaya, Elena; Reich, Sebastian

    2011-01-01

    We introduce a class of novel hybrid methods for detailed simulations of large complex systems in physics, biology, materials science and statistics. These generalized shadow Hybrid Monte Carlo (GSHMC) methods combine the advantages of stochastic and deterministic simulation techniques. They utilize a partial momentum update to retain some of the dynamical information, employ modified Hamiltonians to overcome exponential performance degradation with the system’s size, and make use of the multi-scale nature of complex systems. Variants of GSHMC were developed for atomistic simulation, particle simulation and statistics: GSHMC (a thermodynamically consistent implementation of constant-temperature molecular dynamics), MTS-GSHMC (multiple-time-stepping GSHMC), meso-GSHMC (a Metropolis-corrected dissipative particle dynamics (DPD) method), and the generalized shadow Hamiltonian Monte Carlo, GSHmMC (a GSHMC for statistical simulations). All of these are compatible with other enhanced sampling techniques and suitable for massively parallel computing, allowing for a range of multi-level parallel strategies. A brief description of the GSHMC approach, examples of its application on high performance computers and a comparison with other existing techniques are given. Our approach is shown to resolve such problems as resonance instabilities of the MTS methods and non-preservation of thermodynamic equilibrium properties in DPD, and to outperform known methods in sampling efficiency by an order of magnitude. (author)
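
    GSHMC itself involves shadow Hamiltonians and importance weights; the sketch below shows only the partial momentum update it builds on, inside an otherwise plain generalized hybrid Monte Carlo loop, tested on a standard Gaussian target.

```python
import numpy as np

def ghmc(logp, grad_logp, x0, n_steps=5000, eps=0.1, L=10, phi=0.3, seed=0):
    """Generalized HMC: leapfrog proposals plus a *partial* momentum refresh
    (mix old momentum with fresh noise); far simpler than GSHMC itself."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    p = rng.normal(size=x.shape)
    out = []
    for _ in range(n_steps):
        p = np.cos(phi) * p + np.sin(phi) * rng.normal(size=x.shape)  # partial update
        xn, pn = x.copy(), p.copy()
        h0 = -logp(x) + 0.5 * p @ p
        for _ in range(L):                           # leapfrog integration
            pn = pn + 0.5 * eps * grad_logp(xn)
            xn = xn + eps * pn
            pn = pn + 0.5 * eps * grad_logp(xn)
        h1 = -logp(xn) + 0.5 * pn @ pn
        if np.log(rng.random()) < h0 - h1:           # Metropolis accept/reject
            x, p = xn, pn
        else:
            p = -p                                   # momentum flip on rejection
        out.append(x.copy())
    return np.array(out)

draws = ghmc(lambda x: -0.5 * x @ x, lambda x: -x, np.zeros(2))
print("mean ~0:", draws.mean(axis=0), "var ~1:", draws.var(axis=0))
```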

  11. Algorithms

    Indian Academy of Sciences (India)

    algorithm design technique called 'divide-and-conquer'. One of ... Turtle graphics, September. 1996. 5. ... whole list named 'PO' is a pointer to the first element of the list; ..... Program for computing matrices X and Y and placing the result in C *).

  12. Algorithms

    Indian Academy of Sciences (India)

    algorithm that it is implicitly understood that we know how to generate the next natural ..... Explicit comparisons are made in line (1) where maximum and minimum is ... It can be shown that the function T(n) = 3/2n -2 is the solution to the above ...

  13. An efficient forward–reverse expectation-maximization algorithm for statistical inference in stochastic reaction networks

    KAUST Repository

    Bayer, Christian; Moraes, Alvaro; Tempone, Raul; Vilanova, Pedro

    2016-01-01

    then employ this SRN bridge-generation technique to the statistical inference problem of approximating reaction propensities based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during phase I, we solve

  14. Parameter sampling capabilities of sequential and simultaneous data assimilation: II. Statistical analysis of numerical results

    International Nuclear Information System (INIS)

    Fossum, Kristian; Mannseth, Trond

    2014-01-01

    We assess and compare parameter sampling capabilities of one sequential and one simultaneous Bayesian, ensemble-based, joint state-parameter (JS) estimation method. In the companion paper, part I (Fossum and Mannseth 2014 Inverse Problems 30 114002), analytical investigations lead us to propose three claims, essentially stating that the sequential method can be expected to outperform the simultaneous method for weakly nonlinear forward models. Here, we assess the reliability and robustness of these claims through statistical analysis of results from a range of numerical experiments. Samples generated by the two approximate JS methods are compared to samples from the posterior distribution generated by a Markov chain Monte Carlo method, using four approximate measures of distance between probability distributions. Forward-model nonlinearity is assessed from a stochastic nonlinearity measure allowing for sufficiently large model dimensions. Both toy models (with low computational complexity, and where the nonlinearity is fairly easy to control) and two-phase porous-media flow models (corresponding to down-scaled versions of problems to which the JS methods have been frequently applied recently) are considered in the numerical experiments. Results from the statistical analysis show strong support of all three claims stated in part I. (paper)

  15. ALGORITHM OF PREPARATION OF THE TRAINING SAMPLE USING 3D-FACE MODELING

    Directory of Open Access Journals (Sweden)

    D. I. Samal

    2016-01-01

    Full Text Available The algorithm for preparing and sampling a training set for a multiclass support vector machine (SVM) classifier is provided. The described approach is based on modeling possible changes in the facial features of the person to be recognized. Additional factors such as shooting perspective, lighting conditions and tilt angles were introduced to improve identification results. These synthetically generated changes affect classifier learning by expanding the range of possible variations of the initial image, so a classifier trained on such an extended sample recognizes unknown objects better. Age, emotional expression, head turns, various lighting conditions, noise, and combinations of these parameters are chosen as the key parameters for modeling. The third-party software ‘FaceGen’, which can model up to 150 parameters and is available as a free demo version, is used for 3D modeling. The SVM classifier was chosen to test the impact of the introduced modifications of the training sample. The preparation and preliminary processing of the images comprises the following steps: detection and localization of the face region in the image; estimation of the rotation and tilt angles; extension of the pixel brightness range and histogram equalization to smooth the brightness and contrast characteristics of the processed images; scaling of the localized and processed face region; creation of a feature vector of the scaled and processed face image by principal component analysis (the NIPALS algorithm); and training of the multiclass SVM classifier. The provided algorithm for expanding the training sample is oriented towards practical use and allows 3D models to expand the processed range of 2D photographs, which positively affects identification results in face recognition systems. This approach makes it possible to compensate

  16. An Energy Aware Adaptive Sampling Algorithm for Energy Harvesting WSN with Energy Hungry Sensors

    Science.gov (United States)

    Srbinovski, Bruno; Magno, Michele; Edwards-Murphy, Fiona; Pakrashi, Vikram; Popovici, Emanuel

    2016-01-01

    Wireless sensor nodes have a limited power budget, though they are often expected to be functional in the field once deployed for extended periods of time. Therefore, minimization of energy consumption and energy harvesting technology in Wireless Sensor Networks (WSN) are key tools for maximizing network lifetime, and achieving self-sustainability. This paper proposes an energy aware Adaptive Sampling Algorithm (ASA) for WSN with power hungry sensors and harvesting capabilities, an energy management technique that can be implemented on any WSN platform with enough processing power to execute the proposed algorithm. An existing state-of-the-art ASA developed for wireless sensor networks with power hungry sensors is optimized and enhanced to adapt the sampling frequency according to the available energy of the node. The proposed algorithm is evaluated using two in-field testbeds that are supplied by two different energy harvesting sources (solar and wind). Simulation and comparison between the state-of-the-art ASA and the proposed energy aware ASA (EASA) in terms of energy durability are carried out using in-field measured harvested energy (using both wind and solar sources) and power hungry sensors (ultrasonic wind sensor and gas sensors). The simulation results demonstrate that using ASA in combination with an energy aware function on the nodes can drastically increase the lifetime of a WSN node and enable self-sustainability. In fact, the proposed EASA in conjunction with energy harvesting capability can lead towards perpetual WSN operation and significantly outperform the state-of-the-art ASA. PMID:27043559

  17. An Energy Aware Adaptive Sampling Algorithm for Energy Harvesting WSN with Energy Hungry Sensors

    Directory of Open Access Journals (Sweden)

    Bruno Srbinovski

    2016-03-01

    Full Text Available Wireless sensor nodes have a limited power budget, though they are often expected to be functional in the field once deployed for extended periods of time. Therefore, minimization of energy consumption and energy harvesting technology in Wireless Sensor Networks (WSN are key tools for maximizing network lifetime, and achieving self-sustainability. This paper proposes an energy aware Adaptive Sampling Algorithm (ASA for WSN with power hungry sensors and harvesting capabilities, an energy management technique that can be implemented on any WSN platform with enough processing power to execute the proposed algorithm. An existing state-of-the-art ASA developed for wireless sensor networks with power hungry sensors is optimized and enhanced to adapt the sampling frequency according to the available energy of the node. The proposed algorithm is evaluated using two in-field testbeds that are supplied by two different energy harvesting sources (solar and wind. Simulation and comparison between the state-of-the-art ASA and the proposed energy aware ASA (EASA in terms of energy durability are carried out using in-field measured harvested energy (using both wind and solar sources and power hungry sensors (ultrasonic wind sensor and gas sensors. The simulation results demonstrate that using ASA in combination with an energy aware function on the nodes can drastically increase the lifetime of a WSN node and enable self-sustainability. In fact, the proposed EASA in conjunction with energy harvesting capability can lead towards perpetual WSN operation and significantly outperform the state-of-the-art ASA.
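
    A toy sketch of the energy-aware idea, under assumed thresholds: stretch the sampling interval as stored energy drops, so the node's duty cycle tracks what the harvester supplies. This is not the paper's control law, only the shape of it.

```python
def next_sampling_interval(base_interval_s, energy_now_j, energy_full_j,
                           min_scale=1.0, max_scale=8.0):
    """Longer intervals (fewer samples) as stored energy drops; at full
    energy the base interval is used unchanged. All thresholds assumed."""
    frac = max(0.0, min(1.0, energy_now_j / energy_full_j))
    scale = max_scale - (max_scale - min_scale) * frac
    return base_interval_s * scale

# At full energy sample every 60 s; near-empty, back off towards 8 minutes.
for energy in (100.0, 50.0, 10.0):
    print(energy, "J ->", next_sampling_interval(60, energy, 100.0), "s")
```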

  18. The Statistics of Radio Astronomical Polarimetry: Disjoint, Superposed, and Composite Samples

    Energy Technology Data Exchange (ETDEWEB)

    Straten, W. van [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Hawthorn, VIC 3122 (Australia)]; Tiburzi, C., E-mail: willem.van.straten@aut.ac.nz [Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, D-53121 Bonn (Germany)]

    2017-02-01

    A statistical framework is presented for the study of the orthogonally polarized modes of radio pulsar emission via the covariances between the Stokes parameters. To accommodate the typically heavy-tailed distributions of single-pulse radio flux density, the fourth-order joint cumulants of the electric field are used to describe the superposition of modes with arbitrary probability distributions. The framework is used to consider the distinction between superposed and disjoint modes, with particular attention to the effects of integration over finite samples. If the interval over which the polarization state is estimated is longer than the timescale for switching between two or more disjoint modes of emission, then the modes are unresolved by the instrument. The resulting composite sample mean exhibits properties that have been attributed to mode superposition, such as depolarization. Because the distinction between disjoint modes and a composite sample of unresolved disjoint modes depends on the temporal resolution of the observing instrumentation, the arguments in favor of superposed modes of pulsar emission are revisited, and observational evidence for disjoint modes is described. In principle, the four-dimensional covariance matrix that describes the distribution of sample mean Stokes parameters can be used to distinguish between disjoint modes, superposed modes, and a composite sample of unresolved disjoint modes. More comprehensive and conclusive interpretation of the covariance matrix requires more detailed consideration of various relevant phenomena, including temporally correlated subpulse modulation (e.g., jitter), statistical dependence between modes (e.g., covariant intensities and partial coherence), and multipath propagation effects (e.g., scintillation and scattering).

  19. Poisson-Box Sampling algorithms for three-dimensional Markov binary mixtures

    Science.gov (United States)

    Larmier, Coline; Zoia, Andrea; Malvagi, Fausto; Dumonteil, Eric; Mazzolo, Alain

    2018-02-01

    Particle transport in Markov mixtures can be addressed by the so-called Chord Length Sampling (CLS) methods, a family of Monte Carlo algorithms taking into account the effects of stochastic media on particle propagation by generating on-the-fly the material interfaces crossed by the random walkers during their trajectories. Such methods enable a significant reduction of computational resources as opposed to reference solutions obtained by solving the Boltzmann equation for a large number of realizations of random media. CLS solutions, which neglect correlations induced by the spatial disorder, are faster albeit approximate, and might thus show discrepancies with respect to reference solutions. In this work we propose a new family of algorithms (called 'Poisson Box Sampling', PBS) aimed at improving the accuracy of the CLS approach for transport in d-dimensional binary Markov mixtures. In order to probe the features of PBS methods, we will focus on three-dimensional Markov media and revisit the benchmark problem originally proposed by Adams, Larsen and Pomraning [1] and extended by Brantley [2]: for these configurations we will compare reference solutions, standard CLS solutions and the new PBS solutions for scalar particle flux, transmission and reflection coefficients. PBS will be shown to perform better than CLS at the expense of a reasonable increase in computational time.
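    For orientation, the sketch below shows the on-the-fly mechanism at the heart of Chord Length Sampling under simplifying assumptions: in a binary Markov mixture, the distance to the next material interface along a track is drawn from an exponential distribution with the current material's mean chord length. The mean chord values are invented, and the PBS refinement itself is not reproduced.

```python
import random

MEAN_CHORD = {0: 1.0, 1: 0.25}  # assumed mean chord length per material (cm)

def sample_track(start_material: int, track_length: float):
    """Yield (material, segment_length) pairs along one particle track."""
    material, travelled = start_material, 0.0
    while travelled < track_length:
        # Distance to the next interface: exponential with the current
        # material's mean chord length (the Markov-mixture property CLS uses).
        chord = random.expovariate(1.0 / MEAN_CHORD[material])
        segment = min(chord, track_length - travelled)
        yield material, segment
        travelled += segment
        material = 1 - material  # binary mixture: alternate the material

for mat, seg in sample_track(start_material=0, track_length=5.0):
    print(f"material {mat}: segment {seg:.3f} cm")
```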

  20. A Probabilistic Mass Estimation Algorithm for a Novel 7-Channel Capacitive Sample Verification Sensor

    Science.gov (United States)

    Wolf, Michael

    2012-01-01

    A document describes an algorithm created to estimate the mass placed on a sample verification sensor (SVS) designed for lunar or planetary robotic sample return missions. A novel SVS measures the capacitance between a rigid bottom plate and an elastic top membrane in seven locations. As additional sample material (soil and/or small rocks) is placed on the top membrane, the deformation of the membrane increases the capacitance. The mass estimation algorithm addresses both the calibration of each SVS channel, and also addresses how to combine the capacitances read from each of the seven channels into a single mass estimate. The probabilistic approach combines the channels according to the variance observed during the training phase, and provides not only the mass estimate, but also a value for the certainty of the estimate. SVS capacitance data is collected for known masses under a wide variety of possible loading scenarios, though in all cases, the distribution of sample within the canister is expected to be approximately uniform. A capacitance-vs-mass curve is fitted to this data, and is subsequently used to determine the mass estimate for the single channel's capacitance reading during the measurement phase. This results in seven different mass estimates, one for each SVS channel. Moreover, the variance of the calibration data is used to place a Gaussian probability distribution function (pdf) around this mass estimate. To blend these seven estimates, the seven pdfs are combined into a single Gaussian distribution function, providing the final mean and variance of the estimate. This blending technique essentially takes the final estimate as an average of the estimates of the seven channels, weighted by the inverse of the channel's variance.
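    The blending step described above amounts to an inverse-variance weighted average, i.e. a product of Gaussian pdfs. A minimal sketch with made-up channel readings:

```python
def blend_gaussian_estimates(means, variances):
    """Fuse independent Gaussian estimates into one (mean, variance) pair."""
    weights = [1.0 / v for v in variances]        # precision of each channel
    total = sum(weights)
    fused_mean = sum(w * m for w, m in zip(weights, means)) / total
    return fused_mean, 1.0 / total                # fused mean and variance

channel_means = [101.0, 98.5, 103.2, 99.8, 100.4, 97.9, 102.1]  # grams
channel_vars = [4.0, 2.5, 6.0, 3.0, 2.0, 5.5, 4.5]              # grams^2
m, v = blend_gaussian_estimates(channel_means, channel_vars)
print(f"fused mass estimate: {m:.2f} g (std {v ** 0.5:.2f} g)")
```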

  1. Study on loss detection algorithms for tank monitoring data using multivariate statistical analysis

    International Nuclear Information System (INIS)

    Suzuki, Mitsutoshi; Burr, Tom

    2009-01-01

    Evaluation of solution monitoring data to support material balance evaluation was proposed about a decade ago because of concerns regarding the large throughput planned at Rokkasho Reprocessing Plant (RRP). A numerical study using the simulation code (FACSIM) was done and significant increases in the detection probabilities (DP) for certain types of losses were shown. To be accepted internationally, it is very important to verify such claims using real solution monitoring data. However, a demonstrative study with real tank data has not been carried out due to the confidentiality of the tank data. This paper describes an experimental study that has been started using actual data from the Solution Measurement and Monitoring System (SMMS) in the Tokai Reprocessing Plant (TRP) and the Savannah River Site (SRS). Multivariate statistical methods, such as a vector cumulative sum and a multi-scale statistical analysis, have been applied to real tank data with superimposed simulated losses. Although quantitative conclusions have not yet been derived due to the difficulty of baseline evaluation, the multivariate statistical methods remain promising for abrupt and some types of protracted loss detection. (author)
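    As background for the multivariate methods mentioned above, the scalar one-sided CUSUM below illustrates the accumulate-and-threshold idea that a vector cumulative sum generalizes; the residual series and the parameters k and h are synthetic.

```python
def cusum_alarm(residuals, k=0.5, h=5.0):
    """Return the index of the first alarm, or None if none is raised."""
    s = 0.0
    for i, r in enumerate(residuals):
        s = max(0.0, s + r - k)  # one-sided upper CUSUM recursion
        if s > h:
            return i
    return None

clean = [0.1, -0.2, 0.3, 0.0, -0.1] * 6       # balanced residuals, no loss
loss = clean[:15] + [1.2] * 15                # protracted loss after t = 15
print(cusum_alarm(clean), cusum_alarm(loss))  # -> None 22
```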

  2. MULTI-LEVEL SAMPLING APPROACH FOR CONTINUOUS LOSS DETECTION USING ITERATIVE WINDOW AND STATISTICAL MODEL

    OpenAIRE

    Mohd Fo'ad Rohani; Mohd Aizaini Maarof; Ali Selamat; Houssain Kettani

    2010-01-01

    This paper proposes a Multi-Level Sampling (MLS) approach for continuous Loss of Self-Similarity (LoSS) detection using an iterative window. The method defines LoSS based on the Second Order Self-Similarity (SOSS) statistical model. The Optimization Method (OM) is used to estimate the self-similarity parameter since it is fast and more accurate in comparison with other estimation methods known in the literature. Probability of LoSS detection is introduced to measure continuous LoSS detection performance...

  3. Algorithms

    Indian Academy of Sciences (India)

    will become clear in the next article when we discuss a simple Logo-like programming language. ... Rod B may be used as an auxiliary store. The problem is to find an algorithm which performs this task. ... N0 disks are moved from A to B using C as auxiliary rod. • move_disk(A, C); the (N0 + 1)th disk is moved from A to C directly ...

  4. Generation of a statistical shape model with probabilistic point correspondences and the expectation maximization-iterative closest point algorithm

    International Nuclear Information System (INIS)

    Hufnagel, Heike; Pennec, Xavier; Ayache, Nicholas; Ehrhardt, Jan; Handels, Heinz

    2008-01-01

    Identification of point correspondences between shapes is required for statistical analysis of organ shape differences. Since manual identification of landmarks is not a feasible option in 3D, several methods were developed to automatically find one-to-one correspondences on shape surfaces. For unstructured point sets, however, one-to-one correspondences do not exist but correspondence probabilities can be determined. A method was developed to compute a statistical shape model based on shapes which are represented by unstructured point sets with arbitrary point numbers. A fundamental problem when computing statistical shape models is the determination of correspondences between the points of the shape observations of the training data set. In the absence of landmarks, exact correspondences can only be determined between continuous surfaces, not between unstructured point sets. To overcome this problem, we introduce correspondence probabilities instead of exact correspondences. The correspondence probabilities are found by aligning the observation shapes with the affine expectation maximization-iterative closest points (EM-ICP) registration algorithm. In a second step, the correspondence probabilities are used as input to compute a mean shape (represented once again by an unstructured point set). Both steps are unified in a single optimization criterion which depends on the two parameters 'registration transformation' and 'mean shape'. In a last step, a variability model which best represents the variability in the training data set is computed. Experiments on synthetic data sets and in vivo brain structure data sets (MRI) are then designed to evaluate the performance of our algorithm. The new method was applied to brain MRI data sets, and the estimated point correspondences were compared to a statistical shape model built on exact correspondences. Based on established measures of "generalization ability" and "specificity", the estimates were very satisfactory.

  5. Constrained statistical inference: sample-size tables for ANOVA and regression

    Directory of Open Access Journals (Sweden)

    Leonard Vanbrabant

    2015-01-01

    Full Text Available Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient beta1 is larger than beta2 and beta3. The corresponding hypothesis is H: beta1 > {beta2, beta3} and this is known as an (order) constrained hypothesis. A major advantage of testing such a hypothesis is that power can be gained and inherently a smaller sample size is needed. This article discusses this gain in sample size reduction when an increasing number of constraints is included in the hypothesis. The main goal is to present sample-size tables for constrained hypotheses. A sample-size table contains the necessary sample size at a prespecified power (say, 0.80) for an increasing number of constraints. To obtain sample-size tables, two Monte Carlo simulations were performed, one for ANOVA and one for multiple regression. Three results are salient. First, in an ANOVA the needed sample size decreases by 30% to 50% when complete ordering of the parameters is taken into account. Second, small deviations from the imposed order have only a minor impact on the power. Third, at the maximum number of constraints, the linear regression results are comparable with the ANOVA results. However, in the case of fewer constraints, ordering the parameters (e.g., beta1 > beta2) results in a higher power than assigning a positive or a negative sign to the parameters (e.g., beta1 > 0).

  6. A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments.

    Science.gov (United States)

    Heskes, Tom; Eisinga, Rob; Breitling, Rainer

    2014-11-21

    The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Both exact calculation and permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy over existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issues. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .
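    For orientation, the sketch below computes the rank product statistic itself on toy data, as the geometric mean of a molecule's ranks across replicates; the bounds and approximate p-values are implemented in the authors' R code linked above.

```python
from math import prod

def rank_products(replicates):
    """replicates: list of dicts mapping molecule -> fold change."""
    k = len(replicates)
    molecules = replicates[0].keys()
    ranks = []
    for rep in replicates:
        # rank 1 = strongest up-regulation within this replicate
        ordered = sorted(rep, key=rep.get, reverse=True)
        ranks.append({mol: i + 1 for i, mol in enumerate(ordered)})
    # Geometric mean of the k ranks; small values = consistently top-ranked.
    return {mol: prod(r[mol] for r in ranks) ** (1.0 / k) for mol in molecules}

reps = [{"geneA": 2.1, "geneB": 1.2, "geneC": 0.8},
        {"geneA": 1.9, "geneB": 0.9, "geneC": 1.1},
        {"geneA": 2.4, "geneB": 1.0, "geneC": 0.7}]
print(rank_products(reps))  # geneA attains the smallest (best) rank product
```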

  7. Effect of the Target Motion Sampling Temperature Treatment Method on the Statistics and Performance

    Science.gov (United States)

    Viitanen, Tuomas; Leppänen, Jaakko

    2014-06-01

    Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percent in the vicinity of very strong resonances. This effect is actually not related to the usage of sampled responses, but is instead an inherent property of the TMS tracking method and concerns both EBT and 0 K calculations.

  8. Statistical methods for detecting differentially abundant features in clinical metagenomic samples.

    Directory of Open Access Journals (Sweden)

    James Robert White

    2009-04-01

    Full Text Available Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software
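    One ingredient of the method described above, the handling of sparsely-sampled features with Fisher's exact test, can be sketched as follows. The contingency-table framing and the counts are illustrative and do not reproduce the Metastats implementation.

```python
from scipy.stats import fisher_exact

def sparse_feature_pvalue(feat_a, total_a, feat_b, total_b):
    """Two-sided p-value that a sparse feature differs between A and B."""
    table = [[feat_a, total_a - feat_a],
             [feat_b, total_b - feat_b]]
    _, p_value = fisher_exact(table)
    return p_value

# Feature seen once in 5,000 reads in group A, nine times in 5,200 in B.
print(sparse_feature_pvalue(1, 5000, 9, 5200))
```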

  9. STATISTICAL EVALUATION OF SMALL SCALE MIXING DEMONSTRATION SAMPLING AND BATCH TRANSFER PERFORMANCE - 12093

    Energy Technology Data Exchange (ETDEWEB)

    GREER DA; THIEN MG

    2012-01-12

    The ability to effectively mix, sample, certify, and deliver consistent batches of High Level Waste (HLW) feed from the Hanford Double Shell Tanks (DST) to the Waste Treatment and Immobilization Plant (WTP) presents a significant mission risk with potential to impact mission length and the quantity of HLW glass produced. DOE's Tank Operations Contractor, Washington River Protection Solutions (WRPS) has previously presented the results of mixing performance in two different sizes of small scale DSTs to support scale up estimates of full scale DST mixing performance. Currently, sufficient sampling of DSTs is one of the largest programmatic risks that could prevent timely delivery of high level waste to the WTP. WRPS has performed small scale mixing and sampling demonstrations to study the ability to sufficiently sample the tanks. The statistical evaluation of the demonstration results, which leads to the conclusion that the two scales of small DST are behaving similarly and that full scale performance is predictable, will be presented. This work is essential to reduce the risk of requiring a new dedicated feed sampling facility and will guide future optimization work to ensure the waste feed delivery mission will be accomplished successfully. This paper will focus on the analytical data collected from mixing, sampling, and batch transfer testing from the small scale mixing demonstration tanks and how those data are being interpreted to begin to understand the relationship between samples taken prior to transfer and samples from the subsequent batches transferred. An overview of the types of data collected and examples of typical raw data will be provided. The paper will then discuss the processing and manipulation of the data which is necessary to begin evaluating sampling and batch transfer performance. This discussion will also include the evaluation of the analytical measurement capability with regard to the simulant material used in the demonstration tests. The

  10. Statistical assessment of fish behavior from split-beam hydro-acoustic sampling

    International Nuclear Information System (INIS)

    McKinstry, Craig A.; Simmons, Mary Ann; Simmons, Carver S.; Johnson, Robert L.

    2005-01-01

    Statistical methods are presented for using echo-traces from split-beam hydro-acoustic sampling to assess fish behavior in response to a stimulus. The data presented are from a study designed to assess the response of free-ranging, lake-resident fish, primarily kokanee (Oncorhynchus nerka) and rainbow trout (Oncorhynchus mykiss), to high intensity strobe lights, and was conducted at Grand Coulee Dam on the Columbia River in Northern Washington State. The lights were deployed immediately upstream from the turbine intakes, in a region exposed to daily alternating periods of high and low flows. The study design included five down-looking split-beam transducers positioned in a line at incremental distances upstream from the strobe lights, and treatments applied in randomized pseudo-replicate blocks. Statistical methods included the use of odds-ratios from fitted loglinear models. Fish-track velocity vectors were modeled using circular probability distributions. Both analyses are depicted graphically. Study results suggest large increases in fish activity in the presence of the strobe lights, most notably at night and during periods of low flow. The lights also induced notable bimodality in the angular distributions of the fish track velocity vectors. Statistical summaries are presented along with interpretations on fish behavior.

  11. A new fast algorithm for the evaluation of regions of interest and statistical uncertainty in computed tomography

    International Nuclear Information System (INIS)

    Huesman, R.H.

    1984-01-01

    A new algorithm for region of interest evaluation in computed tomography is described. Region of interest evaluation is a technique used to improve quantitation of the tomographic imaging process by summing (or averaging) the reconstructed quantity throughout a volume of particular significance. An important application of this procedure arises in the analysis of dynamic emission computed tomographic data, in which the uptake and clearance of radiotracers are used to determine the blood flow and/or physiological function of tissue within the significant volume. The new algorithm replaces the conventional technique of repeated image reconstructions with one in which projected regions are convolved and then used to form multiple vector inner products with the raw tomographic data sets. Quantitation of regions of interest is made without the need for reconstruction of tomographic images. The computational advantage of the new algorithm over conventional methods is between factors of 20 and 500 for typical applications encountered in medical science studies. The greatest benefit is the ease with which the statistical uncertainty of the result is computed. The entire covariance matrix for the evaluation of regions of interest can be calculated with relatively few operations. (author)
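    The algorithmic core described above exploits linearity: if reconstruction is linear in the projection data, any ROI sum of the reconstructed image equals an inner product between a precomputed vector and the raw data, and its variance follows directly. The toy example below uses a pseudoinverse as a stand-in linear reconstructor rather than the author's convolution-based formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((64, 256))            # toy system matrix: 64 bins, 256 pixels
roi = np.zeros(256)
roi[100:120] = 1.0                   # indicator of the region of interest

# Precompute once per ROI: a vector w with w @ data == roi @ reconstruction,
# using the pseudoinverse as the linear reconstructor in this toy example.
w = roi @ np.linalg.pinv(A)

data = A @ rng.random(256)           # one measured projection data set
roi_value = w @ data                 # ROI estimate, no image reconstructed
roi_variance = w @ np.diag(data) @ w # Poisson-like data: cov ~ diag(data)
print(roi_value, roi_variance)
```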

  12. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    Science.gov (United States)

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, respectively, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  13. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification

    Directory of Open Access Journals (Sweden)

    D. Ramyachitra

    2015-09-01

    Full Text Available Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty is that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  14. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification.

    Science.gov (United States)

    Ramyachitra, D; Sofia, M; Manikandan, P

    2015-09-01

    Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty is that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  15. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  16. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms

    Science.gov (United States)

    Elhadad, N.; Claassen, J.; Perotte, R.; Goldstein, A.; Hripcsak, G.

    2018-01-01

    We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008), and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm that transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state. We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a

  17. Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms.

    Science.gov (United States)

    Albers, D J; Elhadad, N; Claassen, J; Perotte, R; Goldstein, A; Hripcsak, G

    2018-02-01

    We study the question of how to represent or summarize raw laboratory data taken from an electronic health record (EHR) using parametric model selection to reduce or cope with biases induced through clinical care. It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al., 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al., 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008), and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary. The PopKLD algorithm can be automated and is designed to be applicable in high-throughput settings; for example, the output of the PopKLD algorithm can be used as input for phenotyping algorithms. Moreover, we develop the PopKLD-CAT algorithm that transforms the continuous PopKLD summary into a categorical summary useful for applications that require categorical data such as topic modeling. We evaluate our methodology in two ways. First, we apply the method to laboratory data collected in two different health care contexts, primary versus intensive care. We show that the PopKLD preserves known physiologic features in the data that are lost when summarizing the data using more common laboratory data summaries such as mean and standard deviation. Second, for three disease-laboratory measurement pairs, we perform a phenotyping task: we use the PopKLD and PopKLD-CAT algorithms to define high and low values of the laboratory variable that are used for defining a disease state. We then compare the relationship between the PopKLD-CAT summary disease predictions and the same predictions using empirically estimated mean and standard deviation to a

  18. Determination of Sr-90 in milk samples from the study of statistical results

    Directory of Open Access Journals (Sweden)

    Otero-Pazos Alberto

    2017-01-01

    Full Text Available The determination of 90Sr in milk samples is the main objective of radiation monitoring laboratories because of its environmental importance. In this paper the concentration of activity of 39 milk samples was obtained through radiochemical separation based on selective retention of Sr in a cationic resin (Dowex 50WX8, 50-100 mesh) and subsequent determination by a low-level proportional gas counter. The results were checked by performing the measurement of the Sr concentration by using the flame atomic absorption spectroscopy technique, to finally obtain the mass of 90Sr. From the data obtained a statistical treatment was performed using linear regressions. A reliable estimate of the mass of 90Sr was obtained based on the gravimetric technique, and secondly, the counts per minute of the third measurement in the 90Sr and 90Y equilibrium, without having to perform the analysis. These estimates have been verified with 19 milk samples, obtaining overlapping results. The novelty of the manuscript is the possibility of determining the concentration of 90Sr in milk samples, without the need to perform the third measurement in the equilibrium.

  19. Statistical issues in reporting quality data: small samples and casemix variation.

    Science.gov (United States)

    Zaslavsky, A M

    2001-12-01

    To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F ≥ 10 or reliability ≥ 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.
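    The 'shrinkage' estimation mentioned above can be sketched as an empirical-Bayes style weighted average, with the weight given by the unit's reliability; the variance components below are assumed known for simplicity, and all numbers are invented.

```python
def shrunk_estimate(unit_mean, n, grand_mean, between_var, within_var):
    """Empirical-Bayes style shrinkage of a unit mean toward the grand mean."""
    # Reliability grows with sample size n; unreliable units shrink more.
    reliability = between_var / (between_var + within_var / n)
    return grand_mean + reliability * (unit_mean - grand_mean)

# A small clinic (n = 20) is shrunk strongly; a large one (n = 2000) barely moves.
for n in (20, 2000):
    est = shrunk_estimate(unit_mean=0.90, n=n, grand_mean=0.80,
                          between_var=0.002, within_var=0.16)
    print(n, round(est, 3))
```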

  20. TRAN-STAT, Issue No. 3, January 1978. Topics discussed: some statistical aspects of compositing field samples

    International Nuclear Information System (INIS)

    Gilbert, R.O.

    1978-01-01

    Some statistical aspects of compositing field samples of soils for determining the content of Pu are discussed. Some of the potential problems involved in pooling samples are reviewed. This is followed by more detailed discussions and examples of compositing designs, adequacy of mixing, statistical models and their role in compositing, and related topics.

  1. Development of ice floe tracker algorithm to measure Lagrangian statistics in the eastern Greenland coast

    Science.gov (United States)

    Lopez, Rosalinda; Wilhelmus, Monica M.; Schodlok, Michael; Klein, Patrice

    2017-11-01

    Sea ice export through Fram Strait is a key component of the Arctic climate system. The East Greenland Current (EGC) carries most of the sea ice southwards until it melts. Lagrangian methods using sea ice buoys have been used to map ice features in polar regions. However, their spatial and temporal coverage is limited. Satellite data can provide a better tool to map sea ice flow and its variability. Here, an automated sea ice floe detection algorithm uses ice floes as tracers for surface ocean currents. We process Moderate Resolution Imaging Spectroradiometer satellite images to track ice floes (length scale 5-10 km) in the north-eastern Greenland Sea region. Our MATLAB-based routines effectively filter out clouds and adaptively modify the images to segment and identify ice floes. Ice floes were tracked based on persistent surface features common in successive images throughout 2016. Their daily centroid locations were extracted and the resulting trajectories are used to describe surface circulation and its variability using differential kinematic parameters. We will discuss the application of this method to a longer time series and larger spatial coverage. This enables us to derive the inter-annual variability of mesoscale features along the eastern coast of Greenland. Supported by UCR Mechanical Engineering Departmental Fellowship.

  2. Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals.

    Science.gov (United States)

    Guven, Onur; Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G

    2016-06-01

    This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors' previous Letter, three isoelectric baseline points per heartbeat are detected, and here utilised as interpolation points. As an extension from linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology University and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show a root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp-p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation, on the other hand, show 10.7 μV, 11.6 μV (mean), 7.8 μV, 8.9 μV (median) and 9.8 μV, 9.3 μV (standard deviation) per heartbeat.
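    The principle behind the Letter, estimating baseline wander by interpolating between isoelectric points and subtracting it, can be illustrated with plain piecewise-linear interpolation; the authors' segmented scheme is more refined, and the signal, sampling rate, and isoelectric points below are all synthetic.

```python
import numpy as np

fs = 250                                    # sampling rate in Hz (assumed)
t = np.arange(0.0, 10.0, 1.0 / fs)
ecg = np.sin(2 * np.pi * 1.0 * t)           # toy "cardiac" component
drift = 0.5 * np.sin(2 * np.pi * 0.1 * t)   # 0.1 Hz baseline wander
signal = ecg + drift

# Isoelectric instants: points where the toy cardiac component is zero.
iso_t = np.arange(0.0, 10.0, 0.5)
iso_v = np.interp(iso_t, t, signal)         # baseline samples at those points

baseline = np.interp(t, iso_t, iso_v)       # piecewise-linear baseline estimate
corrected = signal - baseline
print(f"residual drift RMS: {np.sqrt(np.mean((corrected - ecg) ** 2)):.4f}")
```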

  3. Hardware architecture for projective model calculation and false match refining using random sample consensus algorithm

    Science.gov (United States)

    Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid

    2016-11-01

    The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and for refining false matches using the random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.
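    The RANSAC loop that the hardware implements follows a standard pattern: repeatedly fit a candidate model to a minimal sample of matches and keep the model with the largest consensus set. The software skeleton below is generic, with the model-fitting and residual callbacks left abstract; it is a sketch of the algorithm, not a description of the VLSI design.

```python
import random

def ransac(matches, fit_model, residual, min_samples=4,
           threshold=2.0, iterations=500):
    """Keep the model with the largest consensus set of inlying matches."""
    best_model, best_inliers = None, []
    for _ in range(iterations):
        candidate = fit_model(random.sample(matches, min_samples))
        if candidate is None:
            continue  # degenerate minimal sample, try again
        inliers = [m for m in matches if residual(candidate, m) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = candidate, inliers
    # Refit on all inliers for a refined final model, as is common practice.
    return (fit_model(best_inliers) if best_inliers else None), best_inliers
```

    For a projective (homography) model, fit_model would solve for the transformation coefficients from four point correspondences and residual would be a reprojection distance.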

  4. Effects of deformable registration algorithms on the creation of statistical maps for preoperative targeting in deep brain stimulation procedures

    Science.gov (United States)

    Liu, Yuan; D'Haese, Pierre-Francois; Dawant, Benoit M.

    2014-03-01

    Deep brain stimulation, which is used to treat various neurological disorders, involves implanting a permanent electrode into precise targets deep in the brain. Accurate pre-operative localization of the targets on pre-operative MRI sequences is challenging, as these are typically located in homogeneous regions with poor contrast. Population-based statistical atlases can assist with this process. Such atlases are created by acquiring the location of efficacious regions from numerous subjects and projecting them onto a common reference image volume using some normalization method. In previous work, we presented results concluding that non-rigid registration provided the best result for such normalization. However, this process could be biased by the choice of the reference image and/or registration approach. In this paper, we have qualitatively and quantitatively compared the performance of six recognized deformable registration methods at normalizing such data in poorly contrasted regions onto three different reference volumes using a unique set of data from 100 patients. We study various metrics designed to measure the centroid, spread, and shape of the normalized data. This study leads to a total of 1800 deformable registrations and results show that statistical atlases constructed using different deformable registration methods share comparable centroids and spreads with marginal differences in their shape. Among the six methods being studied, Diffeomorphic Demons produces the largest spreads and centroids that are the furthest apart from the others in general. Among the three atlases, one atlas consistently outperforms the other two with smaller spreads for each algorithm. However, none of the differences in the spreads were found to be statistically significant, across different algorithms or across different atlases.

  5. A hybrid algorithm for reliability analysis combining Kriging and subset simulation importance sampling

    International Nuclear Information System (INIS)

    Tong, Cao; Sun, Zhili; Zhao, Qianli; Wang, Qibin; Wang, Shuang

    2015-01-01

    To solve the problem of large computation when failure probability with time-consuming numerical model is calculated, we propose an improved active learning reliability method called AK-SSIS based on AK-IS algorithm. First, an improved iterative stopping criterion in active learning is presented so that iterations decrease dramatically. Second, the proposed method introduces Subset simulation importance sampling (SSIS) into the active learning reliability calculation, and then a learning function suitable for SSIS is proposed. Finally, the efficiency of AK-SSIS is proved by two academic examples from the literature. The results show that AK-SSIS requires fewer calls to the performance function than AK-IS, and the failure probability obtained from AK-SSIS is very robust and accurate. Then this method is applied to a spur gear pair for tooth contact fatigue reliability analysis.

  6. A hybrid algorithm for reliability analysis combining Kriging and subset simulation importance sampling

    Energy Technology Data Exchange (ETDEWEB)

    Tong, Cao; Sun, Zhili; Zhao, Qianli; Wang, Qibin [Northeastern University, Shenyang (China)]; Wang, Shuang [Jiangxi University of Science and Technology, Ganzhou (China)]

    2015-08-15

    To solve the problem of large computation when failure probability with time-consuming numerical model is calculated, we propose an improved active learning reliability method called AK-SSIS based on AK-IS algorithm. First, an improved iterative stopping criterion in active learning is presented so that iterations decrease dramatically. Second, the proposed method introduces Subset simulation importance sampling (SSIS) into the active learning reliability calculation, and then a learning function suitable for SSIS is proposed. Finally, the efficiency of AK-SSIS is proved by two academic examples from the literature. The results show that AK-SSIS requires fewer calls to the performance function than AK-IS, and the failure probability obtained from AK-SSIS is very robust and accurate. Then this method is applied to a spur gear pair for tooth contact fatigue reliability analysis.

  7. Effect of the Target Motion Sampling temperature treatment method on the statistics and performance

    International Nuclear Information System (INIS)

    Viitanen, Tuomas; Leppänen, Jaakko

    2015-01-01

    Highlights: • Use of the Target Motion Sampling (TMS) method with collision estimators is studied. • The expected values of the estimators agree with NJOY-based reference. • In most practical cases also the variances of the estimators are unaffected by TMS. • Transport calculation slow-down due to TMS dominates the impact on figures-of-merit. - Abstract: Target Motion Sampling (TMS) is a stochastic on-the-fly temperature treatment technique that is being developed as a part of the Monte Carlo reactor physics code Serpent. The method provides for modeling of arbitrary temperatures in continuous-energy Monte Carlo tracking routines with only one set of cross sections stored in the computer memory. Previously, only the performance of the TMS method in terms of CPU time per transported neutron has been discussed. Since the effective cross sections are not calculated at any point of a transport simulation with TMS, reaction rate estimators must be scored using sampled cross sections, which is expected to increase the variances and, consequently, to decrease the figures-of-merit. This paper examines the effects of the TMS on the statistics and performance in practical calculations involving reaction rate estimation with collision estimators. Against all expectations it turned out that the usage of sampled response values has no practical effect on the performance of reaction rate estimators when using TMS with elevated basis cross section temperatures (EBT), i.e. the usual way. With 0 Kelvin cross sections a significant increase in the variances of capture rate estimators was observed right below the energy region of unresolved resonances, but at these energies the figures-of-merit could be increased using a simple resampling technique to decrease the variances of the responses. It was, however, noticed that the usage of the TMS method increases the statistical deviances of all estimators, including the flux estimator, by tens of percent in the vicinity of very

  8. Methods for computational disease surveillance in infection prevention and control: Statistical process control versus Twitter's anomaly and breakout detection algorithms.

    Science.gov (United States)

    Wiemken, Timothy L; Furmanek, Stephen P; Mattingly, William A; Wright, Marc-Oliver; Persaud, Annuradha K; Guinn, Brian E; Carrico, Ruth M; Arnold, Forest W; Ramirez, Julio A

    2018-02-01

    Although not all health care-associated infections (HAIs) are preventable, reducing HAIs through targeted intervention is key to a successful infection prevention program. To identify areas in need of targeted intervention, robust statistical methods must be used when analyzing surveillance data. The objective of this study was to compare and contrast statistical process control (SPC) charts with Twitter's anomaly and breakout detection algorithms. SPC and anomaly/breakout detection (ABD) charts were created for vancomycin-resistant Enterococcus, Acinetobacter baumannii, catheter-associated urinary tract infection, and central line-associated bloodstream infection data. Both SPC and ABD charts detected similar data points as anomalous/out of control on most charts. The vancomycin-resistant Enterococcus ABD chart detected an extra anomalous point that appeared to be higher than the same time period in prior years. Using a small subset of the central line-associated bloodstream infection data, the ABD chart was able to detect anomalies where the SPC chart was not. SPC charts and ABD charts both performed well, although ABD charts appeared to work better in the context of seasonal variation and autocorrelation. Because they account for common statistical issues in HAI data, ABD charts may be useful for practitioners for analysis of HAI surveillance data. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
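    For readers unfamiliar with the SPC side of the comparison, a minimal c-chart for monthly infection counts is sketched below, flagging months outside 3-sigma Poisson limits; the counts are fabricated, and the ABD side would apply Twitter's open-source anomaly/breakout detection packages (written in R) to the same series.

```python
import math

counts = [4, 6, 5, 3, 7, 5, 4, 6, 5, 14, 5, 4]  # infections per month (toy)
center = sum(counts) / len(counts)
ucl = center + 3 * math.sqrt(center)            # 3-sigma upper control limit
lcl = max(0.0, center - 3 * math.sqrt(center))  # lower limit, floored at zero

for month, c in enumerate(counts, start=1):
    flag = "OUT OF CONTROL" if not (lcl <= c <= ucl) else ""
    print(f"month {month:2d}: {c:2d} {flag}")
```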

  9. When the Ostrich-Algorithm Fails: Blanking Method Affects Spike Train Statistics

    Directory of Open Access Journals (Sweden)

    Kevin Joseph

    2018-04-01

    Full Text Available Modern electroceuticals are bound to employ the usage of electrical high frequency (130–180 Hz) stimulation carried out under closed loop control, most prominent in the case of movement disorders. However, particular challenges are faced when electrical recordings of neuronal tissue are carried out during high frequency electrical stimulation, both in-vivo and in-vitro. This stimulation produces undesired artifacts and can render the recorded signal only partially useful. The extent of these artifacts is often reduced by temporarily grounding the recording input during stimulation pulses. In the following study, we quantify the effects of this method, "blanking," on the spike count and spike train statistics. Starting from a theoretical standpoint, we calculate a loss in the absolute number of action potentials, depending on: width of the blanking window, frequency of stimulation, and intrinsic neuronal activity. These calculations were then corroborated by actual high signal to noise ratio (SNR) single cell recordings. We state that, for clinically relevant frequencies of 130 Hz (used for movement disorders) and realistic blanking windows of 2 ms, up to 27% of actual existing spikes are lost. We strongly advise cautious use of the blanking method when spike rate quantification is attempted. Impact statement: Blanking (artifact removal by temporarily grounding input), depending on recording parameters, can lead to significant spike loss. Very careful use of blanking circuits is advised.

  10. When the Ostrich-Algorithm Fails: Blanking Method Affects Spike Train Statistics.

    Science.gov (United States)

    Joseph, Kevin; Mottaghi, Soheil; Christ, Olaf; Feuerstein, Thomas J; Hofmann, Ulrich G

    2018-01-01

    Modern electroceuticals are bound to employ the usage of electrical high frequency (130-180 Hz) stimulation carried out under closed loop control, most prominent in the case of movement disorders. However, particular challenges are faced when electrical recordings of neuronal tissue are carried out during high frequency electrical stimulation, both in-vivo and in-vitro. This stimulation produces undesired artifacts and can render the recorded signal only partially useful. The extent of these artifacts is often reduced by temporarily grounding the recording input during stimulation pulses. In the following study, we quantify the effects of this method, "blanking," on the spike count and spike train statistics. Starting from a theoretical standpoint, we calculate a loss in the absolute number of action potentials, depending on: width of the blanking window, frequency of stimulation, and intrinsic neuronal activity. These calculations were then corroborated by actual high signal to noise ratio (SNR) single cell recordings. We state that, for clinically relevant frequencies of 130 Hz (used for movement disorders) and realistic blanking windows of 2 ms, up to 27% of actual existing spikes are lost. We strongly advise cautious use of the blanking method when spike rate quantification is attempted. Blanking (artifact removal by temporarily grounding input), depending on recording parameters, can lead to significant spike loss. Very careful use of blanking circuits is advised.
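    The headline spike-loss figure above can be sanity-checked with first-order arithmetic: if every stimulation pulse blanks the recording for a window w, the fraction of time (and, for firing uncorrelated with the stimulus, the expected fraction of spikes) lost is roughly the stimulation frequency times w. Finite spike width pushes the loss slightly above this bound, consistent with the paper's up-to-27% figure.

```python
def expected_spike_loss(stim_freq_hz: float, blank_window_s: float) -> float:
    """First-order fraction of recording time hidden by blanking."""
    return min(1.0, stim_freq_hz * blank_window_s)

print(f"{expected_spike_loss(130.0, 0.002):.0%}")  # -> 26% at 130 Hz, 2 ms
```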

  11. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    DEFF Research Database (Denmark)

    Edjabou, Vincent Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona

    2015-01-01

    Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10-50 waste fractions, organised according to a three-level (tiered) approach, facilitating comparison of the waste data between individual sub-areas with different fractionation (waste

  12. Autonomous spatially adaptive sampling in experiments based on curvature, statistical error and sample spacing with applications in LDA measurements

    Science.gov (United States)

    Theunissen, Raf; Kadosh, Jesse S.; Allen, Christian B.

    2015-06-01

    Spatially varying signals are typically sampled by collecting uniformly spaced samples irrespective of the signal content. For signals with inhomogeneous information content, this leads to unnecessarily dense sampling in regions of low interest or insufficient sample density at important features, or both. A new adaptive sampling technique is presented directing sample collection in proportion to local information content, capturing adequately the short-period features while sparsely sampling less dynamic regions. The proposed method incorporates a data-adapted sampling strategy on the basis of signal curvature, sample space-filling, variable experimental uncertainty and iterative improvement. Numerical assessment has indicated a reduction in the number of samples required to achieve a predefined uncertainty level overall while improving local accuracy for important features. The potential of the proposed method has been further demonstrated on the basis of Laser Doppler Anemometry experiments examining the wake behind a NACA0012 airfoil and the boundary layer characterisation of a flat plate.
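    A toy illustration of curvature-driven sample placement in the spirit of the method above: each step inserts the next sample beside the point with the largest second difference, so sharp features are refined first. The uncertainty weighting and space-filling terms of the actual method are omitted, and the test function is invented.

```python
import numpy as np

def refine(x: np.ndarray, f, steps: int = 20) -> np.ndarray:
    """Insert samples at the midpoints of the most curved intervals."""
    for _ in range(steps):
        y = f(x)
        curv = np.abs(np.diff(y, 2))       # second difference per interior node
        i = int(np.argmax(curv)) + 1       # most curved interior node
        # Split the wider of the two intervals adjacent to that node.
        side = i if x[i + 1] - x[i] > x[i] - x[i - 1] else i - 1
        x = np.sort(np.append(x, 0.5 * (x[side] + x[side + 1])))
    return x

x0 = np.linspace(0.0, 1.0, 8)
samples = refine(x0, lambda t: np.tanh(20 * (t - 0.5)))  # sharp feature at 0.5
print(np.round(samples, 3))   # points cluster around the steep region
```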

  13. Image-Based Phenotypic Screening with Human Primary T Cells Using One-Dimensional Imaging Cytometry with Self-Tuning Statistical-Gating Algorithms.

    Science.gov (United States)

    Wang, Steve S; Ehrlich, Daniel J

    2017-09-01

    The parallel microfluidic cytometer (PMC) is an imaging flow cytometer that operates on statistical analysis of low-pixel-count, one-dimensional (1D) line scans. It is highly efficient in data collection and operates on suspension cells. In this article, we present a supervised automated pipeline for the PMC that minimizes operator intervention by incorporating multivariate logistic regression for data scoring. We test the self-tuning statistical algorithms in a human primary T-cell activation assay in flow using nuclear factor of activated T cells (NFAT) translocation as a readout and readily achieve an average Z' of 0.55 and strictly standardized mean difference of 13 with standard phorbol myristate acetate/ionomycin induction. To implement the tests, we routinely load 4 µL samples and can readout 3000 to 9000 independent conditions from 15 mL of primary human blood (buffy coat fraction). We conclude that the new technology will support primary-cell protein-localization assays and "on-the-fly" data scoring at a sample throughput of more than 100,000 wells per day and that it is, in principle, consistent with a primary pharmaceutical screen.
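
    The screening-quality figures quoted here (Z' and SSMD) have standard definitions; a small sketch assuming the conventional formulas, which the record does not spell out, applied to simulated control wells.

        import numpy as np

        def z_prime(pos, neg):
            # Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|
            return 1.0 - 3.0 * (np.std(pos, ddof=1) + np.std(neg, ddof=1)) \
                / abs(np.mean(pos) - np.mean(neg))

        def ssmd(pos, neg):
            # Strictly standardized mean difference.
            return (np.mean(pos) - np.mean(neg)) / np.sqrt(
                np.var(pos, ddof=1) + np.var(neg, ddof=1))

        rng = np.random.default_rng(1)
        pos = rng.normal(10.0, 0.5, 384)   # e.g. PMA/ionomycin-induced wells
        neg = rng.normal(2.0, 0.6, 384)    # unstimulated wells
        print(z_prime(pos, neg), ssmd(pos, neg))   # ~0.59 and ~10 here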

  14. A genetic algorithm-based framework for wavelength selection on sample categorization.

    Science.gov (United States)

    Anzanello, Michel J; Yamashita, Gabrielli; Marcelo, Marcelo; Fogliatto, Flávio S; Ortiz, Rafael S; Mariotti, Kristiane; Ferrão, Marco F

    2017-08-01

    In forensic and pharmaceutical scenarios, the application of chemometrics and optimization techniques has unveiled common and peculiar features of seized medicine and drug samples, helping investigative forces to track illegal operations. This paper proposes a novel framework aimed at identifying relevant subsets of attenuated total reflectance Fourier transform infrared (ATR-FTIR) wavelengths for classifying samples into two classes, for example authentic or forged categories in the case of medicines, or salt or base form in cocaine analysis. In the first step of the framework, the ATR-FTIR spectra were partitioned into equidistant intervals and the k-nearest neighbour (KNN) classification technique was applied to each interval to assign samples to the proper classes. In the next step, the selected intervals were refined through a genetic algorithm (GA) that identifies a limited number of wavelengths from the previously selected intervals with the aim of maximizing classification accuracy. When applied to Cialis®, Viagra®, and cocaine ATR-FTIR datasets, the proposed method substantially decreased the number of wavelengths needed for categorization and increased the classification accuracy. From a practical perspective, the proposed method provides investigative forces with valuable information towards monitoring illegal production of drugs and medicines. In addition, focusing on a reduced subset of wavelengths allows the development of portable devices capable of testing the authenticity of samples during police checks, avoiding the need for later laboratory analyses and reducing equipment expenses. Theoretically, the proposed GA-based approach yields more refined solutions than current methods relying on interval approaches, which tend to include irrelevant wavelengths in the retained intervals. Copyright © 2016 John Wiley & Sons, Ltd.
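
    A schematic of the wavelength-search step under stated assumptions: a toy genetic algorithm evolving binary wavelength masks scored by cross-validated KNN accuracy. The synthetic spectra, population size, and operators below are illustrative stand-ins, not the authors' configuration.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(60, 40))          # 60 spectra, 40 wavelengths (toy)
        y = rng.integers(0, 2, 60)             # two classes, e.g. authentic/forged
        X[y == 1, 5:8] += 1.5                  # only wavelengths 5-7 are informative

        def fitness(mask):
            # Cross-validated KNN accuracy on the selected wavelengths.
            if mask.sum() == 0:
                return 0.0
            knn = KNeighborsClassifier(n_neighbors=3)
            return cross_val_score(knn, X[:, mask.astype(bool)], y, cv=5).mean()

        pop = rng.integers(0, 2, size=(30, X.shape[1]))
        for gen in range(40):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[::-1][:10]]    # truncation selection
            kids = []
            for _ in range(len(pop) - len(parents)):
                a, b = parents[rng.integers(0, len(parents), 2)]
                cut = rng.integers(1, X.shape[1])
                child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
                flip = rng.random(X.shape[1]) < 0.02        # bit-flip mutation
                kids.append(np.where(flip, 1 - child, child))
            pop = np.vstack([parents, kids])

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        print("selected wavelengths:", np.flatnonzero(best))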

  15. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    International Nuclear Information System (INIS)

    Edjabou, Maklawe Essonanawe; Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn; Petersen, Claus; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2015-01-01

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered) approach, facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  16. Municipal solid waste composition: Sampling methodology, statistical analyses, and case study evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Edjabou, Maklawe Essonanawe, E-mail: vine@env.dtu.dk [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Jensen, Morten Bang; Götze, Ramona; Pivnenko, Kostyantyn [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark); Petersen, Claus [Econet AS, Omøgade 8, 2.sal, 2100 Copenhagen (Denmark); Scheutz, Charlotte; Astrup, Thomas Fruergaard [Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs. Lyngby (Denmark)

    2015-02-15

    Highlights: • Tiered approach to waste sorting ensures flexibility and facilitates comparison of solid waste composition data. • Food and miscellaneous wastes are the main fractions contributing to the residual household waste. • Separation of food packaging from food leftovers during sorting is not critical for determination of the solid waste composition. - Abstract: Sound waste management and optimisation of resource recovery require reliable data on solid waste generation and composition. In the absence of standardised and commonly accepted waste characterisation methodologies, various approaches have been reported in literature. This limits both comparability and applicability of the results. In this study, a waste sampling and sorting methodology for efficient and statistically robust characterisation of solid waste was introduced. The methodology was applied to residual waste collected from 1442 households distributed among 10 individual sub-areas in three Danish municipalities (both single and multi-family house areas). In total 17 tonnes of waste were sorted into 10–50 waste fractions, organised according to a three-level (tiered) approach, facilitating comparison of the waste data between individual sub-areas with different fractionation (waste from one municipality was sorted at “Level III”, e.g. detailed, while the two others were sorted only at “Level I”). The results showed that residual household waste mainly contained food waste (42 ± 5%, mass per wet basis) and miscellaneous combustibles (18 ± 3%, mass per wet basis). The residual household waste generation rate in the study areas was 3–4 kg per person per week. Statistical analyses revealed that the waste composition was independent of variations in the waste generation rate. Both waste composition and waste generation rates were statistically similar for each of the three municipalities. While the waste generation rates were similar for each of the two housing types (single

  17. Finite-sample instrumental variables Inference using an Asymptotically Pivotal Statistic

    NARCIS (Netherlands)

    Bekker, P.; Kleibergen, F.R.

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic, this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  18. Finite-sample instrumental variables inference using an asymptotically pivotal statistic

    NARCIS (Netherlands)

    Bekker, Paul A.; Kleibergen, Frank

    2001-01-01

    The paper considers the K-statistic, Kleibergen’s (2000) adaptation of the Anderson-Rubin (AR) statistic in instrumental variables regression. Compared to the AR-statistic, this K-statistic shows improved asymptotic efficiency in terms of degrees of freedom in overidentified models and yet it shares,

  19. Impact of a New Adaptive Statistical Iterative Reconstruction (ASIR)-V Algorithm on Image Quality in Coronary Computed Tomography Angiography.

    Science.gov (United States)

    Pontone, Gianluca; Muscogiuri, Giuseppe; Andreini, Daniele; Guaricci, Andrea I; Guglielmo, Marco; Baggiano, Andrea; Fazzari, Fabio; Mushtaq, Saima; Conte, Edoardo; Annoni, Andrea; Formenti, Alberto; Mancini, Elisabetta; Verdecchia, Massimo; Campari, Alessandro; Martini, Chiara; Gatti, Marco; Fusini, Laura; Bonfanti, Lorenzo; Consiglio, Elisa; Rabbat, Mark G; Bartorelli, Antonio L; Pepi, Mauro

    2018-03-27

    A new postprocessing algorithm named adaptive statistical iterative reconstruction (ASIR)-V has recently been introduced. The aim of this article was to analyze the impact of the ASIR-V algorithm on signal, noise, and image quality of coronary computed tomography angiography. Fifty consecutive patients underwent clinically indicated coronary computed tomography angiography (Revolution CT; GE Healthcare, Milwaukee, WI). Images were reconstructed using filtered back projection (ASIR-V 0%), combinations of filtered back projection and ASIR-V from 20% to 80%, and ASIR-V 100%. Image noise, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR) were calculated for the left main coronary artery (LM), left anterior descending artery (LAD), left circumflex artery (LCX), and right coronary artery (RCA) and were compared between the different postprocessing algorithms used. Similarly, a four-point Likert image quality score of coronary segments was graded for each dataset and compared. A cutoff value of P < 0.05 was considered statistically significant. Compared to ASIR-V 0%, ASIR-V 100% demonstrated a significant reduction of image noise in all coronaries (P < 0.05). Compared to ASIR-V 0%, SNR was significantly higher with ASIR-V 60% in the LM, LAD, LCX, and RCA (P < 0.05). Compared to ASIR-V 0%, CNR for ASIR-V ≥60% was significantly improved in the LM (P < 0.05), as it was for ASIR-V ≥80%. ASIR-V 60% had significantly better Likert image quality scores compared to ASIR-V 0% in segment-, vessel-, and patient-based analyses (P < 0.05). In conclusion, ASIR-V 60% provides the optimal balance between image noise, SNR, CNR, and image quality. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  20. Chemical library subset selection algorithms: a unified derivation using spatial statistics.

    Science.gov (United States)

    Hamprecht, Fred A; Thiel, Walter; van Gunsteren, Wilfred F

    2002-01-01

    If similar compounds have similar activity, rational subset selection becomes superior to random selection in screening for pharmacological lead discovery programs. Traditional approaches to this experimental design problem fall into two classes: (i) a linear or quadratic response function is assumed; (ii) some space-filling criterion is optimized. The assumptions underlying the first approach are clear but not always defensible; the second approach yields more intuitive designs but lacks a clear theoretical foundation. We model activity in a bioassay as the realization of a stochastic process and use the best linear unbiased estimator to construct spatial sampling designs that optimize the integrated mean square prediction error, the maximum mean square prediction error, or the entropy. We argue that our approach constitutes a unifying framework encompassing most proposed techniques as limiting cases and sheds light on their underlying assumptions. In particular, vector quantization is obtained, in dimensions up to eight, in the limiting case of very smooth response surfaces for the integrated mean square error criterion. Closest packing is obtained for very rough surfaces under the integrated mean square error and entropy criteria. We suggest using either the integrated mean square prediction error or the entropy as optimization criteria rather than approximations thereof, and propose a scheme for direct iterative minimization of the integrated mean square prediction error. Finally, we discuss how the quality of chemical descriptors manifests itself and clarify the assumptions underlying the selection of diverse or representative subsets.
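
    One concrete instance of the space-filling class discussed above is a greedy maximin design; a sketch assuming a Euclidean descriptor space (the paper's kriging/BLUE machinery is not reproduced here).

        import numpy as np

        def greedy_maximin(points, k, seed=0):
            # Pick k compounds so each new pick maximizes its distance to the
            # already-selected set: a simple space-filling subset.
            rng = np.random.default_rng(seed)
            chosen = [int(rng.integers(len(points)))]
            d = np.linalg.norm(points - points[chosen[0]], axis=1)
            for _ in range(k - 1):
                nxt = int(np.argmax(d))
                chosen.append(nxt)
                d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
            return chosen

        descriptors = np.random.default_rng(2).uniform(size=(500, 6))
        print(greedy_maximin(descriptors, 10))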

  1. Super resolution reconstruction of μ-CT image of rock sample using neighbour embedding algorithm

    Science.gov (United States)

    Wang, Yuzhu; Rahman, Sheik S.; Arns, Christoph H.

    2018-03-01

    X-ray micro-computed tomography (μ-CT) is considered to be the most effective way to obtain the inner structure of a rock sample without destroying it. However, its limited resolution hampers its ability to probe sub-micron structures, which are critical for flow transport in rock samples. In this study, we propose an innovative methodology to improve the resolution of μ-CT images using a neighbour embedding algorithm, where low-frequency information is provided by the μ-CT image itself while high-frequency information is supplemented by a high resolution scanning electron microscopy (SEM) image. In order to obtain a prior for reconstruction, a large number of image patch pairs, each containing a high- and a low-resolution patch, are extracted from the Gaussian image pyramid generated from the SEM image. These image patch pairs contain abundant information about the tomographic evolution of local porous structures across resolution spaces. Relying on the assumption of self-similarity of the porous structure, this prior information can be used to supervise the reconstruction of the high resolution μ-CT image effectively. The experimental results show that the proposed method achieves state-of-the-art performance.
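
    A minimal sketch of the neighbour-embedding step alone, assuming LLE-style reconstruction weights; the random arrays below stand in for the paired patch dictionaries that the paper builds from the SEM image pyramid.

        import numpy as np

        def neighbour_embed(lr_patch, lr_dict, hr_dict, k=5):
            # Express the input patch as a weighted combination of its k
            # nearest low-resolution dictionary patches, then apply the same
            # weights to the paired high-resolution patches.
            d = np.linalg.norm(lr_dict - lr_patch, axis=1)
            idx = np.argsort(d)[:k]
            w, *_ = np.linalg.lstsq(lr_dict[idx].T, lr_patch, rcond=None)
            w = w / w.sum()                    # weights constrained to sum to one
            return hr_dict[idx].T @ w

        rng = np.random.default_rng(9)
        lr_dict = rng.random((1000, 9))        # 3x3 low-resolution patches (toy)
        hr_dict = rng.random((1000, 36))       # paired 6x6 high-resolution patches
        print(neighbour_embed(lr_dict[0], lr_dict, hr_dict).shape)   # (36,)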

  2. Effects of Sample Size and Dimensionality on the Performance of Four Algorithms for Inference of Association Networks in Metabonomics

    NARCIS (Netherlands)

    Suarez Diez, M.; Saccenti, E.

    2015-01-01

    We investigated the effect of sample size and dimensionality on the performance of four algorithms (ARACNE, CLR, CORR, and PCLRC) when they are used for the inference of metabolite association networks. We report that as many as 100-400 samples may be necessary to obtain stable network estimations,

  3. Constructing first-principles phase diagrams of amorphous LixSi using machine-learning-assisted sampling with an evolutionary algorithm

    Science.gov (United States)

    Artrith, Nongnuch; Urban, Alexander; Ceder, Gerbrand

    2018-06-01

    The atomistic modeling of amorphous materials requires structure sizes and sampling statistics that are challenging to achieve with first-principles methods. Here, we propose a methodology to speed up the sampling of amorphous and disordered materials using a combination of a genetic algorithm and a specialized machine-learning potential based on artificial neural networks (ANNs). We show for the example of the amorphous LiSi alloy that around 1000 first-principles calculations are sufficient for the ANN-potential assisted sampling of low-energy atomic configurations in the entire amorphous LixSi phase space. The obtained phase diagram is validated by comparison with the results from an extensive sampling of LixSi configurations using molecular dynamics simulations and a general ANN potential trained to ~45 000 first-principles calculations. This demonstrates the utility of the approach for the first-principles modeling of amorphous materials.

  4. Circular contour retrieval in real-world conditions by higher order statistics and an alternating-least squares algorithm

    Science.gov (United States)

    Jiang, Haiping; Marot, Julien; Fossati, Caroline; Bourennane, Salah

    2011-12-01

    In real-world conditions, contours are most often blurred in digital images because of acquisition conditions such as movement, the light transmission environment, and defocus. Among image segmentation methods, the Hough transform requires a computational load which increases with the number of noise pixels, level set methods also require a high computational load, and some other methods assume that the contours are one pixel wide. For the first time, we retrieve the characteristics of multiple, possibly concentric, blurred circles. We work in a correlated noise environment, to get closer to real-world conditions. For this, we model a blurred circle by a few parameters (center coordinates, radius, and spread) which characterize its mean position and gray-level variations. We derive the signal model which results from signal generation on a circular antenna. Linear antennas provide the center coordinates. To retrieve the circle radii, we adapt the second-order statistics TLS-ESPRIT method for a non-correlated noise environment, and propose a novel version of TLS-ESPRIT based on higher-order statistics for a correlated noise environment. Then, we derive a least-squares criterion and propose an alternating least-squares algorithm to retrieve simultaneously all spread values of the concentric circles. Experiments performed on hand-made and real-world images show that the proposed methods outperform the Hough transform and a level set method dedicated to blurred contours in terms of computational load. Moreover, the proposed model and optimization method provide information about the contour gray-level variations.

  5. Supplementary Material for: Compressing an Ensemble With Statistical Models: An Algorithm for Global 3D Spatio-Temporal Temperature

    KAUST Repository

    Castruccio, Stefano

    2016-01-01

    One of the main challenges when working with modern climate model ensembles is the increasingly large size of the data produced, and the consequent difficulty in storing large amounts of spatio-temporally resolved information. Many compression algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific datasets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a different, statistics-based approach that explicitly accounts for the space-time dependence of the data for annual global three-dimensional temperature fields in an initial condition ensemble. The set of estimated parameters is small (compared to the data size) and can be regarded as a summary of the essential structure of the ensemble output; therefore, it can be used to instantaneously reproduce the temperature fields in an ensemble with a substantial saving in storage and time. The statistical model exploits the gridded geometry of the data and parallelization across processors. It is therefore computationally convenient and allows fitting a nontrivial model to a dataset of 1 billion data points with a covariance matrix comprising 10¹⁸ entries. Supplementary materials for this article are available online.

  6. Entropic sampling of simple polymer models within Wang-Landau algorithm

    International Nuclear Information System (INIS)

    Vorontsov-Velyaminov, P N; Volkov, N A; Yurchenko, A A

    2004-01-01

    In this paper we apply a new simulation technique proposed by Wang and Landau (WL) (2001 Phys. Rev. Lett. 86 2050) to the sampling of three-dimensional lattice and continuous models of polymer chains. Distributions obtained by homogeneous (unconditional) random walks are compared with results of entropic sampling (ES) within the WL algorithm. While homogeneous sampling gives reliable results typically in the range of 4-5 orders of magnitude, the WL entropic sampling yields them in the range of 20-30 orders and even larger with comparable computer effort. A combination of homogeneous and WL sampling provides reliable data for events with probabilities down to 10⁻³⁵. For the lattice model we consider both the athermal case (self-avoiding walks, SAWs) and the thermal case, when an energy is attributed to each contact between nonbonded monomers in a self-avoiding walk. For short chains the simulation results are checked by comparison with the exact data. In WL calculations for chain lengths up to N = 300, scaling relations for SAWs are well reproduced. In the thermal case the distribution over the number of contacts is obtained in the N-range up to N = 100, and the canonical averages (internal energy, heat capacity, excess canonical entropy, mean square end-to-end distance) are calculated as a result over a wide temperature range. The continuous model is studied in the athermal case. By sorting conformations of a continuous phantom freely jointed N-bonded chain with a unit bond length over a stochastic variable, the minimum distance between nonbonded beads, we determine the probability distribution for the N-bonded chain with hard-sphere monomer units over its diameter a in the complete diameter range, 0 ≤ a ≤ 2, within a single ES run. This distribution provides us with the excess specific entropy for a set of diameters a in this range. Calculations were made for chain lengths up to N = 100 and results were extrapolated to N → ∞ for a in the range 0 ≤ a ≤ 1.25.
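
    To make the flat-histogram idea concrete, here is a compact sketch of the Wang-Landau update rule on a toy system whose density of states is known exactly (binomial coefficients); the polymer applications in the record are far richer.

        import math
        import numpy as np

        def wang_landau(n_spins=20, flat=0.8, ln_f_final=1e-6, seed=0):
            # Estimate ln g(E) for a toy system whose "energy" E is the number
            # of up spins; the exact answer is ln C(n_spins, E).
            rng = np.random.default_rng(seed)
            spins = rng.integers(0, 2, n_spins)
            E = int(spins.sum())
            ln_g = np.zeros(n_spins + 1)
            hist = np.zeros(n_spins + 1)
            ln_f = 1.0
            while ln_f > ln_f_final:
                for _ in range(10000):
                    i = rng.integers(n_spins)
                    E_new = E + (1 - 2 * int(spins[i]))   # a flip changes E by +-1
                    if np.log(rng.random()) < ln_g[E] - ln_g[E_new]:
                        spins[i] ^= 1
                        E = E_new
                    ln_g[E] += ln_f
                    hist[E] += 1
                if hist[hist > 0].min() > flat * hist.mean():
                    hist[:] = 0
                    ln_f /= 2.0                 # refine the modification factor
            return ln_g - ln_g[0]

        est = wang_landau()
        exact = [math.log(math.comb(20, e)) for e in range(21)]
        print(np.round(est - exact, 2))         # typically near zero across all E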

  7. Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

    Science.gov (United States)

    Breunig, Nancy A.

    Despite increasing criticism of statistical significance testing by researchers, particularly since the publication of the 1994 American Psychological Association style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics.

  8. A Fast and Accurate Algorithm for l1 Minimization Problems in Compressive Sampling (Preprint)

    Science.gov (United States)

    2013-01-22

    However, updating u_{k+1} via the formulation of Step 2 in Algorithm 1 can be implemented through the use of the component-wise Gauss-Seidel iteration, which ... may accelerate the rate of convergence of the algorithm and therefore reduce the total CPU time consumed. The efficiency of component-wise Gauss-Seidel ... Micchelli, L. Shen, and Y. Xu, A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models, Inverse Problems, 28 (2012)
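
    The fragments above concern accelerating an l1 solver; as a hedged baseline (not the paper's Gauss-Seidel-accelerated scheme), here is the plain proximal-gradient iteration (ISTA) for min 0.5*||Ax - b||^2 + lam*||x||_1, whose prox step is component-wise soft-thresholding.

        import numpy as np

        def ista(A, b, lam, n_iter=2000):
            L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                z = x - A.T @ (A @ x - b) / L   # gradient step
                x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox step
            return x

        rng = np.random.default_rng(3)
        A = rng.normal(size=(50, 200))          # underdetermined sensing matrix
        x_true = np.zeros(200)
        x_true[[5, 70, 130]] = [1.0, -2.0, 1.5]
        b = A @ x_true
        print(np.flatnonzero(np.abs(ista(A, b, 0.1)) > 0.1))  # ~[5, 70, 130]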

  9. Vis-NIR spectrometric determination of Brix and sucrose in sugar production samples using kernel partial least squares with interval selection based on the successive projections algorithm.

    Science.gov (United States)

    de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino

    2018-05-01

    This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.

  10. Multiobjective Sampling Design for Calibration of Water Distribution Network Model Using Genetic Algorithm and Neural Network

    Directory of Open Access Journals (Sweden)

    Kourosh Behzadian

    2008-03-01

    Full Text Available In this paper, a novel multiobjective optimization model is presented for selecting optimal locations in a water distribution network (WDN) for installing pressure loggers. The pressure data collected at these optimal locations will later be used in the calibration of the proposed WDN model. The objective functions consist of maximization of calibrated model prediction accuracy and minimization of the total cost of the sampling design. In order to decrease the model run time, an optimization model has been developed using a multiobjective genetic algorithm and an adaptive neural network (MOGA-ANN). Neural networks (NNs) are initially trained after a number of initial GA generations and periodically retrained and updated after generation of a specified number of full model-analyzed solutions. Trained NNs replace the fitness evaluation of some chromosomes within the GA progress. Using a cache prevents objective function evaluation of repetitive chromosomes within the GA. Optimal solutions are obtained through the Pareto-optimal front with respect to the two objective functions. Results show that incorporating NNs in MOGA for approximating portions of the chromosomes’ fitness in each generation leads to considerable savings in model run time and can be promising for reducing run time in optimization models with significant computational effort.

  11. Mathematical background and attitudes toward statistics in a sample of Spanish college students.

    Science.gov (United States)

    Carmona, José; Martínez, Rafael J; Sánchez, Manuel

    2005-08-01

    To examine the relation between mathematical background and initial attitudes toward statistics among Spanish college students in the social sciences, the Survey of Attitudes Toward Statistics was given to 827 students. Multivariate analyses tested the effects of two indicators of mathematical background (amount of exposure and achievement in previous courses) on the four subscales. The analysis suggested that grades in previous courses are more related to initial attitudes toward statistics than the number of mathematics courses taken. Mathematical background was related to students' affective responses to statistics but not to their valuing of statistics. Implications for possible research are discussed.

  12. Microdosimetry calculations for monoenergetic electrons using Geant4-DNA combined with a weighted track sampling algorithm.

    Science.gov (United States)

    Famulari, Gabriel; Pater, Piotr; Enger, Shirin A

    2017-07-07

    The aim of this study was to calculate microdosimetric distributions for low energy electrons simulated using the Monte Carlo track structure code Geant4-DNA. Tracks for monoenergetic electrons with kinetic energies ranging from 100 eV to 1 MeV were simulated in an infinite spherical water phantom using the Geant4-DNA extension included in Geant4 toolkit version 10.2 (patch 02). The microdosimetric distributions were obtained through random sampling of transfer points and overlaying scoring volumes within the associated volume of the tracks. Relative frequency distributions of energy deposition f(>E)/f(>0) and dose-mean lineal energy (ȳ_D) values were calculated in nanometer-sized spherical and cylindrical targets. The effects of scoring volume and scoring techniques were examined. The results were compared with published data generated using MOCA8B and KURBUC. Geant4-DNA produces a lower frequency of higher energy deposits than MOCA8B. The ȳ_D values calculated with Geant4-DNA are smaller than those calculated using MOCA8B and KURBUC. The differences are mainly due to the lower ionization and excitation cross sections of Geant4-DNA for low energy electrons. To a lesser extent, discrepancies can also be attributed to the implementation in this study of a new and fast scoring technique that differs from that used in previous studies. For the same mean chord length (l̄), the ȳ_D calculated in cylindrical volumes are larger than those calculated in spherical volumes. The discrepancies due to cross sections and scoring geometries increase with decreasing scoring site dimensions. A new set of ȳ_D values has been presented for monoenergetic electrons using a fast track sampling algorithm and the most recent physics models implemented in Geant4-DNA. This dataset can be combined with primary electron spectra to predict the radiation quality of photon and electron beams.
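
    For reference, the standard microdosimetric estimators behind ȳ_D, computed here from toy event energy deposits; the Cauchy mean chord length for a sphere and the conventional frequency- and dose-mean definitions are assumptions, since the record does not spell out its formulas.

        import numpy as np

        def lineal_energy_stats(eps_keV, diameter_um):
            l_mean = (2.0 / 3.0) * diameter_um   # mean chord length of a sphere
            y = np.asarray(eps_keV) / l_mean     # lineal energy per event, keV/um
            y_f = y.mean()                       # frequency-mean lineal energy
            y_d = (y ** 2).sum() / y.sum()       # dose-mean lineal energy
            return y_f, y_d

        eps = np.random.default_rng(4).exponential(0.5, 10000)  # toy deposits, keV
        print(lineal_energy_stats(eps, 0.1))     # for a 100 nm spherical site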

  13. SU-E-T-21: A Novel Sampling Algorithm to Reduce Intensity-Modulated Radiation Therapy (IMRT) Optimization Time

    International Nuclear Information System (INIS)

    Tiwari, P; Xie, Y; Chen, Y; Deasy, J

    2014-01-01

    Purpose: The IMRT optimization problem requires substantial computer time to find optimal dose distributions because of the large number of variables and constraints. Voxel sampling reduces the number of constraints and accelerates the optimization process, but usually deteriorates the quality of the dose distributions to the organs. We propose a novel sampling algorithm that accelerates the IMRT optimization process without significantly deteriorating the quality of the dose distribution. Methods: We included all boundary voxels, as well as a sampled fraction of interior voxels of organs, in the optimization. We selected the fraction of interior voxels using a clustering algorithm that creates clusters of voxels with similar influence matrix signatures. A few voxels are selected from each cluster based on the pre-set sampling rate. Results: We ran sampling and no-sampling IMRT plans for de-identified head and neck treatment plans. Testing different sampling rates, we found that including 10% of inner voxels produced good dose distributions. For this optimal sampling rate, the algorithm accelerated IMRT optimization by a factor of 2–3 with a negligible loss of accuracy that was, on average, 0.3% for common dosimetric planning criteria. Conclusion: We demonstrated that a sampling scheme can be developed that reduces optimization time by more than a factor of 2 without significantly degrading dose quality
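
    The clustering step in minimal form, with k-means standing in for whatever clustering the authors used (the record does not name one) and a toy influence matrix; the cluster count and per-cluster picks are illustrative.

        import numpy as np
        from sklearn.cluster import KMeans

        def sample_voxels(influence, n_clusters=50, per_cluster=2, seed=0):
            # Group interior voxels by their influence-matrix signatures and
            # keep a few representatives per cluster.
            rng = np.random.default_rng(seed)
            labels = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=seed).fit_predict(influence)
            keep = []
            for c in range(n_clusters):
                members = np.flatnonzero(labels == c)
                keep.extend(rng.choice(members, min(per_cluster, len(members)),
                                       replace=False))
            return np.array(keep)

        D = np.random.default_rng(5).random((2000, 30))  # voxels x beamlets (toy)
        print(len(sample_voxels(D)))                     # ~100 retained voxels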

  14. Head-to-head comparison of adaptive statistical and model-based iterative reconstruction algorithms for submillisievert coronary CT angiography.

    Science.gov (United States)

    Benz, Dominik C; Fuchs, Tobias A; Gräni, Christoph; Studer Bruengger, Annina A; Clerc, Olivier F; Mikulicic, Fran; Messerli, Michael; Stehli, Julia; Possner, Mathias; Pazhenkottil, Aju P; Gaemperli, Oliver; Kaufmann, Philipp A; Buechel, Ronny R

    2018-02-01

    Iterative reconstruction (IR) algorithms allow for a significant reduction in the radiation dose of coronary computed tomography angiography (CCTA). We performed a head-to-head comparison of adaptive statistical IR (ASiR) and model-based IR (MBIR) algorithms to assess their impact on quantitative image parameters and diagnostic accuracy for submillisievert CCTA. CCTA datasets of 91 patients were reconstructed using filtered back projection (FBP), increasing contributions of ASiR (20, 40, 60, 80, and 100%), and MBIR. Signal and noise were measured in the aortic root to calculate the signal-to-noise ratio (SNR). In a subgroup of 36 patients, the diagnostic accuracy of ASiR 40%, ASiR 100%, and MBIR for the diagnosis of coronary artery disease (CAD) was compared with invasive coronary angiography. Median radiation dose was 0.21 mSv for CCTA. While increasing levels of ASiR gradually reduced image noise compared with FBP (up to -48%, P < 0.05), MBIR provided a further noise reduction (-59% compared with ASiR 100%; P < 0.05). ASiR 40% and ASiR 100% resulted in substantially lower diagnostic accuracy to detect CAD as diagnosed by invasive coronary angiography compared with MBIR: sensitivity and specificity were 100 and 37%, 100 and 57%, and 100 and 74% for ASiR 40%, ASiR 100%, and MBIR, respectively. MBIR offers substantial noise reduction with increased SNR, paving the way for implementation of submillisievert CCTA protocols in clinical routine. In contrast, inferior noise reduction by ASiR negatively affects the diagnostic accuracy of submillisievert CCTA for CAD detection. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2017. For permissions, please email: journals.permissions@oup.com.

  15. Hail statistics in Western Europe based on a hybrid cell-tracking algorithm combining radar signals with hailstone observations

    Science.gov (United States)

    Fluck, Elody

    2015-04-01

    Elody Fluck, Michael Kunz, Peter Geissbühler, Stefan P. Ritz. With hail damage estimated at billions of euros for a single event (e.g., hailstorm Andreas on 27/28 July 2013), hail constitutes one of the major atmospheric risks in various parts of Europe. The project HAMLET (Hail Model for Europe), in cooperation with the insurance company Tokio Millennium Re, aims at estimating hail probability, hail hazard and, combined with vulnerability, hail risk for several European countries (Germany, Switzerland, France, Netherlands, Austria, Belgium and Luxembourg). Hail signals are obtained from radar reflectivity, since this proxy is available with high temporal and spatial resolution. The focus in the first step is on Germany and France for the periods 2005-2013 and 1999-2013, respectively. In the next step, the methods will be transferred and extended to other regions. The cell-tracking algorithm TRACE2D was adjusted and applied to two-dimensional radar reflectivity data from different radars operated by European weather services such as the German weather service (DWD) and the French weather service (Météo-France). Strong convective cells are detected by considering 3 connected pixels over 45 dBZ (reflectivity cores, RCs) in a radar scan. Afterwards, the algorithm tries to find the same RCs in the next 5-minute radar scan and thus tracks the RC centers over time and space. Additional information about hailstone diameters provided by the ESWD (European Severe Weather Database) is used to determine the hail intensity of the detected hail swaths. Maximum hailstone diameters are interpolated along and close to the individual hail tracks, giving an estimate of mean diameters for the detected hail swaths. Furthermore, a stochastic event set is created by randomizing the parameters obtained from the

  16. Sampling

    CERN Document Server

    Thompson, Steven K

    2012-01-01

    Praise for the Second Edition "This book has never had a competitor. It is the only book that takes a broad approach to sampling . . . any good personal statistics library should include a copy of this book." —Technometrics "Well-written . . . an excellent book on an important subject. Highly recommended." —Choice "An ideal reference for scientific researchers and other professionals who use sampling." —Zentralblatt Math Features new developments in the field combined with all aspects of obtaining, interpreting, and using sample data Sampling provides an up-to-date treat

  17. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    Science.gov (United States)

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
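
    The core binomial design question here reduces to a one-line calculation; a sketch assuming tracers occur independently at a known rate per sampled kernel (the rate and confidence below are illustrative).

        import math

        def min_sample_size(p_tracer, confidence=0.95):
            # Smallest n with P(at least one tracer in the sample) >= confidence:
            # solve 1 - (1 - p)^n >= confidence for n.
            return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_tracer))

        print(min_sample_size(1e-4))   # 29,956 kernels for 95% detection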

  18. Estimation of Peaking Factor Uncertainty due to Manufacturing Tolerance using Statistical Sampling Method

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Kyung Hoon; Park, Ho Jin; Lee, Chung Chan; Cho, Jin Young [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2015-10-15

    The purpose of this paper is to study the effect on output parameters of the lattice physics calculation of the third source of input uncertainty described below: manufacturing deviations from nominal values for material composition and geometric dimensions. In nuclear design and analysis, lattice physics calculations are usually employed to generate lattice parameters for the nodal core simulation and pin power reconstruction. These lattice parameters, which consist of homogenized few-group cross-sections, assembly discontinuity factors, and form-functions, can be affected by input uncertainties which arise from three different sources: 1) multi-group cross-section uncertainties, 2) uncertainties associated with methods and modeling approximations utilized in lattice physics codes, and 3) fuel/assembly manufacturing uncertainties. In this paper, data provided by the light water reactor (LWR) uncertainty analysis in modeling (UAM) benchmark have been used as the manufacturing uncertainties. First, the effect of each input parameter has been investigated through sensitivity calculations at the fuel assembly level. Then, the uncertainty in the prediction of the peaking factor due to the most sensitive input parameter has been estimated using the statistical sampling method, often called the brute-force method. For our analysis, the two-dimensional transport lattice code DeCART2D and its ENDF/B-VII.1-based 47-group library were used to perform the lattice physics calculations. Sensitivity calculations have been performed in order to study the influence of manufacturing tolerances on the lattice parameters. The manufacturing tolerance that has the largest influence on k-inf is the fuel density. The second most sensitive parameter is the outer clad diameter.

  19. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    Energy Technology Data Exchange (ETDEWEB)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung [Hanyang Univ., Seoul (Korea, Republic of); Noh, Jae Man [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2013-10-15

    The uncertainty evaluation with the statistical method is performed by repeating the transport calculation with sampling of the directly perturbed nuclear data. Hence, a reliable uncertainty result can be obtained by analyzing the results of numerous transport calculations. One known problem in uncertainty analysis with the statistical approach is that sampling cross sections from a normal (Gaussian) distribution with a relatively large standard deviation leads to sampling errors, such as the sampling of negative cross sections. Some correction methods have been noted; however, these methods can distort the distribution of the sampled cross sections. In this study, a sampling method for the nuclear data is proposed using a lognormal distribution. The criticality calculations with the sampled nuclear data are then performed and the results are compared with those from the normal distribution conventionally used in previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling errors. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distributions. The uncertainties caused by the covariance of (n,γ) cross sections were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution can efficiently solve the negative sampling problem referred to in previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.
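
    The moment-matching construction implied here is standard; a sketch assuming the lognormal is parameterized to preserve each cross section's nominal mean and relative standard deviation, which guarantees strictly positive samples.

        import numpy as np

        def sample_xs_lognormal(mean, rel_std, n, seed=0):
            # Lognormal with the requested mean and standard deviation:
            # sigma^2 = ln(1 + rel_std^2), mu = ln(mean) - sigma^2 / 2.
            sigma2 = np.log(1.0 + rel_std ** 2)
            mu = np.log(mean) - 0.5 * sigma2
            rng = np.random.default_rng(seed)
            return rng.lognormal(mu, np.sqrt(sigma2), n)

        xs = sample_xs_lognormal(mean=2.5, rel_std=0.5, n=100_000)
        print(xs.mean(), xs.std() / xs.mean(), xs.min() > 0)  # ~2.5, ~0.5, True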

  20. A Preliminary Study on Sensitivity and Uncertainty Analysis with Statistic Method: Uncertainty Analysis with Cross Section Sampling from Lognormal Distribution

    International Nuclear Information System (INIS)

    Song, Myung Sub; Kim, Song Hyun; Kim, Jong Kyung; Noh, Jae Man

    2013-01-01

    The uncertainty evaluation with the statistical method is performed by repeating the transport calculation with sampling of the directly perturbed nuclear data. Hence, a reliable uncertainty result can be obtained by analyzing the results of numerous transport calculations. One known problem in uncertainty analysis with the statistical approach is that sampling cross sections from a normal (Gaussian) distribution with a relatively large standard deviation leads to sampling errors, such as the sampling of negative cross sections. Some correction methods have been noted; however, these methods can distort the distribution of the sampled cross sections. In this study, a sampling method for the nuclear data is proposed using a lognormal distribution. The criticality calculations with the sampled nuclear data are then performed and the results are compared with those from the normal distribution conventionally used in previous studies. In this study, the statistical sampling method of the cross section with the lognormal distribution was proposed to increase the sampling accuracy without negative sampling errors. Also, a stochastic cross section sampling and writing program was developed. For the sensitivity and uncertainty analysis, the cross section sampling was pursued with the normal and lognormal distributions. The uncertainties caused by the covariance of (n,γ) cross sections were evaluated by solving the GODIVA problem. The results show that the sampling method with the lognormal distribution can efficiently solve the negative sampling problem referred to in previous studies. It is expected that this study will contribute to increasing the accuracy of sampling-based uncertainty analysis.

  1. Statistical Physics, Optimization, Inference, and Message-Passing Algorithms : Lecture Notes of the Les Houches School of Physics : Special Issue, October 2013

    CERN Document Server

    Ricci-Tersenghi, Federico; Zdeborova, Lenka; Zecchina, Riccardo; Tramel, Eric W; Cugliandolo, Leticia F

    2015-01-01

    This book contains a collection of the presentations that were given in October 2013 at the Les Houches Autumn School on statistical physics, optimization, inference, and message-passing algorithms. In the last decade, there has been increasing convergence of interest and methods between theoretical physics and fields as diverse as probability, machine learning, optimization, and inference problems. In particular, much theoretical and applied work in statistical physics and computer science has relied on the use of message-passing algorithms and their connection to the statistical physics of glasses and spin glasses. For example, both the replica and cavity methods have led to recent advances in compressed sensing, sparse estimation, and random constraint satisfaction, to name a few. This book’s detailed pedagogical lectures on statistical inference, computational complexity, the replica and cavity methods, and belief propagation are aimed particularly at PhD students, post-docs, and young researchers desir...

  2. 7 CFR 52.38a - Definitions of terms applicable to statistical sampling.

    Science.gov (United States)

    2010-01-01

    ... the number of defects (or defectives), which exceed the sample unit tolerance (“T”), in a series of... accumulation of defects (or defectives) allowed to exceed the sample unit tolerance (“T”) in any sample unit or consecutive group of sample units. (ii) CuSum value. The accumulated number of defects (or defectives) that...
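
    Read literally, the CuSum value defined above accumulates the per-unit defects in excess of the tolerance T; a sketch of that reading only (the full CFR acceptance rules, start-up values, and reset provisions are not reproduced).

        def cusum_values(defects_per_unit, T):
            # Accumulate only the defects exceeding T, never dropping below zero.
            cusum, out = 0, []
            for d in defects_per_unit:
                cusum = max(0, cusum + d - T)
                out.append(cusum)
            return out

        print(cusum_values([2, 5, 1, 7, 0], T=3))   # [0, 2, 0, 4, 1]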

  3. The outlier sample effects on multivariate statistical data processing geochemical stream sediment survey (Moghangegh region, North West of Iran)

    International Nuclear Information System (INIS)

    Ghanbari, Y.; Habibnia, A.; Memar, A.

    2009-01-01

    In a geochemical stream sediment survey of the Moghangegh Region in north-west Iran (sheet 1:50,000), 152 samples were collected, and after analysis and processing of the data it was revealed that the Yb, Sc, Ni, Li, Eu, Cd, Co, and As contents in one sample are far higher than in the other samples. After detecting this outlier sample, the effect of this sample on multivariate statistical data processing was investigated to assess the destructive effects of outlier samples in geochemical exploration. Pearson and Spearman correlation coefficient methods and cluster analysis were used for the multivariate studies, and scatter plots of some elements together with the regression profiles are given for the cases of 152 and 151 samples, and the results are compared. After investigation of the multivariate statistical data processing results, it was realized that the existence of outlier samples may appear as the following relations between elements: - a true relation between two elements, neither of which has an outlier frequency in the outlier sample; - a false relation between two elements, one of which has an outlier frequency in the outlier sample; - a completely false relation between two elements, both of which have outlier frequencies in the outlier sample
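
    A toy illustration of the "completely false relation" case, mirroring the 152-versus-151-sample comparison in the record (all values are made up): one extreme sample inflates the Pearson coefficient, while the rank-based Spearman coefficient barely moves.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)
        x = rng.normal(size=151)                  # truly uncorrelated contents
        y = rng.normal(size=151)
        x_out = np.append(x, 40.0)                # one anomalous sample (n = 152)
        y_out = np.append(y, 55.0)

        print(stats.pearsonr(x, y)[0])            # ~0: the true relation
        print(stats.pearsonr(x_out, y_out)[0])    # ~0.9: inflated by one sample
        print(stats.spearmanr(x_out, y_out)[0])   # still ~0: rank-based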

  4. Statistical analysis of temperature data sampled at Station-M in the Norwegian Sea

    Science.gov (United States)

    Lorentzen, Torbjørn

    2014-02-01

    The paper analyzes sea temperature data sampled at Station-M in the Norwegian Sea. The data cover the period 1948-2010. The following questions are addressed: What type of stochastic process characterizes the temperature series? Are there any changes or patterns which indicate climate change? Are there any characteristics in the data which can be linked to the shrinking sea ice in the Arctic area? Can the series be modeled consistently and applied in forecasting future sea temperature? The paper applies the following methods: augmented Dickey-Fuller tests for unit roots and stationarity; ARIMA models in univariate modeling; cointegration and error-correction models for estimating the short- and long-term dynamics of non-stationary series; Granger-causality tests for analyzing the interaction pattern between the deep- and upper-layer temperatures; and simultaneous equation systems for forecasting future temperature. The paper shows that temperature at 2000 m Granger-causes temperature at 150 m, and that the 2000 m series can serve as an important information carrier for the long-term development of the sea temperature in this geographical area. Descriptive statistics show that the temperature level has been on a positive trend since the beginning of the 1980s, as has also been measured in most of the oceans in the North Atlantic. The analysis shows that the temperature series are cointegrated, which means they share the same long-term stochastic trend and do not diverge too far from each other. The measured long-term temperature increase is one of the factors that can explain the shrinking summer sea ice in the Arctic region. The analysis shows that there is a significant negative correlation between the shrinking sea ice and the sea temperature at Station-M. The paper shows that the temperature forecasts are conditioned on the properties of the stochastic processes, the causality pattern between the variables and the specification of model
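
    The testing sequence described (unit-root tests, then cointegration) is standard; a sketch on synthetic series sharing a stochastic trend, using statsmodels. The series names and parameters are illustrative stand-ins, not the Station-M data.

        import numpy as np
        from statsmodels.tsa.stattools import adfuller, coint

        rng = np.random.default_rng(7)
        trend = np.cumsum(rng.normal(0.0, 0.1, 600))    # shared stochastic trend
        t2000 = trend + rng.normal(0.0, 0.2, 600)       # deep-layer series (toy)
        t150 = 0.9 * trend + rng.normal(0.0, 0.3, 600)  # upper-layer series (toy)

        print(adfuller(t150)[1])      # high p-value: unit root not rejected
        print(coint(t150, t2000)[1])  # low p-value: the series are cointegrated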

  5. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Cabalin, L.M.; Gonzalez, A. [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain); Ruiz, J. [Department of Applied Physics I, University of Malaga, E-29071 Malaga (Spain); Laserna, J.J., E-mail: laserna@uma.e [Department of Analytical Chemistry, University of Malaga, E-29071 Malaga (Spain)

    2010-08-15

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  6. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    Science.gov (United States)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  7. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Cabalin, L.M.; Gonzalez, A.; Ruiz, J.; Laserna, J.J.

    2010-01-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s⁻¹. Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  8. Survey of statistical and sampling needs for environmental monitoring of commercial low-level radioactive waste disposal facilities

    International Nuclear Information System (INIS)

    Eberhardt, L.L.; Thomas, J.M.

    1986-07-01

    This project was designed to develop guidance for implementing 10 CFR Part 61 and to determine the overall needs for sampling and statistical work in characterizing, surveying, monitoring, and closing commercial low-level waste sites. When cost-effectiveness and statistical reliability are of prime importance, then double sampling, compositing, and stratification (with optimal allocation) are identified as key issues. If the principal concern is avoiding questionable statistical practice, then the applicability of kriging (for assessing spatial pattern), methods for routine monitoring, and use of standard textbook formulae in reporting monitoring results should be reevaluated. Other important issues identified include sampling for estimating model parameters and the use of data from left-censored (less than detectable limits) distributions
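
    For the "stratification (with optimal allocation)" mentioned above, the standard Neyman allocation is presumably what is meant; a small sketch with invented stratum sizes and standard deviations.

        def neyman_allocation(n_total, stratum_sizes, stratum_sds):
            # Optimal allocation: n_h proportional to N_h * sigma_h.
            weights = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
            total = sum(weights)
            return [round(n_total * w / total) for w in weights]

        # Three site strata: large/variable, medium, small/homogeneous.
        print(neyman_allocation(100, [200, 500, 300], [8.0, 2.0, 0.5]))
        # -> [58, 36, 5]: effort concentrates where variability is largest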

  9. Statistical Sampling Handbook for Student Aid Programs: A Reference for Non-Statisticians. Winter 1984.

    Science.gov (United States)

    Office of Student Financial Assistance (ED), Washington, DC.

    A manual on sampling is presented to assist audit and program reviewers, project officers, managers, and program specialists of the U.S. Office of Student Financial Assistance (OSFA). For each of the following types of samples, definitions and examples are provided, along with information on advantages and disadvantages: simple random sampling,…

  10. Study on the effects of sample selection on spectral reflectance reconstruction based on the algorithm of compressive sensing

    International Nuclear Information System (INIS)

    Zhang, Leihong; Liang, Dong

    2016-01-01

    In order to address the problem that reconstruction efficiency and precision are not high, different samples are selected in this paper to reconstruct spectral reflectance, and a new spectral reflectance reconstruction method based on the compressive sensing algorithm is provided. Four matte color cards with different numbers of colors, namely the ColorChecker Color Rendition Chart, the ColorChecker SG, the Pantone copperplate-paper spot color card, and the Munsell color card, are chosen as training samples; the spectral image is reconstructed by the compressive sensing algorithm and by the pseudo-inverse and Wiener methods, and the results are compared. These spectral reconstruction methods are evaluated by root mean square error and color difference accuracy. The experiments show that the cumulative contribution rate and color difference of the Munsell color card are better than those of the other three color cards under the same reconstruction conditions, and that the accuracy of the spectral reconstruction is affected by the choice of training samples from the different color cards. A key finding is that the uniformity and representativeness of the training sample selection have an important effect on reconstruction. In this paper, the influence of sample selection on spectral image reconstruction is studied. The precision of spectral reconstruction based on the compressive sensing algorithm is higher than that of the traditional spectral reconstruction algorithms. The MATLAB simulation results show that spectral reconstruction precision and efficiency are affected by the different numbers of colors in the training sample. (paper)

  11. The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.

    Science.gov (United States)

    Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S

    2016-10-01

    The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.

  12. MANAGERIAL DECISION IN INNOVATIVE EDUCATION SYSTEMS STATISTICAL SURVEY BASED ON SAMPLE THEORY

    Directory of Open Access Journals (Sweden)

    Gheorghe SĂVOIU

    2012-12-01

    Full Text Available Before formulating the statistical hypotheses and the econometric testing itself, some technical issues must be broken down. They relate to managerial decision in innovative educational systems, to the educational managerial phenomenon tested through statistical and mathematical methods, and to the significant difference in the perception of current and desirable qualities, knowledge, experience, behaviour and health, obtained through a questionnaire applied to a stratified population working in the educational environment, either in educational activities alone or in simultaneously managerial and educational activities. The details of the research, which focuses on survey theory and turns the questionnaires and the statistical data processed from them into working tools, are summarized below.

  13. Quantitative analysis of emphysema and airway measurements according to iterative reconstruction algorithms: comparison of filtered back projection, adaptive statistical iterative reconstruction and model-based iterative reconstruction

    International Nuclear Information System (INIS)

    Choo, Ji Yung; Goo, Jin Mo; Park, Chang Min; Park, Sang Joon; Lee, Chang Hyun; Shim, Mi-Suk

    2014-01-01

    To evaluate filtered back projection (FBP) and two iterative reconstruction (IR) algorithms and their effects on the quantitative analysis of lung parenchyma and airway measurements on computed tomography (CT) images. Low-dose chest CT scans obtained in 281 adult patients were reconstructed using three algorithms: FBP, adaptive statistical IR (ASIR) and model-based IR (MBIR). Measurements of each dataset were compared: total lung volume, emphysema index (EI), airway measurements of the lumen and wall area as well as average wall thickness. Accuracy of airway measurements of each algorithm was also evaluated using an airway phantom. EI using a threshold of -950 HU was significantly different among the three algorithms in decreasing order of FBP (2.30 %), ASIR (1.49 %) and MBIR (1.20 %) (P < 0.01). Wall thickness was also significantly different among the three algorithms with FBP (2.09 mm) demonstrating thicker walls than ASIR (2.00 mm) and MBIR (1.88 mm) (P < 0.01). Airway phantom analysis revealed that MBIR showed the most accurate value for airway measurements. The three algorithms presented different EIs and wall thicknesses, decreasing in the order of FBP, ASIR and MBIR. Thus, care should be taken in selecting the appropriate IR algorithm on quantitative analysis of the lung. (orig.)

  14. Quantitative analysis of emphysema and airway measurements according to iterative reconstruction algorithms: comparison of filtered back projection, adaptive statistical iterative reconstruction and model-based iterative reconstruction

    Energy Technology Data Exchange (ETDEWEB)

    Choo, Ji Yung [Seoul National University Medical Research Center, Department of Radiology, Seoul National University College of Medicine, and Institute of Radiation Medicine, Seoul (Korea, Republic of); Korea University Ansan Hospital, Ansan-si, Department of Radiology, Gyeonggi-do (Korea, Republic of); Goo, Jin Mo; Park, Chang Min; Park, Sang Joon [Seoul National University Medical Research Center, Department of Radiology, Seoul National University College of Medicine, and Institute of Radiation Medicine, Seoul (Korea, Republic of); Seoul National University, Cancer Research Institute, Seoul (Korea, Republic of); Lee, Chang Hyun; Shim, Mi-Suk [Seoul National University Medical Research Center, Department of Radiology, Seoul National University College of Medicine, and Institute of Radiation Medicine, Seoul (Korea, Republic of)

    2014-04-15

    To evaluate filtered back projection (FBP) and two iterative reconstruction (IR) algorithms and their effects on the quantitative analysis of lung parenchyma and airway measurements on computed tomography (CT) images. Low-dose chest CT scans obtained in 281 adult patients were reconstructed using three algorithms: FBP, adaptive statistical IR (ASIR) and model-based IR (MBIR). Measurements of each dataset were compared: total lung volume, emphysema index (EI), airway measurements of the lumen and wall area as well as average wall thickness. Accuracy of airway measurements of each algorithm was also evaluated using an airway phantom. EI using a threshold of -950 HU was significantly different among the three algorithms in decreasing order of FBP (2.30 %), ASIR (1.49 %) and MBIR (1.20 %) (P < 0.01). Wall thickness was also significantly different among the three algorithms with FBP (2.09 mm) demonstrating thicker walls than ASIR (2.00 mm) and MBIR (1.88 mm) (P < 0.01). Airway phantom analysis revealed that MBIR showed the most accurate value for airway measurements. The three algorithms presented different EIs and wall thicknesses, decreasing in the order of FBP, ASIR and MBIR. Thus, care should be taken in selecting the appropriate IR algorithm on quantitative analysis of the lung. (orig.)

  15. A cost-saving statistically based screening technique for focused sampling of a lead-contaminated site

    International Nuclear Information System (INIS)

    Moscati, A.F. Jr.; Hediger, E.M.; Rupp, M.J.

    1986-01-01

    High concentrations of lead in soils along an abandoned railroad line prompted a remedial investigation to characterize the extent of contamination across a 7-acre site. Contamination was thought to be spotty across the site reflecting its past use in battery recycling operations at discrete locations. A screening technique was employed to delineate the more highly contaminated areas by testing a statistically determined minimum number of random samples from each of seven discrete site areas. The approach not only quickly identified those site areas which would require more extensive grid sampling, but also provided a statistically defensible basis for excluding other site areas from further consideration, thus saving the cost of additional sample collection and analysis. The reduction in the number of samples collected in "clean" areas of the site ranged from 45 to 60%.

  16. Evaluation of Microarray Preprocessing Algorithms Based on Concordance with RT-PCR in Clinical Samples

    DEFF Research Database (Denmark)

    Hansen, Kasper Lage; Szallasi, Zoltan Imre; Eklund, Aron Charles

    2009-01-01

    evaluated consistency using the Pearson correlation between measurements obtained on the two platforms. Also, we introduce the log-ratio discrepancy as a more relevant measure of discordance between gene expression platforms. Of nine preprocessing algorithms tested, PLIER+16 produced expression values...

  17. Statistical inference for discrete-time samples from affine stochastic delay differential equations

    DEFF Research Database (Denmark)

    Küchler, Uwe; Sørensen, Michael

    2013-01-01

    Statistical inference for discrete time observations of an affine stochastic delay differential equation is considered. The main focus is on maximum pseudo-likelihood estimators, which are easy to calculate in practice. A more general class of prediction-based estimating functions is investigated...

  18. Using student models to generate feedback in a university course on statistical sampling

    NARCIS (Netherlands)

    Tacoma, S.G.|info:eu-repo/dai/nl/411923080; Drijvers, P.H.M.|info:eu-repo/dai/nl/074302922; Boon, P.B.J.|info:eu-repo/dai/nl/203374207

    2017-01-01

    Due to the complexity of the topic and a lack of individual guidance, introductory statistics courses at university are often challenging. Automated feedback might help to address this issue. In this study, we explore the use of student models to provide feedback. The research question is how

  19. Constrained statistical inference : sample-size tables for ANOVA and regression

    NARCIS (Netherlands)

    Vanbrabant, Leonard; Van De Schoot, Rens; Rosseel, Yves

    2015-01-01

    Researchers in the social and behavioral sciences often have clear expectations about the order/direction of the parameters in their statistical model. For example, a researcher might expect that regression coefficient β1 is larger than β2 and β3. The corresponding hypothesis is H: β1 > {β2, β3} and

  20. Spatial scan statistics to assess sampling strategy of antimicrobial resistance monitoring programme

    DEFF Research Database (Denmark)

    Vieira, Antonio; Houe, Hans; Wegener, Henrik Caspar

    2009-01-01

    The collection and analysis of data on antimicrobial resistance in human and animal populations are important for establishing a baseline of the occurrence of resistance and for determining trends over time. In animals, targeted monitoring with a stratified sampling plan is normally used. However...... sampled by the Danish Integrated Antimicrobial Resistance Monitoring and Research Programme (DANMAP), by identifying spatial clusters of samples and detecting areas with significantly high or low sampling rates. These analyses were performed for each year and for the total 5-year study period for all...... by an antimicrobial monitoring program....

  1. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    Science.gov (United States)

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. To date, multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional differential peak analysis with false discovery rate (FDR) control based on concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.

  2. New complete sample of identified radio sources. Part 2. Statistical study

    International Nuclear Information System (INIS)

    Soltan, A.

    1978-01-01

    The complete sample of radio sources with known redshifts selected in Paper I is studied. Source counts in the sample and the luminosity-volume test show that both quasars and galaxies are subject to evolution. Luminosity functions for different ranges of redshifts are obtained. Due to many uncertainties, only simplified models of the evolution are tested. An exponential decline of the luminosity with time of all the bright sources is in good agreement both with the luminosity-volume test and the N(S) relation in the entire range of observed flux densities. It is shown that sources in the sample are randomly distributed on scales greater than about 17 Mpc. (author)

  3. A robust statistical estimation (RoSE) algorithm jointly recovers the 3D location and intensity of single molecules accurately and precisely

    Science.gov (United States)

    Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.

    2018-02-01

    In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.

  4. Numerically stable algorithm for combining census and sample estimates with the multivariate composite estimator

    Science.gov (United States)

    R. L. Czaplewski

    2009-01-01

    The minimum variance multivariate composite estimator is a relatively simple sequential estimator for complex sampling designs (Czaplewski 2009). Such designs combine a probability sample of expensive field data with multiple censuses and/or samples of relatively inexpensive multi-sensor, multi-resolution remotely sensed data. Unfortunately, the multivariate composite...

  5. TRAN-STAT: statistics for environmental studies, Number 22. Comparison of soil-sampling techniques for plutonium at Rocky Flats

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Bernhardt, D.E.; Hahn, P.B.

    1983-01-01

    A summary of a field soil sampling study conducted around the Rocky Flats, Colorado plant in May 1977 is presented. Several different soil sampling techniques that had been used in the area were applied at four different sites. One objective was to compare the average 239-240Pu concentration values obtained by the various soil sampling techniques used. There was also interest in determining whether there are differences in the reproducibility of the various techniques and how the techniques compared with the proposed EPA technique of sampling to a 1 cm depth. Statistically significant differences in average concentrations between the techniques were found. The differences could be largely related to the differences in sampling depth, the primary physical variable between the techniques. The reproducibility of the techniques was evaluated by comparing coefficients of variation. Differences between coefficients of variation were not statistically significant. Average (median) coefficients ranged from 21 to 42 percent for the five sampling techniques. A laboratory study indicated that various sample treatment and particle sizing techniques could increase the concentration of plutonium in the less than 10 micrometer size fraction by up to a factor of about 4 compared to the 2 mm size fraction.

  6. Learning maximum entropy models from finite-size data sets: A fast data-driven algorithm allows sampling from the posterior distribution.

    Science.gov (United States)

    Ferrari, Ulisse

    2016-08-01

    Maximum entropy models provide the least constrained probability distributions that reproduce statistical properties of experimental datasets. In this work we characterize the learning dynamics that maximizes the log-likelihood in the case of large but finite datasets. We first show how the steepest descent dynamics is not optimal as it is slowed down by the inhomogeneous curvature of the model parameters' space. We then provide a way for rectifying this space which relies only on dataset properties and does not require large computational efforts. We conclude by solving the long-time limit of the parameters' dynamics including the randomness generated by the systematic use of Gibbs sampling. In this stochastic framework, rather than converging to a fixed point, the dynamics reaches a stationary distribution, which for the rectified dynamics reproduces the posterior distribution of the parameters. We sum up all these insights in a "rectified" data-driven algorithm that is fast and by sampling from the parameters' posterior avoids both under- and overfitting along all the directions of the parameters' space. Through the learning of pairwise Ising models from the recording of a large population of retina neurons, we show how our algorithm outperforms the steepest descent method.
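
    The starting point of the paper, steepest ascent on the log-likelihood by matching model moments to data moments via Gibbs sampling, can be sketched compactly for a pairwise Ising model. The rectification of parameter space and the posterior-sampling refinement described in the abstract are not implemented here; all sizes, learning rates, and the synthetic "data" are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5                                   # number of spins (tiny for illustration)

# "Data": empirical means and pairwise correlations from synthetic +/-1 spins.
data = rng.choice([-1, 1], size=(2000, n))
m_data = data.mean(axis=0)
C_data = (data.T @ data) / len(data)

h = np.zeros(n)                         # fields
J = np.zeros((n, n))                    # couplings

def gibbs_sample(h, J, n_samples=2000, burn=200):
    """Single-spin-flip Gibbs sampler for the Ising model."""
    s = rng.choice([-1, 1], size=n).astype(float)
    out = []
    for t in range(n_samples + burn):
        for i in range(n):
            field = h[i] + J[i] @ s - J[i, i] * s[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[i] = 1.0 if rng.random() < p_up else -1.0
        if t >= burn:
            out.append(s.copy())
    return np.array(out)

eta = 0.1
for step in range(50):                  # plain (un-rectified) steepest ascent
    S = gibbs_sample(h, J)
    m_model = S.mean(axis=0)
    C_model = (S.T @ S) / len(S)
    h += eta * (m_data - m_model)       # gradient of log-likelihood w.r.t. h
    J += eta * (C_data - C_model)       # ... and w.r.t. J (diagonal kept at zero)
    np.fill_diagonal(J, 0.0)

print("max moment mismatch:", np.max(np.abs(m_data - m_model)))
```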

  7. [Statistical study of the incidence of agenesis in a sample of 1529 subjects].

    Science.gov (United States)

    Lo Muzio, L; Mignogna, M D; Bucci, P; Sorrentino, F

    1989-09-01

    Following a short review of the main aetiopathogenetic theories on dental agenesis, an original statistical study of this pathology is reported. 1529 orthopantomographs of juveniles aged between 7 and 14 were examined, and 79 cases of hypodontia were observed (5.2%): 32 in males (4.05%) and 47 in females (6.78%). The most frequently affected tooth was the second premolar, with an incidence of 58.9%, followed by the lateral incisor with an incidence of 26.38%. This is in agreement with the international literature.

  8. The Monte Carlo method as a tool for statistical characterisation of differential and additive phase shifting algorithms

    International Nuclear Information System (INIS)

    Miranda, M; Dorrio, B V; Blanco, J; Diz-Bugarin, J; Ribas, F

    2011-01-01

    Several metrological applications base their measurement principle on the sum or difference of the phases of two patterns, one original s(r,φ) and another modified t(r,φ+Δφ). Additive or differential phase shifting algorithms directly recover the sum 2φ+Δφ or the difference Δφ of the phases without requiring prior calculation of the individual phases. These algorithms can be constructed, for example, from a suitable combination of known phase shifting algorithms. Little has been written on the design, analysis and error compensation of these new two-stage algorithms. Previously we have used computer simulation to study, in a linear approach or with a filter process in reciprocal space, the response of several families of them to the main error sources. In this work we present an error analysis that uses Monte Carlo simulation to achieve results in good agreement with those obtained with spatial and temporal methods.
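
    The Monte Carlo recipe itself is generic: perturb the inputs of a phase shifting algorithm with a chosen error model, recover the phase many times, and characterize the error distribution. The sketch below applies it to a standard four-step algorithm with additive Gaussian intensity noise; the two-stage (additive/differential) algorithm families analyzed in the paper are not reproduced, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def four_step_phase(I):
    """Recover phase from intensities at shifts 0, 90, 180, 270 degrees."""
    return np.arctan2(I[3] - I[1], I[0] - I[2])

phi_true = 0.7           # radians, arbitrary test phase
A, B = 1.0, 0.5          # background and modulation amplitudes
sigma = 0.02             # additive intensity noise (the assumed error source)

shifts = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
errors = []
for _ in range(10000):
    I = A + B * np.cos(phi_true + shifts) + rng.normal(0, sigma, size=4)
    errors.append(four_step_phase(I) - phi_true)
errors = np.array(errors)

print(f"bias = {errors.mean():+.5f} rad, std = {errors.std():.5f} rad")
```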

  9. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    Science.gov (United States)

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
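
    The core of Poisson-based quantification of all-or-none replicate data is the limiting-dilution relation P(negative reaction) = exp(-λ), where λ is the mean number of target copies per reaction. A minimal sketch follows, using a simple Wald interval on the negative fraction rather than the paper's exact CI procedure; the replicate counts are illustrative, not taken from the study.

```python
import math

def poisson_quantify(n_negative, n_total, z=1.96):
    """Estimate mean target copies per reaction from the negative fraction,
    assuming copies per replicate are Poisson distributed."""
    p_neg = n_negative / n_total
    lam = -math.log(p_neg)                       # since P(0 copies) = exp(-lambda)
    # Wald interval on p_neg, propagated through -log (a simple approximation).
    se = math.sqrt(p_neg * (1 - p_neg) / n_total)
    lo = -math.log(min(p_neg + z * se, 1.0))
    hi = -math.log(max(p_neg - z * se, 1e-12))
    return lam, lo, hi

# e.g. 12 negatives out of 42 replicates (illustrative numbers)
lam, lo, hi = poisson_quantify(12, 42)
print(f"lambda = {lam:.2f} copies/reaction (95% CI {lo:.2f}-{hi:.2f})")
```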

  10. Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples.

    Directory of Open Access Journals (Sweden)

    Peter M Visscher

    2014-04-01

    Full Text Available We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful in planning experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases), in particular when the traits (diseases) are not measured on the same samples.
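
    A rough power calculator can be built directly on the abstract's statement that the sampling variance is inversely proportional to the squared sample size times the variance in SNP-derived relationships. In the sketch below, the relationship-variance constant var_rel ≈ 2e-5 (a value commonly quoted for unrelated individuals), the normal approximation, and the one-sided test are all assumptions of this illustration, not values taken from the paper itself.

```python
import math
from scipy.stats import norm

def greml_power(n, h2, var_rel=2e-5, alpha=0.05):
    """Approximate power to detect SNP-heritability h2 > 0 with GREML,
    using se(h2_hat) ~ sqrt(2 / (n^2 * var_rel))."""
    se = math.sqrt(2.0 / (n ** 2 * var_rel))
    z_crit = norm.ppf(1 - alpha)
    return 1 - norm.cdf(z_crit - h2 / se)

for n in [2000, 5000, 10000]:
    print(f"n = {n:6d}  power = {greml_power(n, h2=0.3):.3f}")
```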

  11. Density meter algorithm and system for estimating sampling/mixing uncertainty

    International Nuclear Information System (INIS)

    Shine, E.P.

    1986-01-01

    The Laboratories Department at the Savannah River Plant (SRP) has installed a six-place density meter with an automatic sampling device. This paper describes the statistical software developed to analyze the density of uranyl nitrate solutions using this automated system. The purpose of this software is twofold: to estimate the sampling/mixing and measurement uncertainties in the process and to provide a measurement control program for the density meter. Non-uniformities in density are analyzed both analytically and graphically. The mean density and its limit of error are estimated. Quality control standards are analyzed concurrently with process samples and used to control the density meter measurement error. The analyses are corrected for concentration changes due to evaporation of samples waiting to be analyzed. The results of this program have been successful in identifying sampling/mixing problems and controlling the quality of analyses.

  12. Statistical Methods and Sampling Design for Estimating Step Trends in Surface-Water Quality

    Science.gov (United States)

    Hirsch, Robert M.

    1988-01-01

    This paper addresses two components of the problem of estimating the magnitude of step trends in surface water quality. The first is finding a robust estimator appropriate to the data characteristics expected in water-quality time series. The J. L. Hodges-E. L. Lehmann class of estimators is found to be robust in comparison to other nonparametric and moment-based estimators. A seasonal Hodges-Lehmann estimator is developed and shown to have desirable properties. Second, the effectiveness of various sampling strategies is examined using Monte Carlo simulation coupled with application of this estimator. The simulation is based on a large set of total phosphorus data from the Potomac River. To assure that the simulated records have realistic properties, the data are modeled in a multiplicative fashion incorporating flow, hysteresis, seasonal, and noise components. The results demonstrate the importance of balancing the length of the two sampling periods and balancing the number of data values between the two periods.
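
    The Hodges-Lehmann estimator of a step change is the median of all pairwise differences between post-period and pre-period values. The sketch below shows one plausible reading of its seasonal variant (an HL estimate within each season, then the median across seasons); the paper's exact pooling of seasons may differ, and the lognormal test data are invented.

```python
import numpy as np

def seasonal_hodges_lehmann(pre, post):
    """Step-trend estimate: median over seasons of the Hodges-Lehmann
    estimator applied within each season.

    pre, post: dicts mapping season -> 1-D array of concentrations."""
    seasonal = []
    for season in pre:
        d = np.subtract.outer(post[season], pre[season]).ravel()
        seasonal.append(np.median(d))       # HL estimate for this season
    return np.median(seasonal)

rng = np.random.default_rng(4)
pre = {s: rng.lognormal(0.0, 0.5, size=12) for s in range(4)}
post = {s: rng.lognormal(0.3, 0.5, size=12) for s in range(4)}
print("estimated step change:", round(seasonal_hodges_lehmann(pre, post), 3))
```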

  13. Statistical model for degraded DNA samples and adjusted probabilities for allelic drop-out

    DEFF Research Database (Denmark)

    Tvedebrink, Torben; Eriksen, Poul Svante; Mogensen, Helle Smidt

    2012-01-01

    DNA samples found at a scene of crime or obtained from the debris of a mass disaster accident are often subject to degradation. When using the STR DNA technology, the DNA profile is observed via a so-called electropherogram (EPG), where the alleles are identified as signal peaks above...... data from degraded DNA, where cases with varying amounts of DNA and levels of degradation are investigated....

  14. Statistical model for degraded DNA samples and adjusted probabilities for allelic drop-out

    DEFF Research Database (Denmark)

    Tvedebrink, Torben; Eriksen, Poul Svante; Mogensen, Helle Smidt

    2012-01-01

    DNA samples found at a scene of crime or obtained from the debris of a mass disaster accident are often subject to degradation. When using the STR DNA technology, the DNA profile is observed via a so-called electropherogram (EPG), where the alleles are identified as signal peaks above a certain...... data from degraded DNA, where cases with varying amounts of DNA and levels of degradation are investigated....

  15. Statistical inference for the additive hazards model under outcome-dependent sampling.

    Science.gov (United States)

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study designs and proper inference procedures for data from such designs are always of particular interest to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design, for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters under the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating its relative efficiency against the simple random sampling design, and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the cancer risk of radon exposure.

  16. Relationship between accuracy and number of samples on statistical quantity and contour map of environmental gamma-ray dose rate. Example of random sampling

    International Nuclear Information System (INIS)

    Matsuda, Hideharu; Minato, Susumu

    2002-01-01

    The accuracy of statistical quantities like the mean value, and of contour maps, obtained by measurement of the environmental gamma-ray dose rate was evaluated by random sampling of 5 different model distribution maps made with the mean slope, -1.3, of power spectra calculated from actually measured values. The values were derived from 58 natural gamma dose rate data sets reported worldwide, ranging in mean from 10-100 nGy/h and in area from 10⁻³ to 10⁷ km². The accuracy of the mean value was found to be around ±7% even for 60 or 80 samplings (the most frequent numbers), and the standard deviation had an accuracy less than 1/4-1/3 of that of the means. The correlation coefficient of the frequency distribution was found to be 0.860 or more for 200-400 samplings (the most frequent numbers), but for the contour map only 0.502-0.770. (K.H.)

  17. Optimal design of sampling and mapping schemes in the radiometric exploration of Chipilapa, El Salvador (Geo-statistics)

    International Nuclear Information System (INIS)

    Balcazar G, M.; Flores R, J.H.

    1992-01-01

    As part of the radiometric surface exploration carried out in the Chipilapa geothermal field, El Salvador, the geo-statistical parameters were considered, starting from the variogram calculated from the field data. The maximum correlation distance of the 'radon' samples in the different observation directions (N-S, E-W, NW-SE, NE-SW) was 121 m for the monitoring grid to be used in future prospecting of the same area. From this, an optimization (minimum cost) of the field sample spacing was derived by means of geo-statistical techniques, without losing detection of the anomaly. (Author)

  18. Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

    Science.gov (United States)

    Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

    2015-03-01

    FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
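
    The reported sample sizes come from the paper's own variance estimates for each FreeSurfer metric. As an illustration of the kind of a priori power analysis the released tool performs, the sketch below solves for subjects per group to detect a 10% group difference with a two-sample t-test; the between-subject coefficient of variation is an assumed input here, not a value from the study.

```python
from statsmodels.stats.power import TTestIndPower

def n_per_group(pct_diff=0.10, cv=0.15, power=0.8, alpha=0.05):
    """Subjects per group to detect a given percent difference in a
    morphometric measure with a two-sample t-test, given the between-subject
    coefficient of variation (cv) of that measure."""
    d = pct_diff / cv                        # Cohen's d
    return TTestIndPower().solve_power(effect_size=d, power=power, alpha=alpha)

print(round(n_per_group(cv=0.15)))           # a relatively tight metric
print(round(n_per_group(cv=0.30)))           # a noisier metric needs more subjects
```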

  19. Statistical properties of the surface velocity field in the northern Gulf of Mexico sampled by GLAD drifters

    OpenAIRE

    Mariano, A.J.; Ryan, E.H.; Huntley, H.S.; Laurindo, L.C.; Coelho, E.; Ozgokmen, TM; Berta, M.; Bogucki, D; Chen, S.S.; Curcic, M.; Drouin, K.L.; Gough, M; Haus, BK; Haza, A.C.; Hogan, P

    2016-01-01

    The Grand LAgrangian Deployment (GLAD) used multiscale sampling and GPS technology to observe time series of drifter positions with initial drifter separation of O(100 m) to O(10 km), and nominal 5 min sampling, during the summer and fall of 2012 in the northern Gulf of Mexico. Histograms of the velocity field and its statistical parameters are non-Gaussian; most are multimodal. The dominant periods for the surface velocity field are 1–2 days due to inertial oscillations, tides, and the sea b...

  20. An empirical test of Maslow's theory of need hierarchy using hologeistic comparison by statistical sampling.

    Science.gov (United States)

    Davis-Sharts, J

    1986-10-01

    Maslow's hierarchy of basic human needs provides a major theoretical framework in nursing science. The purpose of this study was to empirically test Maslow's need theory, specifically at the levels of physiological and security needs, using a hologeistic comparative method. Thirty cultures taken from the 60 cultural units in the Human Relations Area Files (HRAF) Probability Sample were found to have data available for examining hypotheses about thermoregulatory (physiological) and protective (security) behaviors practiced prior to sleep onset. The findings demonstrate there is initial worldwide empirical evidence to support Maslow's need hierarchy.

  1. Imaging systems and algorithms to analyze biological samples in real-time using mobile phone microscopy.

    Science.gov (United States)

    Shanmugam, Akshaya; Usmani, Mohammad; Mayberry, Addison; Perkins, David L; Holcomb, Daniel E

    2018-01-01

    Miniaturized imaging devices have pushed the boundaries of point-of-care imaging, but existing mobile-phone-based imaging systems do not exploit the full potential of smart phones. This work demonstrates the use of simple imaging configurations to deliver superior image quality and the ability to handle a wide range of biological samples. Results presented in this work are from analysis of fluorescent beads under fluorescence imaging, as well as helminth eggs and freshwater mussel larvae under white light imaging. To demonstrate versatility of the systems, real time analysis and post-processing results of the sample count and sample size are presented in both still images and videos of flowing samples.

  2. Adaptive sampling based on the cumulative distribution function of order statistics to delineate heavy-metal contaminated soils using kriging

    International Nuclear Information System (INIS)

    Juang, K.-W.; Lee, D.-Y.; Teng, Y.-L.

    2005-01-01

    Correctly classifying 'contaminated' areas in soils, based on the threshold for a contaminated site, is important for determining effective clean-up actions. Pollutant mapping by means of kriging is increasingly being used for the delineation of contaminated soils. However, those areas where the kriged pollutant concentrations are close to the threshold have a high possibility for being misclassified. In order to reduce the misclassification due to the over- or under-estimation from kriging, an adaptive sampling using the cumulative distribution function of order statistics (CDFOS) was developed to draw additional samples for delineating contaminated soils, while kriging. A heavy-metal contaminated site in Hsinchu, Taiwan was used to illustrate this approach. The results showed that compared with random sampling, adaptive sampling using CDFOS reduced the kriging estimation errors and misclassification rates, and thus would appear to be a better choice than random sampling, as additional sampling is required for delineating the 'contaminated' areas. - A sampling approach was derived for drawing additional samples while kriging
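
    The heart of the adaptive step can be phrased simply: draw additional samples where the kriging-implied probability of exceeding the regulatory threshold is ambiguous, because that is where misclassification risk is highest. A minimal sketch assuming approximately Gaussian kriging errors; the paper's CDFOS criterion based on order statistics is not reproduced, and the numbers are invented.

```python
from scipy.stats import norm

def flag_for_resampling(kriged_mean, kriged_sd, threshold, band=(0.2, 0.8)):
    """Return (flag, p_exceed): flag is True where the exceedance probability
    is ambiguous, i.e. where an additional sample is worth drawing."""
    p_exceed = 1 - norm.cdf(threshold, loc=kriged_mean, scale=kriged_sd)
    return band[0] < p_exceed < band[1], p_exceed

for mu in (80, 118, 122, 200):
    flag, p = flag_for_resampling(mu, kriged_sd=15, threshold=120)
    print(f"kriged mean={mu:3d}  P(exceed)={p:.2f}  resample={flag}")
```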

  3. Statistical theory for estimating sampling errors of regional radiation averages based on satellite measurements

    Science.gov (United States)

    Smith, G. L.; Bess, T. D.; Minnis, P.

    1983-01-01

    The processes which determine the weather and climate are driven by the radiation received by the earth and the radiation subsequently emitted. A knowledge of the absorbed and emitted components of radiation is thus fundamental for the study of these processes. In connection with the desire to improve the quality of long-range forecasting, NASA is developing the Earth Radiation Budget Experiment (ERBE), consisting of a three-channel scanning radiometer and a package of nonscanning radiometers. A set of these instruments is to be flown on both the NOAA-F and NOAA-G spacecraft, in sun-synchronous orbits, and on an Earth Radiation Budget Satellite. The purpose of the scanning radiometer is to obtain measurements from which the average reflected solar radiant exitance and the average earth-emitted radiant exitance at a reference level can be established. The estimate of regional average exitance obtained will not exactly equal the true value of the regional average exitance, but will differ due to spatial sampling. A method is presented for evaluating this spatial sampling error.

  4. Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples

    Directory of Open Access Journals (Sweden)

    Nezar Noor Al-Hebshi

    2015-09-01

    Full Text Available Background: Usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by inability to classify reads to the species level. Objective: The purpose of this study was to develop a robust algorithm for species-level classification of NGS reads from oral samples and to pilot test it for profiling bacteria within OSCC tissues. Methods: Bacterial 16S V1-V3 libraries were prepared from three OSCC DNA samples and sequenced using 454's FLX chemistry. High-quality, well-aligned, and non-chimeric reads ≥350 bp were classified using a novel, multi-stage algorithm that involves matching reads to reference sequences in revised versions of the Human Oral Microbiome Database (HOMD), HOMD extended (HOMDEXT), and Greengene Gold (GGG) at alignment coverage and percentage identity ≥98%, followed by assignment to species level based on top hit reference sequences. Priority was given to hits in HOMD, then HOMDEXT and finally GGG. Unmatched reads were subject to operational taxonomic unit analysis. Results: Nearly 92.8% of the reads were matched to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. Of all matched reads, 99.6% were classified to species level. A total of 228 species-level taxa were identified, representing 11 phyla; the most abundant were Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria. Thirty-five species-level taxa were detected in all samples. On average, Prevotella oris, Neisseria flava, Neisseria flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, and Fusobacterium periodontium were the most abundant. Bacteroides fragilis, a species rarely isolated from the oral cavity, was detected in two samples. Conclusion: This multi-stage algorithm maximizes the fraction of reads classified to the species level while ensuring reliable classification by giving priority to the
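
    The tiered matching logic of the Methods section translates naturally into code. The sketch below uses hypothetical hit tuples and simplifies away tie-breaking among equally good top hits as well as the upstream quality and chimera filtering; only the priority order and the ≥98% identity/coverage gates mirror the record.

```python
def assign_species(read_hits, min_identity=98.0, min_coverage=98.0):
    """Assign a read to a species using tiered reference databases.

    read_hits: dict mapping database name ('HOMD', 'HOMDEXT', 'GGG') to a
    list of (species, identity_pct, coverage_pct) alignment hits."""
    for db in ("HOMD", "HOMDEXT", "GGG"):        # priority order from the record
        good = [h for h in read_hits.get(db, [])
                if h[1] >= min_identity and h[2] >= min_coverage]
        if good:
            # top hit = highest identity, then highest coverage
            return max(good, key=lambda h: (h[1], h[2]))[0], db
    return None, None            # unmatched -> OTU analysis downstream

hits = {"HOMD": [("Prevotella oris", 99.1, 100.0)],
        "GGG": [("Neisseria flava", 98.5, 99.0)]}
print(assign_species(hits))      # -> ('Prevotella oris', 'HOMD')
```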

  5. A pilot study using low-dose Spectral CT and ASIR (Adaptive Statistical Iterative Reconstruction) algorithm to diagnose solitary pulmonary nodules.

    Science.gov (United States)

    Xiao, Huijuan; Liu, Yihe; Tan, Hongna; Liang, Pan; Wang, Bo; Su, Lei; Wang, Suya; Gao, Jianbo

    2015-11-17

    Lung cancer is the most common cancer and has the highest mortality rate. With the development of computed tomography (CT) techniques, the detection rate of solitary pulmonary nodules (SPN) has constantly increased, and the diagnostic accuracy for SPN has remained a hot topic in clinical and imaging diagnosis. The aim of this study was to evaluate the combination of low-dose spectral CT and the ASIR (Adaptive Statistical Iterative Reconstruction) algorithm in the diagnosis of solitary pulmonary nodules (SPN). 62 patients with SPN (42 cases of benign SPN and 20 cases of malignant SPN, pathology confirmed) were scanned by spectral CT with a dual-phase contrast-enhanced method. The iodine and water concentration (IC and WC) of the lesion and of the artery of the same density in the image were measured with the GSI (Gemstone Spectral Imaging) software. The normalized iodine and water concentration (NIC and NWC) of the lesion and the normalized iodine and water concentration difference (ICD and WCD) between the arterial and venous phases (AP and VP) were also calculated. The spectral HU (Hounsfield unit) curve was divided into 3 sections based on energy (40-70, 70-100 and 100-140 keV) and the slopes (λHU) in both phases were calculated. The ICAP, ICVP, WCAP and WCVP, NIC and NWC, and the λHU in benign and malignant SPN were compared by independent-sample t-test. The iodine-related parameters (ICAP, ICVP, NICAP, NICVP, and the ICD) of malignant SPN were significantly higher than those of benign SPN (t = 3.310, 1.330, 2.388, 1.669 and 3.251, respectively, P < 0.05). The iodine-related parameters and the slope of the spectral curve are useful markers to distinguish benign from malignant lung diseases, and their application is feasible in clinical practice.

  6. Statistical inferences with jointly type-II censored samples from two Pareto distributions

    Science.gov (United States)

    Abu-Zinadah, Hanaa H.

    2017-08-01

    In several industrial fields, products come from more than one production line, and comparative life tests are required. This entails sampling from the different production lines, from which the joint censoring scheme arises. In this article we consider the Pareto lifetime distribution under a joint type-II censoring scheme. The maximum likelihood estimators (MLE) and the corresponding approximate confidence intervals, as well as bootstrap confidence intervals, of the model parameters are obtained. Bayesian point estimates and credible intervals of the model parameters are also presented. A lifetime data set is analyzed for illustrative purposes. Monte Carlo results from simulation studies are presented to assess the performance of the proposed method.

  7. Supplementary Material for: Compressing an Ensemble With Statistical Models: An Algorithm for Global 3D Spatio-Temporal Temperature

    KAUST Repository

    Castruccio, Stefano; Genton, Marc G.

    2016-01-01

    algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific datasets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a

  8. Development of tomographic reconstruction algorithms for the PIXE analysis of biological samples

    International Nuclear Information System (INIS)

    Nguyen, D.T.

    2008-05-01

    The development of 3-dimensional microscopy techniques offering a spatial resolution of 1 μm or less has opened a large field of investigation in cell biology. Among them, an interesting advantage of ion beam micro-tomography is its ability to give quantitative results in terms of local concentrations in a direct way, using Particle Induced X-ray Emission Tomography (PIXET) combined with Scanning Transmission Ion Microscopy Tomography (STIMT). After a brief introduction to existing reconstruction techniques, we present the principle of the DISRA code, the most complete code written so far, which is the basis of the present work. We have modified and extended the DISRA algorithm by considering the specific aspects of biological specimens. Moreover, correction procedures were added to the code to reduce noise in the tomograms. For portability purposes, a Windows graphic interface was designed to easily enter and modify the experimental parameters used in the reconstruction and to control the several steps of data reduction. Results of STIMT and PIXET experiments on reference specimens and on human cancer cells are also presented. (author)

  9. Algorithm/Architecture Co-design of the Generalized Sampling Theorem Based De-Interlacer.

    NARCIS (Netherlands)

    Beric, A.; Haan, de G.; Sethuraman, R.; Meerbergen, van J.

    2005-01-01

    De-interlacing is a major determinant of image quality in a modern display processing chain. The de-interlacing method based on the generalized sampling theorem (GST)applied to motion estimation and motion compensation provides the best de-interlacing results. With HDTV interlaced input material

  10. Statistical Analysis of CO2 Exposed Wells to Predict Long Term Leakage through the Development of an Integrated Neural-Genetic Algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Guo, Boyun [Univ. of Louisiana, Lafayette, LA (United States); Duguid, Andrew [Battelle, Columbus, OH (United States); Nygaard, Ronar [Missouri Univ. of Science and Technology, Rolla, MO (United States)

    2017-08-05

    The objective of this project is to develop a computerized statistical model with the Integrated Neural-Genetic Algorithm (INGA) for predicting the probability of long-term leakage of wells in CO2 sequestration operations. This objective has been accomplished by conducting research in three phases: 1) data mining of CO2-exposed wells, 2) INGA computer model development, and 3) evaluation of the predictive performance of the computer model with data from field tests. Data mining was conducted for 510 wells in two CO2 sequestration projects in the Texas Gulf Coast region: the Hastings West field and the Oyster Bayou field in southern Texas. Missing wellbore integrity data were estimated using an analytical and Finite Element Method (FEM) model. The INGA was first tested for convergence and computing efficiency with the obtained data set of high dimension. It was concluded that the INGA can handle the gathered data set with good accuracy and reasonable computing time after a reduction of dimension with a grouping mechanism. A computerized statistical model with the INGA was then developed based on data pre-processing and grouping. Comprehensive training and testing of the model were carried out to ensure that the model is accurate and efficient enough for predicting the probability of long-term leakage of wells in CO2 sequestration operations. The Cranfield site in southern Mississippi was selected as the test site. Observation wells CFU31F2 and CFU31F3 were used for pressure testing, formation logging, and cement sampling. Tools run in the wells include the Isolation Scanner, Slim Cement Mapping Tool (SCMT), Cased Hole Formation Dynamics Tester (CHDT), and Mechanical Sidewall Coring Tool (MSCT). Analyses of the obtained data indicate no leakage of CO2 across the cap zone, while it is evident that the well cement sheath was invaded by CO2 from the storage zone. This observation is consistent

  11. Inferring microRNA regulation of mRNA with partially ordered samples of paired expression data and exogenous prediction algorithms.

    Directory of Open Access Journals (Sweden)

    Brian Godsey

    Full Text Available MicroRNAs (miRs) are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that the combination of these two approaches gives more reliable results than either by itself. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, such as in time-series experiments or studies of development processes, stages, or types (e.g., cell type, disease, growth, aging), there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm which specifically addresses [partially] ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework which specifies posterior distributions, and therefore statistical significance, for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.

  12. An improved Bayesian tensor regularization and sampling algorithm to track neuronal fiber pathways in the language circuit.

    Science.gov (United States)

    Mishra, Arabinda; Anderson, Adam W; Wu, Xi; Gore, John C; Ding, Zhaohua

    2010-08-01

    The purpose of this work is to design a neuronal fiber tracking algorithm that is more suitable for reconstruction of fibers associated with functionally important regions in the human brain. Functional activations in the brain normally occur in the gray matter regions, hence the fibers bordering these regions are weakly myelinated, resulting in poor performance of conventional tractography methods in tracing the fiber links between them. Lower fractional anisotropy in these regions makes it even more difficult to track the fibers in the presence of noise. In this work, the authors focused on a stochastic approach to reconstruct these fiber pathways based on a Bayesian regularization framework. To estimate the true fiber direction (propagation vector), the a priori and conditional probability density functions are calculated in advance and are modeled as multivariate normal. The variance of the estimated tensor element vector is associated with the uncertainty due to noise and partial volume averaging (PVA). An adaptive and multiple sampling of the estimated tensor element vector, as a function of the pre-estimated variance, overcomes the effects of noise and PVA in this work. The algorithm has been rigorously tested using a variety of synthetic data sets. The quantitative comparison of the results to standard algorithms motivated the authors to implement it for in vivo DTI data analysis. The algorithm has been implemented to delineate fibers in two major language pathways (Broca's to SMA and Broca's to Wernicke's) across 12 healthy subjects. Though the mean standard deviation was marginally larger than that of the conventional (Euler) approach [P. J. Basser et al., "In vivo fiber tractography using DT-MRI data," Magn. Reson. Med. 44(4), 625-632 (2000)], the number of extracted fibers in this approach was significantly higher. The authors also compared the performance of the proposed method to Lu's method [Y. Lu et al., "Improved fiber tractography with Bayesian

  13. Some statistical and sampling needs for detecting spills or migration at commercial low-level radioactive waste disposal sites

    International Nuclear Information System (INIS)

    Thomas, J.M.; Eberhardt, L.L.; Skalski, J.R.; Simmons, M.A.

    1984-05-01

    As part of a larger study funded by the US Nuclear Regulatory Commission we have been investigating field sampling strategies and compositing as a means of detecting spills or migration at commercial low-level radioactive waste disposal sites. The overall project is designed to produce information for developing guidance on implementing 10 CFR part 61. Compositing (pooling samples) for detection is discussed first, followed by our development of a statistical test to allow a decision as to whether any component of a composite exceeds a prescribed maximum acceptable level. The question of optimal field sampling designs and an Apple computer program designed to show the difficulties in constructing efficient field designs and using compositing schemes are considered. 6 references, 3 figures, 3 tables
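
    A worst-case screening rule illustrates why compositing works for detection: if a composite of k pooled aliquots reads concentration c-bar, no single aliquot can exceed k times c-bar. The sketch below encodes that generic rule with a simple allowance for measurement error; it is an assumption-laden stand-in, not the specific statistical test developed in the report.

```python
def composite_screen(measured_mean, k, limit, sd=0.0, z=1.645):
    """Screening decision for a composite of k pooled aliquots.

    The largest single aliquot can contribute at most k * c-bar, so the
    composite can be cleared whenever k * (c-bar + z * sd) <= limit."""
    worst_case = k * (measured_mean + z * sd)
    return worst_case > limit    # True -> break composite apart and retest

print(composite_screen(measured_mean=2.0, k=4, limit=10.0, sd=0.5))  # True: retest
print(composite_screen(measured_mean=1.0, k=4, limit=10.0, sd=0.5))  # False: cleared
```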

  14. Tandem mass spectrometry of human tryptic blood peptides calculated by a statistical algorithm and captured by a relational database with exploration by a general statistical analysis system.

    Science.gov (United States)

    Bowden, Peter; Beavis, Ron; Marshall, John

    2009-11-02

    A goodness of fit test may be used to assign tandem mass spectra of peptides to amino acid sequences and to directly calculate the expected probability of mis-identification. The product of the peptide expectation values directly yields the probability that the parent protein has been mis-identified. A relational database can capture the mass spectral data and the best-fit results, and permit subsequent calculations by a general statistical analysis system. The many files of the HUPO blood protein data correlated by X!TANDEM against the proteins of ENSEMBL were collected into a relational database. A redundant set of 247,077 proteins and peptides was correlated by X!TANDEM, and that was collapsed to a set of 34,956 peptides from 13,379 distinct proteins. About 6875 distinct proteins were represented by only a single distinct peptide, 2866 proteins showed 2 distinct peptides, and 3454 proteins showed at least three distinct peptides by X!TANDEM. More than 99% of the peptides were associated with proteins that had cumulative expectation values, i.e. probability of false positive identification, of one in one hundred or less. The distribution of peptides per protein from X!TANDEM was significantly different from that expected from random assignment of peptides.

  15. Statistics-based optimization of the polarimetric radar hydrometeor classification algorithm and its application for a squall line in South China

    Science.gov (United States)

    Wu, Chong; Liu, Liping; Wei, Ming; Xi, Baozhu; Yu, Minghui

    2018-03-01

    A modified hydrometeor classification algorithm (HCA) is developed in this study for Chinese polarimetric radars. This algorithm is based on the U.S. operational HCA. Meanwhile, a methodology of statistics-based optimization is proposed, including calibration checking, dataset selection, membership function modification, computation threshold modification, and effect verification. Zhuhai radar, the first operational polarimetric radar in South China, applies these procedures. The systematic calibration bias is corrected, the reliability of radar measurements deteriorates when the signal-to-noise ratio is low, and the correlation coefficient within the melting layer is usually lower than that of the U.S. WSR-88D radar. Through modification based on statistical analysis of polarimetric variables, an HCA localized for Zhuhai is obtained, and it performs well over a one-month test in comparison with sounding and surface observations. The algorithm is then applied to the analysis of a squall line on 11 May 2014 and is found to provide reasonable details of the horizontal and vertical structures; the HCA results, especially in the mixed rain-hail region, can reflect the life cycle of the squall line. In addition, the kinematic and microphysical processes of cloud evolution and the differences between radar-detected hail and surface observations are also analyzed. The results of this study provide evidence for the improvement of this HCA developed specifically for China.

  16. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    Science.gov (United States)

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
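
    As a concrete companion to the concepts in this part (probabilities under a normal distribution, and inference about a population mean from a sample), a short sketch with invented numbers follows; the population parameters and sample statistics are purely illustrative.

```python
import math
from scipy.stats import norm

# Probability that a value from a normal population falls in a range.
mu, sigma = 120.0, 15.0                 # assumed population mean and SD
p = norm.cdf(140, mu, sigma) - norm.cdf(100, mu, sigma)
print(f"P(100 < X < 140) = {p:.3f}")

# 95% confidence interval for a population mean from sample statistics.
xbar, s, n = 118.0, 14.0, 36            # assumed sample mean, SD, and size
half_width = 1.96 * s / math.sqrt(n)
print(f"95% CI for the mean: {xbar - half_width:.1f} to {xbar + half_width:.1f}")
```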

  17. A Smoothing Algorithm for a New Two-Stage Stochastic Model of Supply Chain Based on Sample Average Approximation

    Directory of Open Access Journals (Sweden)

    Liu Yang

    2017-01-01

    Full Text Available We construct a new two-stage stochastic model of a supply chain with multiple factories and distributors for a perishable product. By introducing a second-order stochastic dominance (SSD) constraint, we can describe the preference consistency of the risk taker while minimizing the expected cost of the company. To solve this problem, we equivalently convert it into a one-stage stochastic model; then we use the sample average approximation (SAA) method to approximate the expected values of the underlying random functions. A smoothing approach is proposed with which we can get the global solution and avoid introducing new variables and constraints. Meanwhile, we investigate the convergence of the optimal value of the transformed model and show that, with probability approaching one at an exponential rate, the optimal value converges to its counterpart as the sample size increases. Numerical results show the effectiveness of the proposed algorithm and analysis.
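
    The two ingredients named in the abstract, sample average approximation and smoothing of a kinked cost, can be shown on a single-item newsvendor stand-in; the SSD constraint and the multi-echelon supply-chain structure are omitted, and all cost parameters and the demand distribution are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
demand = rng.gamma(shape=4.0, scale=25.0, size=2000)   # SAA demand scenarios

h, b = 1.0, 4.0                 # holding and backorder unit costs (illustrative)
eps = 1e-2                      # smoothing parameter

def smooth_pos(x):
    """Smooth approximation of max(x, 0), removing the kink."""
    return 0.5 * (x + np.sqrt(x * x + eps * eps))

def saa_cost(q):
    """Sample-average (SAA) objective with the smoothed cost."""
    q = q[0]
    return np.mean(h * smooth_pos(q - demand) + b * smooth_pos(demand - q))

res = minimize(saa_cost, x0=[100.0], method="BFGS")
print(f"SAA order quantity: {res.x[0]:.1f}")
# For this cost structure the exact optimum is the b/(h+b) = 0.8 demand quantile:
print(f"0.8-quantile of scenarios: {np.quantile(demand, 0.8):.1f}")
```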

  18. GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms.

    Science.gov (United States)

    Kobayashi, Chigusa; Jung, Jaewoon; Matsunaga, Yasuhiro; Mori, Takaharu; Ando, Tadashi; Tamura, Koichi; Kamiya, Motoshi; Sugita, Yuji

    2017-09-30

    GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to extend limitations in system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in AMBER and GROMACS packages now become available in addition to CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc.

  19. Classification of amyotrophic lateral sclerosis disease based on convolutional neural network and reinforcement sample learning algorithm.

    Science.gov (United States)

    Sengur, Abdulkadir; Akbulut, Yaman; Guo, Yanhui; Bajaj, Varun

    2017-12-01

    Electromyogram (EMG) signals contain useful information on neuromuscular diseases such as amyotrophic lateral sclerosis (ALS). ALS is a well-known neurodegenerative disease that progressively degenerates the motor neurons. In this paper, we propose a deep learning based method for efficient classification of ALS and normal EMG signals. Spectrograms, the continuous wavelet transform (CWT), and the smoothed pseudo Wigner-Ville distribution (SPWVD) have been employed for time-frequency (T-F) representation of the EMG signals. A convolutional neural network (CNN) is employed to classify these features; the architecture comprises two convolution layers, two pooling layers, a fully connected layer, and a loss function layer. The CNN is trained with the reinforcement sample learning strategy. The efficiency of the proposed implementation is tested on a publicly available EMG dataset containing 89 ALS and 133 normal EMG signals sampled at 24 kHz. Experimental results show 96.80% accuracy. The obtained results are also compared with those of other methods, showing the superiority of the proposed method.
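    A minimal PyTorch sketch of the CNN layout described in the abstract (two convolution layers, two pooling layers, one fully connected layer, and a loss function); the 64×64 T-F input size and the channel counts are assumptions, and the reinforcement sample learning strategy is not reproduced here:

```python
import torch
import torch.nn as nn

class EMGNet(nn.Module):
    def __init__(self, n_classes=2):                     # ALS vs normal
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution layer 1
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer 1
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # convolution layer 2
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer 2
        )
        self.classifier = nn.Linear(16 * 16 * 16, n_classes)  # fully connected

    def forward(self, x):               # x: (batch, 1, 64, 64) T-F image
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = EMGNet()
loss_fn = nn.CrossEntropyLoss()         # the "loss function layer"
logits = model(torch.randn(4, 1, 64, 64))
loss = loss_fn(logits, torch.tensor([0, 1, 0, 1]))
```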

  20. The Sloan Digital Sky Survey Quasar Lens Search. IV. Statistical Lens Sample from the Fifth Data Release

    Energy Technology Data Exchange (ETDEWEB)

    Inada, Naohisa; /Wako, RIKEN /Tokyo U., ICEPP; Oguri, Masamune; /Natl. Astron. Observ. of Japan /Stanford U., Phys. Dept.; Shin, Min-Su; /Michigan U. /Princeton U. Observ.; Kayo, Issha; /Tokyo U., ICRR; Strauss, Michael A.; /Princeton U. Observ.; Hennawi, Joseph F.; /UC, Berkeley /Heidelberg, Max Planck Inst. Astron.; Morokuma, Tomoki; /Natl. Astron. Observ. of Japan; Becker, Robert H.; /LLNL, Livermore /UC, Davis; White, Richard L.; /Baltimore, Space Telescope Sci.; Kochanek, Christopher S.; /Ohio State U.; Gregg, Michael D.; /LLNL, Livermore /UC, Davis /Exeter U.

    2010-05-01

    We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i < 19.1 in the redshift range 0.6 < z < 2.2, where we require the lenses to have image separations of 1″ < θ < 20″ and i-band magnitude differences between the two images smaller than 1.25 mag. Among the 19 lensed quasars, 3 have quadruple-image configurations, while the remaining 16 show double images. This lens sample constrains the cosmological constant to be Ω_Λ = 0.84^{+0.06}_{−0.08} (stat.)^{+0.09}_{−0.07} (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of 7 binary quasars with separations ranging from 1.″1 to 16.″6, which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.

  1. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    Science.gov (United States)

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling locations is a crucial decision-making process for the rationalization of a monitoring network. The quantity, quality, and types of available data (watershed characteristics and water quality data) may affect the selection of an appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for the design of sampling locations. However, their performance may vary significantly with the quantity, quality, and types of available data. In this paper, an attempt has been made to evaluate the performance of these techniques while accounting for the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data are usually difficult to obtain, whereas watershed information may be made available through the application of geospatial techniques. A case study of the Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at the different sampling sites were estimated and accounted for using the modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum numbers of sampling locations for the monsoon and non-monsoon seasons are eight and seven by the modified Sanders approach and eleven and nine by FA/PCA, respectively. Little variation in the number and locations of the designed sampling sites was obtained by both techniques, which shows the stability of the results. A geospatial analysis has also been carried out to check the significance of the designed sampling locations with respect to the river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders
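    The FA/PCA side of the comparison reduces to a standard pipeline: standardize the monitored parameters, extract components, and flag stations that contribute little information on the retained components. A sketch with a hypothetical stations-by-parameters matrix:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 16 sampling stations x 12 water-quality parameters
rng = np.random.default_rng(1)
X = rng.normal(size=(16, 12))

Xs = StandardScaler().fit_transform(X)        # standardize parameters
pca = PCA().fit(Xs)

# Retain components explaining ~80% of variance (a common rule of thumb)
k = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.80)) + 1
scores = pca.transform(Xs)[:, :k]

# Stations with small scores on all retained components add little
# information and are candidates for rationalization.
influence = np.abs(scores).max(axis=1)
print(np.argsort(influence))                  # least to most informative
```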

  2. Combining censored and uncensored data in a U-statistic: design and sample size implications for cell therapy research.

    Science.gov (United States)

    Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colon O

    2011-01-01

    The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. In addition, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a circumstance that requires a less precise substitution in this subset of participants. A score function that is based on the U-statistic can address both issues: (1) intercurrent SCEs and (2) response variable ascertainments that use measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.

  3. A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size.

    Directory of Open Access Journals (Sweden)

    Daniel Vasiliu

    Full Text Available Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of these tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely-used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
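    A sketch of the ranking step only: fit a penalized classifier on the expression matrix and order genes by coefficient magnitude. The PED penalty itself is not available in scikit-learn, so an L2-penalized logistic regression stands in, and the simulation-based cutoff step is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ultra-low-n data: 6 samples x 5000 genes, binary condition
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 5000))
y = np.array([0, 0, 0, 1, 1, 1])

# L2 penalty as a stand-in for penalized Euclidean distance (PED)
clf = LogisticRegression(penalty="l2", C=0.1, max_iter=5000).fit(X, y)

# Rank genes by importance (absolute coefficient), most important first
rank = np.argsort(-np.abs(clf.coef_[0]))
print(rank[:10])
```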

  4. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

    Science.gov (United States)

    Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

    2013-07-01

    The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e., a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to explicitly control the nucleotide distribution, such as the GC-content, in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e., running time comparable to or better than local search methods), seedless (we remove the bias of the seed in local search heuristics), and successfully generates high-quality sequences (i.e., thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
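    The weighted-sampling principle, tilting nucleotide emission probabilities so that sampled sequences hit a target GC-content, can be illustrated independently of the secondary structure constraints IncaRNAtion also enforces; the rejection loop and weights below are illustrative only:

```python
import random

def sample_sequence(length, gc_target, trials=2000, tol=0.02):
    """Draw sequences with nucleotide weights tilted toward a GC target."""
    weights = {"G": gc_target / 2, "C": gc_target / 2,
               "A": (1 - gc_target) / 2, "U": (1 - gc_target) / 2}
    for _ in range(trials):
        seq = random.choices(list(weights), weights=list(weights.values()),
                             k=length)
        gc = (seq.count("G") + seq.count("C")) / length
        if abs(gc - gc_target) <= tol:        # accept if close to target
            return "".join(seq)
    return None

print(sample_sequence(60, gc_target=0.7))
```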

  5. Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

    Science.gov (United States)

    ten Veldhuis, Marie-Claire; Schleiss, Marc

    2017-04-01

    Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km² in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
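    The inter-amount-time idea is easy to make concrete: instead of sampling flow at fixed time steps, record the time needed to accumulate each successive fixed increment of flow volume, so that flashy periods are sampled densely and low flows sparsely. A numpy sketch with an invented flow series:

```python
import numpy as np

def inter_amount_times(flow, dt, amount):
    """Times needed to accumulate successive fixed amounts of flow volume."""
    volume = np.cumsum(flow) * dt                    # cumulative volume
    targets = np.arange(amount, volume[-1], amount)
    crossing = np.searchsorted(volume, targets) * dt # first crossing times
    return np.diff(np.concatenate(([0.0], crossing)))

flow = np.abs(np.random.default_rng(3).normal(1.0, 0.8, size=1000))
iat = inter_amount_times(flow, dt=1.0, amount=25.0)
print(iat.mean(), iat.std() / iat.mean())            # mean and CV of IATs
```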

  6. Computed Tomography Image Quality Evaluation of a New Iterative Reconstruction Algorithm in the Abdomen (Adaptive Statistical Iterative Reconstruction-V) a Comparison With Model-Based Iterative Reconstruction, Adaptive Statistical Iterative Reconstruction, and Filtered Back Projection Reconstructions.

    Science.gov (United States)

    Goodenberger, Martin H; Wagner-Bartak, Nicolaus A; Gupta, Shiva; Liu, Xinming; Yap, Ramon Q; Sun, Jia; Tamm, Eric P; Jensen, Corey T

    The purpose of this study was to compare abdominopelvic computed tomography images reconstructed with adaptive statistical iterative reconstruction-V (ASIR-V) with model-based iterative reconstruction (Veo 3.0), ASIR, and filtered back projection (FBP). Abdominopelvic computed tomography scans for 36 patients (26 males and 10 females) were reconstructed using FBP, ASIR (80%), Veo 3.0, and ASIR-V (30%, 60%, 90%). Mean ± SD patient age was 32 ± 10 years with mean ± SD body mass index of 26.9 ± 4.4 kg/m². Images were reviewed by 2 independent readers in a blinded, randomized fashion. Hounsfield unit, noise, and contrast-to-noise ratio (CNR) values were calculated for each reconstruction algorithm for further comparison. Phantom evaluation of low-contrast detectability (LCD) and high-contrast resolution was performed. Adaptive statistical iterative reconstruction-V 30%, ASIR-V 60%, and ASIR 80% were generally superior qualitatively compared with ASIR-V 90%, Veo 3.0, and FBP (P < 0.05). [...] ASIR-V 60%, with respective CNR values of 5.54 ± 2.39, 8.78 ± 3.15, and 3.49 ± 1.77 (P < 0.05). [...] ASIR 80% had the best and worst spatial resolution, respectively. Adaptive statistical iterative reconstruction-V 30% and ASIR-V 60% provided the best combination of qualitative and quantitative performance. Adaptive statistical iterative reconstruction 80% was equivalent qualitatively, but demonstrated inferior spatial resolution and LCD.

  7. THE SLOAN DIGITAL SKY SURVEY QUASAR LENS SEARCH. IV. STATISTICAL LENS SAMPLE FROM THE FIFTH DATA RELEASE

    International Nuclear Information System (INIS)

    Inada, Naohisa; Oguri, Masamune; Shin, Min-Su; Kayo, Issha; Fukugita, Masataka; Strauss, Michael A.; Gott, J. Richard; Hennawi, Joseph F.; Morokuma, Tomoki; Becker, Robert H.; Gregg, Michael D.; White, Richard L.; Kochanek, Christopher S.; Chiu, Kuenley; Johnston, David E.; Clocchiatti, Alejandro; Richards, Gordon T.; Schneider, Donald P.; Frieman, Joshua A.

    2010-01-01

    We present the second report of our systematic search for strongly lensed quasars from the data of the Sloan Digital Sky Survey (SDSS). From extensive follow-up observations of 136 candidate objects, we find 36 lenses in the full sample of 77,429 spectroscopically confirmed quasars in the SDSS Data Release 5. We then define a complete sample of 19 lenses, including 11 from our previous search in the SDSS Data Release 3, from the sample of 36,287 quasars with i < 19.1 in the redshift range 0.6 < z < 2.2, where we require the lenses to have image separations of 1″ < θ < 20″ and i-band magnitude differences between the two images smaller than 1.25 mag. Among the 19 lensed quasars, 3 have quadruple-image configurations, while the remaining 16 show double images. This lens sample constrains the cosmological constant to be Ω_Λ = 0.84^{+0.06}_{−0.08} (stat.)^{+0.09}_{−0.07} (syst.) assuming a flat universe, which is in good agreement with other cosmological observations. We also report the discoveries of seven binary quasars with separations ranging from 1.″1 to 16.″6, which are identified in the course of our lens survey. This study concludes the construction of our statistical lens sample in the full SDSS-I data set.

  8. Testing University Rankings Statistically: Why this Perhaps is not such a Good Idea after All. Some Reflections on Statistical Power, Effect Size, Random Sampling and Imaginary Populations

    DEFF Research Database (Denmark)

    Schneider, Jesper Wiborg

    2012-01-01

    In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Rankings…

  9. An Unsupervised Method of Change Detection in Multi-Temporal PolSAR Data Using a Test Statistic and an Improved K&I Algorithm

    Directory of Open Access Journals (Sweden)

    Jinqi Zhao

    2017-12-01

    Full Text Available In recent years, multi-temporal imagery from spaceborne sensors has provided a fast and practical means for surveying and assessing changes in terrain surfaces. Owing to the all-weather imaging capability, polarimetric synthetic aperture radar (PolSAR has become a key tool for change detection. Change detection methods include both unsupervised and supervised methods. Supervised change detection, which needs some human intervention, is generally ineffective and impractical. Due to this limitation, unsupervised methods are widely used in change detection. The traditional unsupervised methods only use a part of the polarization information, and the required thresholding algorithms are independent of the multi-temporal data, which results in the change detection map being ineffective and inaccurate. To solve these problems, a novel method of change detection using a test statistic based on the likelihood ratio test and the improved Kittler and Illingworth (K&I minimum-error thresholding algorithm is introduced in this paper. The test statistic is used to generate the comparison image (CI of the multi-temporal PolSAR images, and improved K&I using a generalized Gaussian model simulates the distribution of the CI. As a result of these advantages, we can obtain the change detection map using an optimum threshold. The efficiency of the proposed method is demonstrated by the use of multi-temporal PolSAR images acquired by RADARSAT-2 over Wuhan, China. The experimental results show that the proposed method is effective and highly accurate.
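    For reference, the classical Gaussian form of the Kittler and Illingworth minimum-error criterion that the paper generalizes: scan thresholds on the comparison-image histogram and keep the one minimizing the mixture cost J(t). The generalized-Gaussian variant used in the paper replaces the Gaussian terms:

```python
import numpy as np

def kittler_illingworth(hist, centers):
    """Minimum-error threshold for a two-Gaussian mixture (K&I, 1986)."""
    p = hist / hist.sum()
    best_t, best_j = None, np.inf
    for t in range(1, len(centers) - 1):
        p1, p2 = p[:t].sum(), p[t:].sum()
        if p1 < 1e-9 or p2 < 1e-9:
            continue
        m1 = (p[:t] * centers[:t]).sum() / p1
        m2 = (p[t:] * centers[t:]).sum() / p2
        v1 = (p[:t] * (centers[:t] - m1) ** 2).sum() / p1
        v2 = (p[t:] * (centers[t:] - m2) ** 2).sum() / p2
        if v1 < 1e-9 or v2 < 1e-9:
            continue
        # J(t) = 1 + 2[P1 ln s1 + P2 ln s2] - 2[P1 ln P1 + P2 ln P2]
        j = 1 + p1 * np.log(v1) + p2 * np.log(v2) \
              - 2 * (p1 * np.log(p1) + p2 * np.log(p2))
        if j < best_j:
            best_t, best_j = centers[t], j
    return best_t

# Hypothetical bimodal comparison-image values (changed vs unchanged pixels)
ci = np.random.default_rng(9).normal([0.0, 3.0], 1.0, size=(4000, 2)).ravel()
hist, edges = np.histogram(ci, bins=256)
print(kittler_illingworth(hist, 0.5 * (edges[:-1] + edges[1:])))
```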

  10. Evaluation of statistical methods for quantifying fractal scaling in water-quality time series with irregular sampling

    Directory of Open Access Journals (Sweden)

    Q. Zhang

    2018-02-01

    Full Text Available River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling – in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) – are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb–Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among

  11. Evaluation of statistical methods for quantifying fractal scaling in water-quality time series with irregular sampling

    Science.gov (United States)

    Zhang, Qian; Harman, Ciaran J.; Kirchner, James W.

    2018-02-01

    River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling - in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) - are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb-Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of

  12. The quantitative LOD score: test statistic and sample size for exclusion and linkage of quantitative traits in human sibships.

    Science.gov (United States)

    Page, G P; Amos, C I; Boerwinkle, E

    1998-04-01

    We present a test statistic, the quantitative LOD (QLOD) score, for the testing of both linkage and exclusion of quantitative-trait loci in randomly selected human sibships. As with the traditional LOD score, the boundary values of 3, for linkage, and -2, for exclusion, can be used for the QLOD score. We investigated the sample sizes required for inferring exclusion and linkage, for various combinations of linked genetic variance, total heritability, recombination distance, and sibship size, using fixed-size sampling. The sample sizes required for both linkage and exclusion were not qualitatively different and depended on the percentage of variance being linked or excluded and on the total genetic variance. Information regarding linkage and exclusion in sibships larger than size 2 increased approximately as the number of possible pairs, n(n-1)/2, up to sibships of size 6. Increasing the recombination distance (θ) between the marker and the trait loci empirically reduced the power for both linkage and exclusion, approximately as a function of (1-2θ)⁴.

  13. Particle System Based Adaptive Sampling on Spherical Parameter Space to Improve the MDL Method for Construction of Statistical Shape Models

    Directory of Open Access Journals (Sweden)

    Rui Xu

    2013-01-01

    Full Text Available Minimum description length (MDL) based group-wise registration is a state-of-the-art method to determine the corresponding points of 3D shapes for the construction of statistical shape models (SSMs). However, it suffers from the problem that the determined corresponding points do not spread uniformly on the original shapes, since corresponding points are obtained by uniformly sampling the aligned shape on the parameterized space of the unit sphere. We proposed a particle-system based method to obtain adaptive sampling positions on the unit sphere to resolve this problem. Here, a set of particles was placed on the unit sphere to construct a particle system whose energy was related to the distortions of parameterized meshes. By minimizing this energy, each particle was moved on the unit sphere. When the system became steady, particles were treated as vertices to build a spherical mesh, which was then relaxed to slightly adjust vertices to obtain optimal sampling positions. We used 47 cases of (left and right) lungs and 50 cases of livers, (left and right) kidneys, and spleens for evaluations. Experiments showed that the proposed method was able to resolve the problem of the original MDL method, and the proposed method performed better in the generalization and specificity tests.

  14. Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

    Science.gov (United States)

    Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

    2018-03-01

    In the recent decade, analyzing remotely sensed imagery has become one of the most common and widely used procedures in environmental studies, and supervised image classification techniques play a central role. Hence, using a high-resolution WorldView-3 image over a mixed urbanized landscape in Iran, three less applied image classification methods, including bagged CART, the stochastic gradient boosting model, and a neural network with feature extraction, were tested and compared with two prevalent methods: random forest and support vector machine with a linear kernel. To do so, each method was run ten times, and three validation techniques were used to estimate the accuracy statistics: cross-validation, independent validation, and validation with the full set of training data. Moreover, the statistical significance of differences between the classification methods was assessed using ANOVA and Tukey tests. In general, the results showed that random forest, with a marginal difference compared to bagged CART and the stochastic gradient boosting model, is the best performing method, whilst based on independent validation there was no significant difference between the performances of the classification methods. It should finally be noted that the neural network with feature extraction and the linear support vector machine had better processing speed than the others.

  15. Update on the non-prewhitening model observer in computed tomography for the assessment of the adaptive statistical and model-based iterative reconstruction algorithms

    Science.gov (United States)

    Ott, Julien G.; Becce, Fabio; Monnin, Pascal; Schmidt, Sabine; Bochud, François O.; Verdun, Francis R.

    2014-08-01

    The state of the art to describe image quality in medical imaging is to assess the performance of an observer conducting a task of clinical interest. This can be done by using a model observer leading to a figure of merit such as the signal-to-noise ratio (SNR). Using the non-prewhitening (NPW) model observer, we objectively characterised the evolution of its figure of merit in various acquisition conditions. The NPW model observer usually requires the use of the modulation transfer function (MTF) as well as noise power spectra. However, although the computation of the MTF poses no problem when dealing with the traditional filtered back-projection (FBP) algorithm, this is not the case when using iterative reconstruction (IR) algorithms, such as adaptive statistical iterative reconstruction (ASIR) or model-based iterative reconstruction (MBIR). Given that the target transfer function (TTF) had already shown it could accurately express the system resolution even with non-linear algorithms, we decided to tune the NPW model observer, replacing the standard MTF by the TTF. It was estimated using a custom-made phantom containing cylindrical inserts surrounded by water. The contrast differences between the inserts and water were plotted for each acquisition condition. Then, mathematical transformations were performed leading to the TTF. As expected, the first results showed a dependency of the image contrast and noise levels on the TTF for both ASIR and MBIR. Moreover, FBP also proved to be dependent of the contrast and noise when using the lung kernel. Those results were then introduced in the NPW model observer. We observed an enhancement of SNR every time we switched from FBP to ASIR to MBIR. IR algorithms greatly improve image quality, especially in low-dose conditions. Based on our results, the use of MBIR could lead to further dose reduction in several clinical applications.
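    For reference, the NPW figure of merit in its usual Fourier form, with the TTF substituted for the MTF as the abstract describes. Here S̃ is the Fourier transform of the signal to be detected and NPS the noise power spectrum; this is the standard textbook expression, not a formula copied from the paper:

```latex
\mathrm{SNR}^2_{\mathrm{NPW}} =
  \frac{\left[ \iint \lvert \tilde{S}(u,v)\rvert^{2}\,
        \mathrm{TTF}^{2}(u,v)\, \mathrm{d}u\, \mathrm{d}v \right]^{2}}
       {\iint \lvert \tilde{S}(u,v)\rvert^{2}\,
        \mathrm{TTF}^{4}(u,v)\, \mathrm{NPS}(u,v)\, \mathrm{d}u\, \mathrm{d}v}
```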

  16. Spatial statistics of hydrography and water chemistry in a eutrophic boreal lake based on sounding and water samples.

    Science.gov (United States)

    Leppäranta, Matti; Lewis, John E; Heini, Anniina; Arvola, Lauri

    2018-06-04

    Spatial variability, an essential characteristic of lake ecosystems, has often been neglected in field research and monitoring. In this study, we apply spatial statistical methods to the key physics and chemistry variables and chlorophyll a over eight sampling dates in two consecutive years in a large (area 103 km²) eutrophic boreal lake in southern Finland. In the four summer sampling dates, the water body was vertically and horizontally heterogeneous except for color and DOC; in the two winter ice-covered dates, DO was vertically stratified, while in the two autumn dates, no significant spatial differences in any of the measured variables were found. Chlorophyll a concentration was one order of magnitude lower under the ice cover than in open water. The Moran statistic for spatial correlation was significant for chlorophyll a and NO₂+NO₃-N in all summer situations and for dissolved oxygen and pH in three cases. In summer, the mass centers of the chemicals were within 1.5 km of the geometric center of the lake, and the 2nd moment radius ranged from 3.7 to 4.1 km, compared with 3.9 km for the homogeneous situation. The lateral length scales of the studied variables were 1.5-2.5 km, about 1 km longer in the surface layer. The detected spatial "noise" strongly suggests that besides vertical variation, the horizontal variation in eutrophic lakes in particular should also be considered when the ecosystems are monitored.
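    The Moran statistic used for spatial correlation is straightforward to compute from station values and a spatial weights matrix; a numpy sketch with invented station coordinates and inverse-distance weights, one common weighting choice:

```python
import numpy as np

def morans_i(x, w):
    """Moran's I for values x and spatial weights w (w_ii = 0)."""
    n = len(x)
    z = x - x.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical stations: coordinates (km) and chlorophyll-a values
rng = np.random.default_rng(4)
xy = rng.uniform(0, 10, size=(20, 2))
vals = rng.normal(10, 2, size=20)

d = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
w = np.where(d > 0, 1.0 / d, 0.0)            # inverse-distance weights
print(morans_i(vals, w))                     # > 0 suggests spatial clustering
```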

  17. A Hybrid Algorithm for Period Analysis from Multiband Data with Sparse and Irregular Sampling for Arbitrary Light-curve Shapes

    Science.gov (United States)

    Saha, Abhijit; Vivas, A. Katherina

    2017-12-01

    Ongoing and future surveys with repeat imaging in multiple bands are producing (or will produce) time-spaced measurements of brightness, resulting in the identification of large numbers of variable sources in the sky. A large fraction of these are periodic variables: compilations of these are of scientific interest for a variety of purposes. Unavoidably, the data sets from many such surveys not only have sparse sampling, but also have embedded frequencies in the observing cadence that beat against the natural periodicities of any object under investigation. Such limitations can make period determination ambiguous and uncertain. For multiband data sets with asynchronous measurements in multiple passbands, we wish to maximally use the information on periodicity in a manner that is agnostic of differences in the light-curve shapes across the different channels. Given large volumes of data, computational efficiency is also at a premium. This paper develops and presents a computationally economic method for determining periodicity that combines the results from two different classes of period-determination algorithms. The underlying principles are illustrated through examples. The effectiveness of this approach for combining asynchronously sampled measurements in multiple observables that share an underlying fundamental frequency is also demonstrated.
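    One of the two classes of algorithms such hybrids draw on is Lomb-Scargle-style periodogram analysis, which handles sparse, irregular sampling natively. The sketch below uses astropy and naively averages per-band periodograms on a common frequency grid; this illustrates combining asynchronous multiband data but is not the paper's actual combination rule:

```python
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(5)
true_period = 0.61                        # days; hypothetical periodic star

def band(n):                              # sparse, irregular sampling
    t = np.sort(rng.uniform(0, 180, n))
    y = np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.3, n)
    return t, y

bands = [band(40), band(35), band(30)]    # e.g., g, r, i measurements

freq = np.linspace(0.1, 5.0, 20000)       # cycles per day
power = np.mean([LombScargle(t, y).power(freq) for t, y in bands], axis=0)
print("best period:", 1.0 / freq[np.argmax(power)])
```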

  18. Sample Size and Statistical Conclusions from Tests of Fit to the Rasch Model According to the Rasch Unidimensional Measurement Model (Rumm) Program in Health Outcome Measurement.

    Science.gov (United States)

    Hagell, Peter; Westergren, Albert

    Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model, for 25-item dichotomous scales with sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N ≤ 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).

  19. The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power

    Science.gov (United States)

    Fraley, R. Chris; Vazire, Simine

    2014-01-01

    The authors evaluate the quality of research reported in major journals in social-personality psychology by ranking those journals with respect to their N-pact Factors (NF)—the statistical power of the empirical studies they publish to detect typical effect sizes. Power is a particularly important attribute for evaluating research quality because, relative to studies that have low power, studies that have high power are more likely to (a) provide accurate estimates of effects, (b) produce literatures with low false positive rates, and (c) lead to replicable findings. The authors show that the average sample size in social-personality research is 104 and that the power to detect the typical effect size in the field is approximately 50%. Moreover, they show that there is considerable variation among journals in the sample sizes and power of the studies they publish, with some journals consistently publishing higher power studies than others. The authors hope that these rankings will be of use to authors who are choosing where to submit their best work, provide hiring and promotion committees with a superior way of quantifying journal quality, and encourage competition among journals to improve their NF rankings. PMID:25296159
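    The reported ~50% power figure is easy to reproduce with a Fisher z approximation, taking r = .20 as the typical effect size and the reported average n = 104; both numbers come from the abstract, while the approximation itself is an illustration:

```python
from math import atanh, sqrt
from scipy.stats import norm

r, n, alpha = 0.20, 104, 0.05
z_effect = atanh(r) * sqrt(n - 3)             # Fisher z test statistic
power = 1 - norm.cdf(norm.ppf(1 - alpha / 2) - z_effect)
print(f"power = {power:.2f}")                 # about 0.53, two-tailed
```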

  20. Structures and algorithms for post-processing large data sets and multi-variate functions in spatio-temporal statistics

    KAUST Repository

    Litvinenko, Alexander

    2017-12-10

    Matrices began in the 2nd century BC with the Chinese, and traces go back to the 4th century BC with the Babylonians. The text "Nine Chapters of the Mathematical Art", written during the Han Dynasty in China, gave the first known example of matrix methods. They were used to solve simultaneous linear equations (more in http://math.nie.edu.sg/bwjyeo/it/MathsOnline_AM/livemath/the/IT3AMMatricesHistory.html). The first ideas of maximum likelihood estimation (MLE) were introduced by Laplace (1749-1827) and Gauss (1777-1855); the likelihood was defined by Thorvald Thiele (1838-1910). Why do we still use matrices? The matrix data format is more than 2200 years old, but our world is multi-dimensional. Why not introduce a more appropriate data format, and why not reformulate the MLE method for it? In this work we utilize low-rank tensor formats for multi-dimensional functions, which appear in spatial statistics.

  1. Prediction of modified Mercalli intensity from PGA, PGV, moment magnitude, and epicentral distance using several nonlinear statistical algorithms

    Science.gov (United States)

    Alvarez, Diego A.; Hurtado, Jorge E.; Bedoya-Ruíz, Daniel Alveiro

    2012-07-01

    Despite technological advances in seismic instrumentation, the assessment of the intensity of an earthquake using an observational scale, as given, for example, by the modified Mercalli intensity scale, is highly useful for practical purposes. In order to link the quantitative numbers extracted from the acceleration record of an earthquake and other instrumental data, such as peak ground velocity, epicentral distance, and moment magnitude, on the one hand, and the modified Mercalli intensity scale on the other, simple statistical regression has generally been employed. In this paper, we employ three methods of nonlinear regression, namely support vector regression, multilayer perceptrons, and genetic programming, in order to find a functional dependence between the instrumental records and the modified Mercalli intensity scale. The proposed methods predict the intensity of an earthquake while dealing with nonlinearity and the noise inherent to the data. The nonlinear regressions with good estimation results have been performed using the "Did You Feel It?" database of the US Geological Survey and the database of the Center for Engineering Strong Motion Data for the California region.
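    Of the three nonlinear regressions, support vector regression is the most direct to sketch with scikit-learn; the feature set mirrors the abstract's predictors, with invented data standing in for the "Did You Feel It?" records:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical records: [log PGA, log PGV, Mw, log distance] -> MMI
rng = np.random.default_rng(6)
X = rng.normal(size=(500, 4))
mmi = (4 + 1.5 * X[:, 0] + 0.8 * X[:, 2] - 0.6 * X[:, 3]
       + rng.normal(0, 0.5, 500))

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.2))
model.fit(X, mmi)
print(model.predict(X[:3]))               # predicted intensities
```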

  2. Autism Diagnostic Interview-Revised (ADI-R) Algorithms for Toddlers and Young Preschoolers: Application in a Non-US Sample of 1,104 Children

    Science.gov (United States)

    de Bildt, Annelies; Sytema, Sjoerd; Zander, Eric; Bölte, Sven; Sturm, Harald; Yirmiya, Nurit; Yaari, Maya; Charman, Tony; Salomone, Erica; LeCouteur, Ann; Green, Jonathan; Bedia, Ricardo Canal; Primo, Patricia García; van Daalen, Emma; de Jonge, Maretha V.; Guðmundsdóttir, Emilía; Jóhannsdóttir, Sigurrós; Raleva, Marija; Boskovska, Meri; Rogé, Bernadette; Baduel, Sophie; Moilanen, Irma; Yliherva, Anneli; Buitelaar, Jan; Oosterling, Iris J.

    2015-01-01

    The current study aimed to investigate the Autism Diagnostic Interview-Revised (ADI-R) algorithms for toddlers and young preschoolers (Kim and Lord, "J Autism Dev Disord" 42(1):82-93, 2012) in a non-US sample from ten sites in nine countries (n = 1,104). The construct validity indicated a good fit of the algorithms. The diagnostic…

  3. A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

    Directory of Open Access Journals (Sweden)

    Olson James M

    2006-04-01

    Full Text Available Abstract Background Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip® that uses multiple oligonucleotide probes (i.e., a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data. Results We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma samples (27 metastatic and 42 non-metastatic) and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic
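    A compact sketch of the two-step idea under simplifying assumptions: probes are merged greedily into pseudo-exons when physically adjacent probes have correlated intensities, and a t-test stands in for the paper's significance assessment:

```python
import numpy as np
from scipy import stats

def pseudo_exons(probe_x, corr_min=0.8):
    """Greedily merge physically adjacent probes with correlated intensities."""
    groups, current = [], [0]
    for i in range(1, probe_x.shape[0]):
        r = np.corrcoef(probe_x[i - 1], probe_x[i])[0, 1]
        if r >= corr_min:
            current.append(i)             # same pseudo-exon as previous probe
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups

# Hypothetical probe intensities: 20 probes x 10 samples (two groups of 5)
rng = np.random.default_rng(10)
x = rng.normal(8, 1, size=(20, 10))
x[5:9, 5:] += 2.0                         # one differentially expressed region

for g in pseudo_exons(x, corr_min=0.4):
    m = x[g].mean(axis=0)                 # pseudo-exon summary per sample
    t, p = stats.ttest_ind(m[:5], m[5:])
    print(g, round(p, 4))                 # small p suggests splice variation
```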

  4. Operationalization and Validation of the Stopping Elderly Accidents, Deaths, and Injuries (STEADI) Fall Risk Algorithm in a Nationally Representative Sample

    Science.gov (United States)

    Lohman, Matthew C.; Crow, Rebecca S.; DiMilia, Peter R.; Nicklett, Emily J.; Bruce, Martha L.; Batsis, John A.

    2017-01-01

    Background Preventing falls and fall-related injuries among older adults is a public health priority. The Stopping Elderly Accidents, Deaths, and Injuries (STEADI) tool was developed to promote fall risk screening and encourage coordination between clinical and community-based fall prevention resources; however, little is known about the tool’s predictive validity or adaptability to survey data. Methods Data from five annual rounds (2011–2015) of the National Health and Aging Trends Study (NHATS), a representative cohort of adults age 65 and older in the US. Analytic sample respondents (n=7,392) were categorized at baseline as having low, moderate, or high fall risk according to the STEADI algorithm adapted for use with NHATS data. Logistic mixed-effects regression was used to estimate the association between baseline fall risk and subsequent falls and mortality. Analyses incorporated complex sampling and weighting elements to permit inferences at a national level. Results Participants classified as having moderate and high fall risk had 2.62 (95% CI: 2.29, 2.99) and 4.76 (95% CI: 3.51, 6.47) times greater odds of falling during follow-up compared to those with low risk, respectively, controlling for sociodemographic and health related risk factors for falls. High fall risk was also associated with greater likelihood of falling multiple times annually but not with greater risk of mortality. Conclusion The adapted STEADI clinical fall risk screening tool is a valid measure for predicting future fall risk using survey cohort data. Further efforts to standardize screening for fall risk and to coordinate between clinical and community-based fall prevention initiatives are warranted. PMID:28947669

  5. Operationalisation and validation of the Stopping Elderly Accidents, Deaths, and Injuries (STEADI) fall risk algorithm in a nationally representative sample.

    Science.gov (United States)

    Lohman, Matthew C; Crow, Rebecca S; DiMilia, Peter R; Nicklett, Emily J; Bruce, Martha L; Batsis, John A

    2017-12-01

    Preventing falls and fall-related injuries among older adults is a public health priority. The Stopping Elderly Accidents, Deaths, and Injuries (STEADI) tool was developed to promote fall risk screening and encourage coordination between clinical and community-based fall prevention resources; however, little is known about the tool's predictive validity or adaptability to survey data. Data from five annual rounds (2011-2015) of the National Health and Aging Trends Study (NHATS), a representative cohort of adults age 65 years and older in the USA. Analytic sample respondents (n=7392) were categorised at baseline as having low, moderate or high fall risk according to the STEADI algorithm adapted for use with NHATS data. Logistic mixed-effects regression was used to estimate the association between baseline fall risk and subsequent falls and mortality. Analyses incorporated complex sampling and weighting elements to permit inferences at a national level. Participants classified as having moderate and high fall risk had 2.62 (95% CI 2.29 to 2.99) and 4.76 (95% CI 3.51 to 6.47) times greater odds of falling during follow-up compared with those with low risk, respectively, controlling for sociodemographic and health-related risk factors for falls. High fall risk was also associated with greater likelihood of falling multiple times annually but not with greater risk of mortality. The adapted STEADI clinical fall risk screening tool is a valid measure for predicting future fall risk using survey cohort data. Further efforts to standardise screening for fall risk and to coordinate between clinical and community-based fall prevention initiatives are warranted. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. The price of electricity from private power producers: Stage 2, Expansion of sample and preliminary statistical analysis

    Energy Technology Data Exchange (ETDEWEB)

    Comnes, G.A.; Belden, T.N.; Kahn, E.P.

    1995-02-01

    The market for long-term bulk power is becoming increasingly competitive and mature. Given that many privately developed power projects have been or are being developed in the US, it is possible to begin to evaluate the performance of the market by analyzing its revealed prices. Using a consistent method, this paper presents levelized contract prices for a sample of privately developed US generation properties. The sample includes 26 projects with a total capacity of 6,354 MW. Contracts are described in terms of their choice of technology, choice of fuel, treatment of fuel price risk, geographic location, dispatchability, expected dispatch niche, and size. The contract price analysis shows that gas technologies clearly stand out as the most attractive. At an 80% capacity factor, coal projects have an average 20-year levelized price of $0.092/kWh, whereas natural gas combined cycle and/or cogeneration projects have an average price of $0.069/kWh. Within each technology type subsample, however, there is considerable variation. Prices for natural gas combustion turbines and one wind project are also presented. A preliminary statistical analysis is conducted to understand the relationship between price and four categories of explanatory factors including product heterogeneity, geographic heterogeneity, economic and technological change, and other buyer attributes (including avoided costs). Because of residual price variation, we are unable to accept the hypothesis that electricity is a homogeneous product. Instead, the analysis indicates that buyer value still plays an important role in the determination of price for competitively-acquired electricity.

  7. Statistical evaluation of fatty acid profile and cholesterol content in fish (common carp) lipids obtained by different sample preparation procedures.

    Science.gov (United States)

    Spiric, Aurelija; Trbovic, Dejana; Vranic, Danijela; Djinovic, Jasna; Petronijevic, Radivoj; Matekalo-Sverak, Vesna

    2010-07-05

    the second principal component (PC2) is recorded by C18:3 n-3, and C20:3 n-6, being present in a higher amount in the samples treated by the modified Soxhlet extraction, while C22:5 n-3, C20:3 n-3, C22:1 and C20:4, C16 and C18 negatively influence the score values of PC2, showing a significantly increased level in the samples treated by the ASE method. Hotelling's paired T-square test, used on the first three principal components to confirm differences in individual fatty acid content obtained by the ASE and Soxhlet methods in carp muscle, showed a statistically significant difference between these two data sets (T² = 161.308, p < 0.001). Copyright 2010 Elsevier B.V. All rights reserved.

  8. Statistical process control charts for attribute data involving very large sample sizes: a review of problems and solutions.

    Science.gov (United States)

    Mohammed, Mohammed A; Panesar, Jagdeep S; Laney, David B; Wilson, Richard

    2013-04-01

    The use of statistical process control (SPC) charts in healthcare is increasing. The primary purpose of SPC is to distinguish between common-cause variation which is attributable to the underlying process, and special-cause variation which is extrinsic to the underlying process. This is important because improvement under common-cause variation requires action on the process, whereas special-cause variation merits an investigation to first find the cause. Nonetheless, when dealing with attribute or count data (eg, number of emergency admissions) involving very large sample sizes, traditional SPC charts often produce tight control limits with most of the data points appearing outside the control limits. This can give a false impression of common and special-cause variation, and potentially misguide the user into taking the wrong actions. Given the growing availability of large datasets from routinely collected databases in healthcare, there is a need to present a review of this problem (which arises because traditional attribute charts only consider within-subgroup variation) and its solutions (which consider within and between-subgroup variation), which involve the use of the well-established measurements chart and the more recently developed attribute charts based on Laney's innovative approach. We close by making some suggestions for practice.
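    A sketch of the Laney p′ adjustment the review covers: compute conventional p-chart sigmas, convert the points to z-scores, estimate the between-subgroup variation from the average moving range of those z-scores, and widen the limits accordingly. This follows my reading of Laney's published approach rather than code from the paper:

```python
import numpy as np

def laney_p_prime_limits(counts, sizes):
    """Laney p'-chart limits for attribute data with large subgroups."""
    p = counts / sizes
    pbar = counts.sum() / sizes.sum()
    sigma_p = np.sqrt(pbar * (1 - pbar) / sizes)  # conventional p-chart sigma
    z = (p - pbar) / sigma_p
    sigma_z = np.abs(np.diff(z)).mean() / 1.128   # moving-range estimate
    ucl = pbar + 3 * sigma_p * sigma_z            # widened control limits
    lcl = pbar - 3 * sigma_p * sigma_z
    return pbar, lcl, ucl

# Hypothetical monthly emergency admissions out of all attendances
counts = np.array([480, 512, 495, 530, 470, 505])
sizes = np.array([10000, 10200, 9900, 10500, 9800, 10100])
print(laney_p_prime_limits(counts, sizes))
```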

  9. Factor Analysis with EM Algorithm Never Gives Improper Solutions when Sample Covariance and Initial Parameter Matrices Are Proper

    Science.gov (United States)

    Adachi, Kohei

    2013-01-01

    Rubin and Thayer ("Psychometrika," 47:69-76, 1982) proposed the EM algorithm for exploratory and confirmatory maximum likelihood factor analysis. In this paper, we prove the following fact: the EM algorithm always gives a proper solution with positive unique variances and factor correlations with absolute values that do not exceed one,…

  10. Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding

    International Nuclear Information System (INIS)

    Aslan, B.; Zech, G.

    2005-01-01

    We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
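    A sketch of the two-sample comparison in a Euclidean-distance form: within-sample terms play the role of like charges and the cross term the role of opposite charges, so the statistic approaches zero when both samples come from the same parent distribution. Aslan and Zech favor a logarithmic distance function, which would replace the distances below:

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_statistic(x, y):
    """Two-sample energy statistic (Euclidean-distance variant)."""
    dxy = cdist(x, y).mean()       # attraction between opposite "charges"
    dxx = cdist(x, x).mean()       # repulsion within sample x
    dyy = cdist(y, y).mean()       # repulsion within sample y
    return 2 * dxy - dxx - dyy     # near zero for a common parent

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, size=(200, 2))
b = rng.normal(0.5, 1.0, size=(200, 2))   # shifted distribution
c = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as a
print(energy_statistic(a, c), energy_statistic(a, b))
```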

  11. Monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm on permanent plots: sampling methods and statistical properties of data

    Science.gov (United States)

    A.R. Mason; H.G. Paul

    1994-01-01

    Procedures for monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm are recommended based on many years' experience in sampling these species in eastern Oregon and Washington. It is shown that statistically reliable estimates of larval density can be made for a population by sampling host trees in a series of permanent plots in a...

  12. Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes

    Science.gov (United States)

    Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy

    2006-01-01

    We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…

  13. RHAPSODY. I. STRUCTURAL PROPERTIES AND FORMATION HISTORY FROM A STATISTICAL SAMPLE OF RE-SIMULATED CLUSTER-SIZE HALOS

    International Nuclear Information System (INIS)

    Wu, Hao-Yi; Hahn, Oliver; Wechsler, Risa H.; Mao, Yao-Yuan; Behroozi, Peter S.

    2013-01-01

    We present the first results from the RHAPSODY cluster re-simulation project: a sample of 96 'zoom-in' simulations of dark matter halos of 10^{14.8±0.05} h⁻¹ M☉, selected from a 1 h⁻³ Gpc³ volume. This simulation suite is the first to resolve this many halos with ∼5 × 10⁶ particles per halo in the cluster mass regime, allowing us to statistically characterize the distribution of and correlation between halo properties at fixed mass. We focus on the properties of the main halos and how they are affected by formation history, which we track back to z = 12, over five decades in mass. We give particular attention to the impact of the formation history on the density profiles of the halos. We find that the deviations from the Navarro-Frenk-White (NFW) model and the Einasto model depend on formation time. Late-forming halos tend to have considerable deviations from both models, partly due to the presence of massive subhalos, while early-forming halos deviate less but still significantly from the NFW model and are better described by the Einasto model. We find that the halo shapes depend only moderately on formation time. Departure from spherical symmetry impacts the density profiles through the anisotropic distribution of massive subhalos. Further evidence of the impact of subhalos is provided by analyzing the phase-space structure. A detailed analysis of the properties of the subhalo population in RHAPSODY is presented in a companion paper.
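    For reference, the two density profiles compared in the paper, in their standard forms (the parameter values below are arbitrary placeholders):

```python
import numpy as np

def rho_nfw(r, rho_s, r_s):
    """Navarro-Frenk-White profile."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def rho_einasto(r, rho_s, r_s, alpha=0.18):
    """Einasto profile; alpha controls how the log slope rolls over."""
    return rho_s * np.exp(-(2.0 / alpha) * ((r / r_s) ** alpha - 1.0))

r = np.logspace(-2, 0.5, 50)          # radii in units of, e.g., h^-1 Mpc
print(rho_nfw(r, 1.0, 0.3)[:3])
print(rho_einasto(r, 1.0, 0.3)[:3])
```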

  14. Small Sample Statistics for Incomplete Nonnormal Data: Extensions of Complete Data Formulae and a Monte Carlo Comparison

    Science.gov (United States)

    Savalei, Victoria

    2010-01-01

    Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often co-occur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…

  15. Sample processing, protocol, and statistical analysis of the time-of-flight secondary ion mass spectrometry (ToF-SIMS) of protein, cell, and tissue samples.

    Science.gov (United States)

    Barreto, Goncalo; Soininen, Antti; Sillat, Tarvo; Konttinen, Yrjö T; Kaivosoja, Emilia

    2014-01-01

    Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is increasingly being used in analysis of biological samples. For example, it has been applied to distinguish healthy and osteoarthritic human cartilage. This chapter discusses ToF-SIMS principle and instrumentation including the three modes of analysis in ToF-SIMS. ToF-SIMS sets certain requirements for the samples to be analyzed; for example, the samples have to be vacuum compatible. Accordingly, sample processing steps for different biological samples, i.e., proteins, cells, frozen and paraffin-embedded tissues and extracellular matrix for the ToF-SIMS are presented. Multivariate analysis of the ToF-SIMS data and the necessary data preprocessing steps (peak selection, data normalization, mean-centering, and scaling and transformation) are discussed in this chapter.

  16. Study on the influence of X-ray tube spectral distribution on the analysis of bulk samples and thin films: Fundamental parameters method and theoretical coefficient algorithms

    International Nuclear Information System (INIS)

    Sitko, Rafal

    2008-01-01

    Knowledge of X-ray tube spectral distribution is necessary in theoretical methods of matrix correction, i.e. in both fundamental parameter (FP) methods and theoretical influence coefficient algorithms. Thus, the influence of X-ray tube distribution on the accuracy of the analysis of thin films and bulk samples is presented. The calculations are performed using experimental X-ray tube spectra taken from the literature and theoretical X-ray tube spectra evaluated by three different algorithms proposed by Pella et al. (X-Ray Spectrom. 14 (1985) 125-135), Ebel (X-Ray Spectrom. 28 (1999) 255-266), and Finkelshtein and Pavlova (X-Ray Spectrom. 28 (1999) 27-32). In this study, Fe-Cr-Ni system is selected as an example and the calculations are performed for X-ray tubes commonly applied in X-ray fluorescence analysis (XRF), i.e., Cr, Mo, Rh and W. The influence of X-ray tube spectra on FP analysis is evaluated when quantification is performed using various types of calibration samples. FP analysis of bulk samples is performed using pure-element bulk standards and multielement bulk standards similar to the analyzed material, whereas for FP analysis of thin films, the bulk and thin pure-element standards are used. For the evaluation of the influence of X-ray tube spectra on XRF analysis performed by theoretical influence coefficient methods, two algorithms for bulk samples are selected, i.e. Claisse-Quintin (Can. Spectrosc. 12 (1967) 129-134) and COLA algorithms (G.R. Lachance, Paper Presented at the International Conference on Industrial Inorganic Elemental Analysis, Metz, France, June 3, 1981) and two algorithms (constant and linear coefficients) for thin films recently proposed by Sitko (X-Ray Spectrom. 37 (2008) 265-272)

  17. 3D visualization and finite element mesh formation from wood anatomy samples, Part II – Algorithm approach

    Directory of Open Access Journals (Sweden)

    Petr Koňas

    2009-01-01

    Full Text Available The paper presents the new application WOOD3D in the form of assembled program code. The work extends the previous article “Part I – Theoretical approach” with a detailed description of the implemented C++ classes of the utilized projects Visualization Toolkit (VTK), Insight Toolkit (ITK) and MIMX. The code is written in CMake style and is available as a multiplatform application. Currently GNU Linux (32/64b) and MS Windows (32/64b) platforms have been released. The article discusses various filter classes for image filtering; mainly Otsu and binary threshold filters are assessed for thresholding anatomical wood samples. Registration of image series is emphasized, and compensation for differences in colour spaces is included. The resulting image-analysis workflow is a new methodological approach for processing images through composition, visualization, filtering, registration and finite element mesh formation. The application generates a script in the ANSYS parametric design language (APDL) which is fully compatible with the ANSYS finite element solver and designer environment. The script includes the whole definition of the unstructured finite element mesh formed by individual elements and nodes. Due to its simple notation, the same script can be used for the generation of geometrical entities at element positions. Such formed volumetric entities are prepared for further geometry approximation (e.g. by boolean or more advanced methods). Hexahedral and tetrahedral types of mesh elements are formed on user request with specified mesh options. Hexahedral meshes are formed both with uniform element size and with anisotropic character. A modified octree method for hexahedral meshes with anisotropic character was implemented in the application. Multicore CPUs are supported for fast image analysis. Visualization of the image series and the consequent 3D image is realized in the VTK format, a sufficiently known and public format, visualized in the GPL application ParaView. Future work based on mesh

  18. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    Energy Technology Data Exchange (ETDEWEB)

    Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA

    2017-02-05

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples, such as the smaller blood volume required, storage at room temperature, and the ability to sample in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature, and compared them to matched serum samples stored at -80°C, to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating that degradation occurs in the DBS samples and affects quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types, indicating that lipid quantitation was more stable across the sample types.

  19. Statistical Power and Optimum Sample Allocation Ratio for Treatment and Control Having Unequal Costs Per Unit of Randomization

    Science.gov (United States)

    Liu, Xiaofeng

    2003-01-01

    This article considers optimal sample allocation between the treatment and control condition in multilevel designs when the costs per sampling unit vary due to treatment assignment. Optimal unequal allocation may reduce the cost from that of a balanced design without sacrificing any power. The optimum sample allocation ratio depends only on the…

  20. Evaluation of observables in statistical multifragmentation theories

    International Nuclear Information System (INIS)

    Cole, A.J.

    1989-01-01

    The canonical formulation of equilibrium statistical multifragmentation is examined. It is shown that the explicit construction of observables (average values) by sampling the partition probabilities is unnecessary insofar as closed expressions in the form of recursion relations can be obtained quite easily. Such expressions may conversely be used to verify the sampling algorithms
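    As an illustration of the kind of closed recursion the abstract refers to, here is a minimal Python sketch of the standard canonical recursion Z_N = (1/N) Σ_{k=1..N} k ω_k Z_{N-k} (a textbook form; the paper's exact expressions may differ, and omega here is a hypothetical single-fragment partition function):

        def partition_functions(omega, N):
            """Canonical partition functions Z_0..Z_N via the recursion
            Z_n = (1/n) * sum_{k=1..n} k * omega[k] * Z_{n-k}, with Z_0 = 1.
            omega[k] is the partition function of a single fragment of size k
            (omega[0] is unused)."""
            Z = [1.0] + [0.0] * N
            for n in range(1, N + 1):
                Z[n] = sum(k * omega[k] * Z[n - k] for k in range(1, n + 1)) / n
            return Z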

  1. Field Sampling from a Segmented Image

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2008-06-01

    Full Text Available This paper presents a statistical method for deriving the optimal prospective field sampling scheme on a remote sensing image to represent different categories in the field. The iterated conditional modes algorithm (ICM) is used for segmentation...

  2. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    Science.gov (United States)

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task specially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
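    A minimal sketch of the bootstrap procedure described above, assuming hypothetical arrays of per-run computing times obtained at matched statistical uncertainty; the shortest 95% interval is read off the simulated distribution of gain estimates:

        import numpy as np

        rng = np.random.default_rng(42)

        def shortest_ci(samples, level=0.95):
            """Shortest interval covering `level` of a bootstrap distribution."""
            s = np.sort(samples)
            k = int(np.ceil(level * len(s)))
            widths = s[k - 1:] - s[:len(s) - k + 1]
            i = int(np.argmin(widths))
            return s[i], s[i + k - 1]

        def bootstrap_gain_ci(t_conventional, t_correlated, n_boot=10_000):
            """Bootstrap the efficiency-gain estimate (ratio of mean run times)."""
            gains = np.empty(n_boot)
            for b in range(n_boot):
                conv = rng.choice(t_conventional, size=len(t_conventional))
                corr = rng.choice(t_correlated, size=len(t_correlated))
                gains[b] = conv.mean() / corr.mean()
            return shortest_ci(gains)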

  3. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing

  4. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    Science.gov (United States)

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell
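    The FR statistic is computed from the minimal spanning tree of the pooled samples: the fewer MST edges that join points from different samples, the stronger the evidence that the two distributions differ. A sketch using SciPy (illustrative, not FlowMap-FR's actual API):

        import numpy as np
        from scipy.spatial import distance_matrix
        from scipy.sparse.csgraph import minimum_spanning_tree

        def fr_statistic(X, Y):
            """Friedman-Rafsky multivariate runs count for samples X, Y (n_i x d).

            Pool the points, build the Euclidean minimum spanning tree, and
            count edges joining points from different samples."""
            Z = np.vstack([X, Y])
            labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
            mst = minimum_spanning_tree(distance_matrix(Z, Z)).tocoo()
            return int(np.sum(labels[mst.row] != labels[mst.col]))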

  5. Statistical properties of the surface velocity field in the northern Gulf of Mexico sampled by GLAD drifters

    NARCIS (Netherlands)

    Mariano, A.J.; Ryan, E.H.; Huntley, H.S.; Laurindo, L.C.; Coelho, E.; Ozgokmen, TM; Berta, M.; Bogucki, D; Chen, S.S.; Curcic, M.; Drouin, K.L.; Gough, M; Haus, BK; Haza, A.C.; Hogan, P; Iskandarani, M; Jacobs, G; Kirwan Jr., A.D.; Laxague, N; Lipphardt Jr., B.; Magaldi, M.G.; Novelli, G.; Reniers, A.J.H.M.; Restrepo, J.M.; Smith, C; Valle-Levinson, A.; Wei, M.

    2016-01-01

    The Grand LAgrangian Deployment (GLAD) used multiscale sampling and GPS technology to observe time series of drifter positions with initial drifter separation of O(100 m) to O(10 km), and nominal 5 min sampling, during the summer and fall of 2012 in the northern Gulf of Mexico. Histograms of the

  6. A Personalized Rolling Optimal Charging Schedule for Plug-In Hybrid Electric Vehicle Based on Statistical Energy Demand Analysis and Heuristic Algorithm

    Directory of Open Access Journals (Sweden)

    Fanrong Kong

    2017-09-01

    Full Text Available To alleviate greenhouse gas emissions and the dependence on fossil fuel, Plug-in Hybrid Electric Vehicles (PHEVs) have gained increasing popularity in recent decades. Because of fluctuating electricity prices in the power market, the charging schedule strongly influences driving cost. Although next-day electricity prices can be obtained in a day-ahead power market, a driving plan is not easily made in advance. PHEV owners could input a next-day plan into a charging system (e.g., an aggregator) a day ahead, but doing so every day is a tedious task, and the driving plan may not be very accurate. To address this problem, in this paper we analyze energy demands from a PHEV owner's historical driving records and build a personalized statistical driving model. Based on the model and the electricity spot prices, a rolling optimization strategy is proposed to help make a charging decision in the current time slot. On the one hand, by employing a heuristic algorithm, the schedule is made according to the situations in the following time slots. On the other hand, after the current time slot, the schedule is remade according to the next tens of time slots. Hence, the schedule is produced by a dynamic rolling optimization, but it only decides the charging action in the current time slot. In this way, both the fluctuation of electricity prices and the driving routine are involved in the scheduling, and it is not necessary for PHEV owners to input a day-ahead driving plan. Simulation results demonstrate that the proposed method helps owners save charging costs while meeting their driving requirements.
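    A toy sketch of a rolling-horizon charging rule in the spirit described above (decide only the current slot, then re-optimize next slot); the greedy cheapest-slots threshold and all parameter names are assumptions, not the paper's heuristic:

        import numpy as np

        def charge_now(prices, soc, demand_kwh, capacity, rate_kwh, horizon=12):
            """Charge in the current slot iff its price is among the cheapest
            slots needed to cover the forecast energy demand in the horizon."""
            need = max(0.0, min(demand_kwh, capacity) - soc)
            slots_needed = int(np.ceil(need / rate_kwh))
            if slots_needed == 0 or soc >= capacity:
                return False
            window = np.asarray(prices[:horizon])
            threshold = np.sort(window)[min(slots_needed, len(window)) - 1]
            return prices[0] <= threshold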

  7. Computed tomography imaging with the Adaptive Statistical Iterative Reconstruction (ASIR) algorithm: dependence of image quality on the blending level of reconstruction.

    Science.gov (United States)

    Barca, Patrizio; Giannelli, Marco; Fantacci, Maria Evelina; Caramella, Davide

    2018-06-01

    Computed tomography (CT) is a useful and widely employed imaging technique, which represents the largest source of population exposure to ionizing radiation in industrialized countries. Adaptive Statistical Iterative Reconstruction (ASIR) is an iterative reconstruction algorithm with the potential to allow reduction of radiation exposure while preserving diagnostic information. The aim of this phantom study was to assess the performance of ASIR, in terms of a number of image quality indices, when different reconstruction blending levels are employed. CT images of the Catphan-504 phantom were reconstructed using conventional filtered back-projection (FBP) and ASIR with reconstruction blending levels of 20, 40, 60, 80, and 100%. Noise, noise power spectrum (NPS), contrast-to-noise ratio (CNR) and modulation transfer function (MTF) were estimated for different scanning parameters and contrast objects. With increasing blending level of reconstruction, noise decreased and CNR increased non-linearly, by up to 50% and 100%, respectively. ASIR was also shown to modify the shape of the NPS curve. The MTF of ASIR-reconstructed images depended on tube load/contrast and decreased with increasing blending level of reconstruction. In particular, for low radiation exposure and low contrast acquisitions, ASIR showed lower performance than FBP in terms of spatial resolution for all blending levels of reconstruction. CT image quality varies substantially with the blending level of reconstruction. ASIR has the potential to reduce noise whilst maintaining diagnostic information in low radiation exposure CT imaging. Given the opposite variation of CNR and spatial resolution with the blending level of reconstruction, it is recommended to use an optimal value of this parameter for each specific clinical application.
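    For context, image noise and CNR of the kind reported here are typically measured from regions of interest in the phantom images; a sketch using one common CNR definition (conventions differ between studies):

        import numpy as np

        def contrast_to_noise(img, roi_obj, roi_bkg):
            """CNR from two rectangular ROIs given as (row0, row1, col0, col1).

            One plausible definition: |mean_obj - mean_bkg| / std_bkg; the
            background standard deviation doubles as the noise estimate."""
            obj = img[roi_obj[0]:roi_obj[1], roi_obj[2]:roi_obj[3]]
            bkg = img[roi_bkg[0]:roi_bkg[1], roi_bkg[2]:roi_bkg[3]]
            return abs(obj.mean() - bkg.mean()) / bkg.std()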

  8. Effect of carboxymethylcellulose on the rheological and filtration properties of bentonite clay samples determined by experimental planning and statistical analysis

    Directory of Open Access Journals (Sweden)

    B. M. A. Brito

    Full Text Available Over the past few years, considerable research has been conducted using the techniques of mixture delineation and statistical modeling. Through this methodology, applications in various technological fields have been found and optimized, especially in clay technology, leading to greater efficiency and reliability. This work studied the influence of carboxymethylcellulose on the rheological and filtration properties of bentonite dispersions to be applied in water-based drilling fluids, using experimental planning and statistical analysis for clay mixtures. The dispersions were prepared according to Petrobras standard EP-1EP-00011-A, which deals with the testing of water-based drilling fluid viscosifiers for oil prospecting. The clay mixtures were transformed into sodic compounds, and carboxymethylcellulose additives of high and low molar mass were added in order to improve their rheology and filtrate volume. Experimental planning and statistical analysis were used to verify the effect. Regression models were calculated for the relation between the compositions and the following properties: apparent viscosity, plastic viscosity, and filtrate volume. The significance and validity of the models were confirmed. The results showed that the 3D response surfaces of the compositions with high molecular weight carboxymethylcellulose added were the ones that contributed most to the rise in apparent viscosity and plastic viscosity, and that those with low molecular weight helped most in the reduction of the filtrate volume. Another important observation is that experimental planning and statistical analysis can be used as an important auxiliary tool to optimize the rheological properties and filtrate volume of bentonite clay dispersions for use in drilling fluids when carboxymethylcellulose is added.

  9. A new framework of statistical inferences based on the valid joint sampling distribution of the observed counts in an incomplete contingency table.

    Science.gov (United States)

    Tian, Guo-Liang; Li, Hui-Qiong

    2017-08-01

    Some existing confidence interval methods and hypothesis testing methods in the analysis of a contingency table with incomplete observations in both margins entirely depend on an underlying assumption that the sampling distribution of the observed counts is a product of independent multinomial/binomial distributions for complete and incomplete counts. However, it can be shown that this independency assumption is incorrect and can result in unreliable conclusions because of the under-estimation of the uncertainty. Therefore, the first objective of this paper is to derive the valid joint sampling distribution of the observed counts in a contingency table with incomplete observations in both margins. The second objective is to provide a new framework for analyzing incomplete contingency tables based on the derived joint sampling distribution of the observed counts by developing a Fisher scoring algorithm to calculate maximum likelihood estimates of parameters of interest, the bootstrap confidence interval methods, and the bootstrap testing hypothesis methods. We compare the differences between the valid sampling distribution and the sampling distribution under the independency assumption. Simulation studies showed that average/expected confidence-interval widths of parameters based on the sampling distribution under the independency assumption are shorter than those based on the new sampling distribution, yielding unrealistic results. A real data set is analyzed to illustrate the application of the new sampling distribution for incomplete contingency tables and the analysis results again confirm the conclusions obtained from the simulation studies.

  10. Special nuclear material inventory sampling plans

    International Nuclear Information System (INIS)

    Vaccaro, H.S.; Goldman, A.S.

    1987-01-01

    This paper presents improved procedures for obtaining statistically valid sampling plans for nuclear facilities. The double sampling concept and methods for developing optimal double sampling plans are described. An algorithm is described that is satisfactory for finding optimal double sampling plans and choosing appropriate detection and false alarm probabilities
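    A sketch of the acceptance probability of a classic attributes double sampling plan, the building block from which detection and false-alarm probabilities are computed when optimizing such plans (textbook plan logic, not necessarily the authors' exact formulation):

        from scipy.stats import binom

        def accept_prob(p, n1, c1, n2, c2):
            """Probability of acceptance for a double sampling plan.

            Accept if the first sample of n1 has <= c1 defectives; reject
            outright if it has > c2; otherwise draw n2 more items and accept
            if the combined defective count is <= c2."""
            pa = binom.cdf(c1, n1, p)          # decided on the first sample
            for d1 in range(c1 + 1, c2 + 1):   # undecided after first sample
                pa += binom.pmf(d1, n1, p) * binom.cdf(c2 - d1, n2, p)
            return pa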

  11. Statistical analysis of fluoride levels in human urine and drinking water samples of fluorinated area of punjab (pakistan)

    International Nuclear Information System (INIS)

    Qayyum, M.; Zaman, W.U.; Rehman, R.; Ahmad, B.; Ahmad, M.; Ali, S.; Murtaza, S

    2013-01-01

    Increasing fluoride levels in the drinking water of fluoridated areas of the world lead to fluorosis. For bio-monitoring of fluorosis patients, fluoride levels were determined in drinking water and human urine samples of different individuals having dental fluorosis and bony deformities from a fluorotic area of Punjab (Sham Ki Bhatiyan, Pakistan) and then compared with reference samples from a non-fluorotic area (Queens Road, Lahore, Pakistan) using ion selective electrode methodology. Fluoride levels in the fluoridated area differ significantly from the control group (p < 0.05). In drinking water and human urine samples, fluoride levels in the fluoridated area were 136.192 ± 67.836 and 94.484 ± 36.572 µmol L⁻¹, respectively, whereas in the control samples, fluoride concentrations were 19.306 ± 2.109 and 47.154 ± 22.685 µmol L⁻¹ in water and urine samples, correspondingly. Pearson's correlation analysis indicated that human urine and water fluoride concentrations have a significant positive dose-response relationship with the prevalence of dental and skeletal fluorosis in fluorotic areas having higher fluoride levels in drinking water. (author)
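    The dose-response claim rests on a Pearson correlation between paired water and urine fluoride levels; a minimal sketch with invented values:

        import numpy as np
        from scipy.stats import pearsonr

        # Hypothetical paired fluoride levels (µmol/L) in water and urine
        water = np.array([130.2, 145.8, 98.4, 210.7, 76.3])
        urine = np.array([90.1, 101.5, 70.2, 140.8, 55.4])

        r, p = pearsonr(water, urine)
        print(f"r = {r:.3f}, p = {p:.4f}")  # strength of the dose-response link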

  12. Estimation of thermal neutron cross sections from K, U, Th concentrations for rock samples using neural network algorithms

    International Nuclear Information System (INIS)

    Loskiewicz, J.; Swakon, J.

    1992-01-01

    The paper presents the results of using neural network algorithms to find a function Σ_a = f(K, U, Th, ...). The easily measurable parameters (K, U, Th concentrations, lithology) were used to estimate the thermal neutron absorption cross-section Σ_a, which is difficult to measure under borehole conditions. The paper suggests a possible solution to this problem. The method may have an important application in well-logging data treatment. (author). 6 refs, 9 tabs
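    A minimal sketch of the regression task described, estimating Σ_a from K, U and Th concentrations with a small scikit-learn multilayer perceptron; the training values are invented placeholders and the paper's network architecture is not reproduced:

        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Hypothetical training data: K (%), U (ppm), Th (ppm) -> sigma_a
        X = np.array([[2.1, 3.4, 12.0], [0.5, 1.2, 4.1],
                      [3.3, 5.0, 15.2], [1.1, 2.2, 7.5]])
        y = np.array([14.2, 6.1, 18.9, 9.7])  # absorption cross-section

        model = make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(8,),
                                           max_iter=5000, random_state=0))
        model.fit(X, y)
        print(model.predict([[1.8, 2.9, 10.0]]))  # estimate for a new sample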

  13. 7 CFR 52.38c - Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes.

    Science.gov (United States)

    2010-01-01

    Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. Section 52.38c, Agriculture Regulations (Fruits and Vegetables, Processed Products Thereof, and Certain Other Processed Food Products). (a) General. Single sampling plans shall be used...

  14. Statistical properties of mean stand biomass estimators in a LIDAR-based double sampling forest survey design.

    Science.gov (United States)

    H.E. Anderson; J. Breidenbach

    2007-01-01

    Airborne laser scanning (LIDAR) can be a valuable tool in double-sampling forest survey designs. LIDAR-derived forest structure metrics are often highly correlated with important forest inventory variables, such as mean stand biomass, and LIDAR-based synthetic regression estimators have the potential to be highly efficient compared to single-stage estimators, which...
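    The synthetic regression estimator in a two-phase (double-sampling) design has a compact classical form; a sketch assuming field biomass y and a LIDAR metric x measured on the small phase-2 sample, with x also known on the large phase-1 sample:

        import numpy as np

        def regression_estimator(y_small, x_small, x_large_mean):
            """Two-phase regression estimator of mean stand biomass.

            y_small, x_small: field biomass and LIDAR metric on the small
            sample; x_large_mean: mean LIDAR metric over the large sample."""
            b = (np.cov(x_small, y_small, ddof=1)[0, 1]
                 / np.var(x_small, ddof=1))
            return y_small.mean() + b * (x_large_mean - x_small.mean())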

  15. Optimization of low-dose protocol in thoracic aorta CTA: weighting of adaptive statistical iterative reconstruction (ASIR) algorithm and scanning parameters

    International Nuclear Information System (INIS)

    Zhao Yongxia; Chang Jin; Zuo Ziwei; Zhang Changda; Zhang Tianle

    2014-01-01

    Objective: To investigate the best weighting of the adaptive statistical iterative reconstruction (ASIR) algorithm and optimized low-dose scanning parameters in thoracic aorta CT angiography (CTA). Methods: A total of 120 patients with a body mass index (BMI) of 19-24 were randomly divided into 6 groups. All patients underwent thoracic aorta CTA with a GE Discovery CT 750 HD scanner (scan range 290-330 mm). The default parameters (100 kV, 240 mAs) were applied in Group 1. Reconstructions were performed with different weightings of ASIR (10%-100% in steps of 10%), and the signal-to-noise ratio (S/N) and contrast-to-noise ratio (C/N) of the images were calculated. The image series were evaluated by 2 independent radiologists on a 5-point scale, and the best weighting was determined. The mAs in Groups 2-6 were then set to 210, 180, 150, 120 and 90 with the kilovoltage at 100. The CTDI_vol and DLP of every scan series were recorded and the effective dose (E) was calculated. The S/N and C/N were calculated and the image quality was assessed by two radiologists. Results: The best weighting of ASIR was 60% at 100 kV, 240 mAs. With 60% ASIR and 100 kV, the image quality scores from 240 mAs to 90 mAs were (4.78 ± 0.30)-(3.15 ± 0.23). The CTDI_vol and DLP were 12.64-4.41 mGy and 331.81-128.27 mGy·cm, and the E was 4.98-1.92 mSv. The image qualities among Groups 1-5 were not significantly different (F = 5.365, P > 0.05), but the CTDI_vol and DLP of Group 5 were reduced by 37.0% and 36.9%, respectively, compared with Group 1. Conclusions: In thoracic aorta CT angiography, the best weighting of ASIR is 60%, and 120 mAs is the best mAs at 100 kV in patients with BMI 19-24. (authors)

  16. Judging statistical models of individual decision making under risk using in- and out-of-sample criteria.

    Science.gov (United States)

    Drichoutis, Andreas C; Lusk, Jayson L

    2014-01-01

    Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.
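    Out-of-sample comparison of competing preference functionals can be organized as k-fold cross-validated log-likelihood; a generic sketch in which fit_fn and loglik_fn are hypothetical stand-ins for fitting and scoring one functional/error-specification pair:

        import numpy as np

        def out_of_sample_ll(fit_fn, loglik_fn, data, n_folds=5, seed=0):
            """K-fold out-of-sample log-likelihood of a choice model.

            fit_fn(train) -> fitted parameter vector;
            loglik_fn(params, test) -> summed log-likelihood on held-out data.
            data is an array of observations (rows)."""
            idx = np.random.default_rng(seed).permutation(len(data))
            folds = np.array_split(idx, n_folds)
            total = 0.0
            for k in range(n_folds):
                train = np.concatenate([folds[j] for j in range(n_folds)
                                        if j != k])
                params = fit_fn(data[train])
                total += loglik_fn(params, data[folds[k]])
            return total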

  17. Judging statistical models of individual decision making under risk using in- and out-of-sample criteria.

    Directory of Open Access Journals (Sweden)

    Andreas C Drichoutis

    Full Text Available Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.

  18. A large sample of Kohonen-selected SDSS quasars with weak emission lines: selection effects and statistical properties

    Science.gov (United States)

    Meusinger, H.; Balafkan, N.

    2014-08-01

    Aims: A tiny fraction of the quasar population shows remarkably weak emission lines. Several hypotheses have been developed, but the weak line quasar (WLQ) phenomenon still remains puzzling. The aim of this study was to create a sizeable sample of WLQs and WLQ-like objects and to evaluate various properties of this sample. Methods: We performed a search for WLQs in the spectroscopic data from the Sloan Digital Sky Survey Data Release 7 based on Kohonen self-organising maps for nearly 105 quasar spectra. The final sample consists of 365 quasars in the redshift range z = 0.6 - 4.2 (z¯ = 1.50 ± 0.45) and includes in particular a subsample of 46 WLQs with equivalent widths WMg iiattention was paid to selection effects. Results: The WLQs have, on average, significantly higher luminosities, Eddington ratios, and accretion rates. About half of the excess comes from a selection bias, but an intrinsic excess remains probably caused primarily by higher accretion rates. The spectral energy distribution shows a bluer continuum at rest-frame wavelengths ≳1500 Å. The variability in the optical and UV is relatively low, even taking the variability-luminosity anti-correlation into account. The percentage of radio detected quasars and of core-dominant radio sources is significantly higher than for the control sample, whereas the mean radio-loudness is lower. Conclusions: The properties of our WLQ sample can be consistently understood assuming that it consists of a mix of quasars at the beginning of a stage of increased accretion activity and of beamed radio-quiet quasars. The higher luminosities and Eddington ratios in combination with a bluer spectral energy distribution can be explained by hotter continua, i.e. higher accretion rates. If quasar activity consists of subphases with different accretion rates, a change towards a higher rate is probably accompanied by an only slow development of the broad line region. The composite WLQ spectrum can be reasonably matched by the

  19. Condenser: a statistical aggregation tool for multi-sample quantitative proteomic data from Matrix Science Mascot Distiller™.

    Science.gov (United States)

    Knudsen, Anders Dahl; Bennike, Tue; Kjeldal, Henrik; Birkelund, Svend; Otzen, Daniel Erik; Stensballe, Allan

    2014-05-30

    We describe Condenser, a freely available, comprehensive open-source tool for merging multidimensional quantitative proteomics data from the Matrix Science Mascot Distiller Quantitation Toolbox into a common format ready for subsequent bioinformatic analysis. A number of different relative quantitation technologies, such as metabolic (15)N and amino acid stable isotope incorporation, label-free and chemical-label quantitation are supported. The program features multiple options for curative filtering of the quantified peptides, allowing the user to choose data quality thresholds appropriate for the current dataset, and ensure the quality of the calculated relative protein abundances. Condenser also features optional global normalization, peptide outlier removal, multiple testing and calculation of t-test statistics for highlighting and evaluating proteins with significantly altered relative protein abundances. Condenser provides an attractive addition to the gold-standard quantitative workflow of Mascot Distiller, allowing easy handling of larger multi-dimensional experiments. Source code, binaries, test data set and documentation are available at http://condenser.googlecode.com/. Copyright © 2014 Elsevier B.V. All rights reserved.
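    The multiple-testing step mentioned in the abstract is commonly a per-protein t-test followed by Benjamini-Hochberg FDR control; a generic sketch on simulated log abundances (not Condenser's actual implementation):

        import numpy as np
        from scipy.stats import ttest_ind
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(1)
        ctrl = rng.normal(0.0, 0.5, size=(200, 4))   # 200 proteins, 4 reps
        treat = rng.normal(0.2, 0.5, size=(200, 4))

        pvals = ttest_ind(treat, ctrl, axis=1).pvalue
        reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
        print(int(reject.sum()), "proteins pass the 5% FDR threshold")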

  20. Statistical analysis of biomechanical properties of the adult skull and age-related structural changes by sex in a Japanese forensic sample.

    Science.gov (United States)

    Torimitsu, Suguru; Nishida, Yoshifumi; Takano, Tachio; Koizumi, Yoshinori; Makino, Yohsuke; Yajima, Daisuke; Hayakawa, Mutsumi; Inokuchi, Go; Motomura, Ayumi; Chiba, Fumiko; Otsuka, Katsura; Kobayashi, Kazuhiro; Odo, Yuriko; Iwase, Hirotaro

    2014-01-01

    The purpose of this research was to investigate the biomechanical properties of the adult human skull and the structural changes that occur with age in both sexes. The heads of 94 Japanese cadavers (54 male, 40 female) autopsied in our department were used in this research. A total of 376 cranial samples, four from each skull, were collected. Sample fracture load was measured by a bending test. A statistically significant negative correlation between sample fracture load and cadaver age was found. This indicates that the stiffness of cranial bones in Japanese individuals decreases with age, and the risk of skull fracture thus probably increases with age. Prior to the bending test, the sample mass, the sample thickness, the ratio of sample thickness to cadaver stature (ST/CS), and the sample density were measured and calculated. Significant negative correlations between cadaver age and sample thickness, ST/CS, and sample density were observed only among the female samples. Computed tomographic (CT) images of 358 cranial samples were available. The computed tomography value (CT value) of cancellous bone, a quantitative scale for describing radiodensity, as well as cancellous bone thickness and cortical bone thickness, were measured and calculated. A significant negative correlation between cadaver age and the CT value or cortical bone thickness was observed only among the female samples. These findings suggest that the skull is substantially affected by decreased bone metabolism resulting from osteoporosis. Therefore, osteoporosis prevention and treatment may increase cranial stiffness and reinforce the skull structure, leading to a decrease in the risk of skull fractures. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  1. Small-angle X-ray scattering tensor tomography: model of the three-dimensional reciprocal-space map, reconstruction algorithm and angular sampling requirements.

    Science.gov (United States)

    Liebi, Marianne; Georgiadis, Marios; Kohlbrecher, Joachim; Holler, Mirko; Raabe, Jörg; Usov, Ivan; Menzel, Andreas; Schneider, Philipp; Bunk, Oliver; Guizar-Sicairos, Manuel

    2018-01-01

    Small-angle X-ray scattering tensor tomography, which allows reconstruction of the local three-dimensional reciprocal-space map within a three-dimensional sample as introduced by Liebi et al. [Nature (2015), 527, 349-352], is described in more detail with regard to the mathematical framework and the optimization algorithm. For the case of trabecular bone samples from vertebrae it is shown that the model of the three-dimensional reciprocal-space map using spherical harmonics can adequately describe the measured data. The method enables the determination of nanostructure orientation and degree of orientation as demonstrated previously in a single momentum transfer q range. This article presents a reconstruction of the complete reciprocal-space map for the case of bone over extended ranges of q. In addition, it is shown that uniform angular sampling and advanced regularization strategies help to reduce the amount of data required.
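    The local reciprocal-space map is parametrized with a band-limited spherical-harmonic expansion; a sketch of evaluating such an expansion with SciPy, restricted to even orders as appropriate for small-angle scattering (the paper's optimization and normalization conventions are more involved):

        import numpy as np
        from scipy.special import sph_harm  # sph_harm_y in newer SciPy

        def rsm_intensity(polar, azimuth, coeffs, lmax=4):
            """Band-limited real spherical-harmonic expansion; even orders
            only, reflecting the inversion symmetry of the scattering signal.
            For lmax=4, coeffs must have 1 + 5 + 9 = 15 entries."""
            val = np.zeros_like(polar, dtype=float)
            i = 0
            for l in range(0, lmax + 1, 2):
                for m in range(-l, l + 1):
                    # SciPy argument order: sph_harm(m, l, azimuthal, polar)
                    val += coeffs[i] * np.real(sph_harm(m, l, azimuth, polar))
                    i += 1
            return val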

  2. Multiparametric statistics

    CERN Document Server

    Serdobolskii, Vadim Ivanovich

    2007-01-01

    This monograph presents a mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size or even much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to the solution of central problems of multivariate statistics, which up until now have not been solved. Traditional statistical methods based on the idea of an infinite sample often break down in the solution of real problems and, depending on the data, can be inefficient, unstable and even inapplicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...

  3. Validation of Correction Algorithms for Near-IR Analysis of Human Milk in an Independent Sample Set-Effect of Pasteurization.

    Science.gov (United States)

    Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph

    2016-02-26

    Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified.
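    The correction algorithm tested here amounts to a linear mapping from device readings onto chemical reference values; a sketch of fitting and applying such a correction with invented numbers:

        import numpy as np

        # Hypothetical paired fat measurements: device vs. chemical reference
        device_fat = np.array([3.1, 4.0, 2.5, 5.2, 3.8])
        reference_fat = np.array([3.4, 4.5, 2.7, 5.9, 4.2])

        # Least-squares slope/intercept mapping device output to reference
        slope, intercept = np.polyfit(device_fat, reference_fat, 1)

        def corrected(reading):
            return slope * reading + intercept

        print(corrected(4.4))  # corrected fat estimate for a new reading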

  4. Validation of Correction Algorithms for Near-IR Analysis of Human Milk in an Independent Sample Set—Effect of Pasteurization

    Science.gov (United States)

    Kotrri, Gynter; Fusch, Gerhard; Kwan, Celia; Choi, Dasol; Choi, Arum; Al Kafi, Nisreen; Rochow, Niels; Fusch, Christoph

    2016-01-01

    Commercial infrared (IR) milk analyzers are being increasingly used in research settings for the macronutrient measurement of breast milk (BM) prior to its target fortification. These devices, however, may not provide reliable measurement if not properly calibrated. In the current study, we tested a correction algorithm for a Near-IR milk analyzer (Unity SpectraStar, Brookfield, CT, USA) for fat and protein measurements, and examined the effect of pasteurization on the IR matrix and the stability of fat, protein, and lactose. Measurement values generated through Near-IR analysis were compared against those obtained through chemical reference methods to test the correction algorithm for the Near-IR milk analyzer. Macronutrient levels were compared between unpasteurized and pasteurized milk samples to determine the effect of pasteurization on macronutrient stability. The correction algorithm generated for our device was found to be valid for unpasteurized and pasteurized BM. Pasteurization had no effect on the macronutrient levels and the IR matrix of BM. These results show that fat and protein content can be accurately measured and monitored for unpasteurized and pasteurized BM. Of additional importance is the implication that donated human milk, generally low in protein content, has the potential to be target fortified. PMID:26927169

  5. DISK IMAGING SURVEY OF CHEMISTRY WITH SMA. II. SOUTHERN SKY PROTOPLANETARY DISK DATA AND FULL SAMPLE STATISTICS

    International Nuclear Information System (INIS)

    Oeberg, Karin I.; Qi Chunhua; Andrews, Sean M.; Espaillat, Catherine; Wilner, David J.; Fogel, Jeffrey K. J.; Bergin, Edwin A.; Pascucci, Ilaria; Kastner, Joel H.

    2011-01-01

    This is the second in a series of papers based on data from DISCS, a Submillimeter Array observing program aimed at spatially and spectrally resolving the chemical composition of 12 protoplanetary disks. We present data on six Southern sky sources - IM Lup, SAO 206462 (HD 135344b), HD 142527, AS 209, AS 205, and V4046 Sgr - which complement the six sources in the Taurus star-forming region reported previously. CO 2-1 and HCO+ 3-2 emission are detected and resolved in all disks and show velocity patterns consistent with Keplerian rotation. Where detected, the emission from DCO+ 3-2, N2H+ 3-2, H2CO 3(03)-2(02) and 4(14)-3(13), HCN 3-2, and the CN 2-1 hyperfine lines is also generally spatially resolved. The detection rates are highest toward the M and K stars, while the F star SAO 206462 has only weak CN and HCN emission, and H2CO alone is detected toward HD 142527. These findings together with the statistics from the previous Taurus disks support the hypothesis that high detection rates of many small molecules depend on the presence of a cold and protected disk midplane, which is less common around F and A stars compared to M and K stars. Disk-averaged variations in the proposed radiation tracer CN/HCN are found to be small, despite a two orders of magnitude range of spectral types and accretion rates. In contrast, the resolved images suggest that the CN/HCN emission ratio varies with disk radius in at least two of the systems. There are no clear observational differences in the disk chemistry between the classical/full T Tauri disks and transitional disks. Furthermore, the observed line emission does not depend on the measured accretion luminosities or the number of infrared lines detected, which suggests that the chemistry outside of 100 AU is not coupled to the physical processes that drive the chemistry in the innermost few AU.

  6. A statistical rationale for establishing process quality control limits using fixed sample size, for critical current verification of SSC superconducting wire

    International Nuclear Information System (INIS)

    Pollock, D.A.; Brown, G.; Capone, D.W. II; Christopherson, D.; Seuntjens, J.M.; Woltz, J.

    1992-01-01

    This work has demonstrated the statistical concepts behind the XBAR-R method for determining sample limits to verify billet I_c performance and process uniformity. Using a preliminary population estimate for μ and σ from a stable production lot of only 5 billets, we have shown that reasonable sensitivity to systematic process drift and random within-billet variation may be achieved by using per-billet subgroup sizes of moderate proportions. The effects of subgroup size (n) and sampling risk (α and β) on the calculated control limits have been shown to be important factors that need to be carefully considered when selecting the actual number of measurements to be used per billet for each supplier process. Given the present method of testing, in which individual wire samples are ramped to I_c only once, with measurement uncertainty due to repeatability and reproducibility (typically > 1.4%), large subgroups (i.e. > 30 per billet) appear to be unnecessary, except as an inspection tool to confirm wire process history for each spool. The introduction of the XBAR-R method or a similar statistical quality control procedure is recommended for use in the superconducting wire production program, particularly when the program transitions from requiring tests for all pieces of wire to sampling each production unit
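    A minimal sketch of the XBAR-R limit calculation that underlies the discussion, using the standard control-chart constants for subgroups of size five (illustrative; not the program's actual procedure):

        import numpy as np

        # Standard control-chart constants for subgroup size n = 5
        A2, D3, D4 = 0.577, 0.0, 2.114

        def xbar_r_limits(subgroups):
            """Control limits from an array of shape (n_billets, 5)."""
            xbar = subgroups.mean(axis=1)
            r = subgroups.max(axis=1) - subgroups.min(axis=1)
            xbar_bar, r_bar = xbar.mean(), r.mean()
            return {"xbar": (xbar_bar - A2 * r_bar, xbar_bar + A2 * r_bar),
                    "range": (D3 * r_bar, D4 * r_bar)}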

  7. Dark Matter Profiles in Dwarf Galaxies: A Statistical Sample Using High-Resolution Hα Velocity Fields from PCWI

    Science.gov (United States)

    Relatores, Nicole C.; Newman, Andrew B.; Simon, Joshua D.; Ellis, Richard; Truong, Phuongmai N.; Blitz, Leo

    2018-01-01

    We present high quality Hα velocity fields for a sample of nearby dwarf galaxies (log M/M⊙ = 8.4-9.8) obtained as part of the Dark Matter in Dwarf Galaxies survey. The purpose of the survey is to investigate the cusp-core discrepancy by quantifying the variation of the inner slope of the dark matter distributions of 26 dwarf galaxies, which were selected as likely to have regular kinematics. The data were obtained with the Palomar Cosmic Web Imager, located on the Hale 5m telescope. We extract rotation curves from the velocity fields and use optical and infrared photometry to model the stellar mass distribution. We model the total mass distribution as the sum of a generalized Navarro-Frenk-White dark matter halo along with the stellar and gaseous components. We present the distribution of inner dark matter density profile slopes derived from this analysis. For a subset of galaxies, we compare our results to an independent analysis based on CO observations. In future work, we will compare the scatter in inner density slopes, as well as their correlations with galaxy properties, to theoretical predictions for dark matter core creation via supernovae feedback.

  8. Triacylglycerol Analysis in Human Milk and Other Mammalian Species: Small-Scale Sample Preparation, Characterization, and Statistical Classification Using HPLC-ELSD Profiles.

    Science.gov (United States)

    Ten-Doménech, Isabel; Beltrán-Iturat, Eduardo; Herrero-Martínez, José Manuel; Sancho-Llopis, Juan Vicente; Simó-Alfonso, Ernesto Francisco

    2015-06-24

    In this work, a method for the separation of triacylglycerols (TAGs) present in human milk and from other mammalian species by reversed-phase high-performance liquid chromatography using a core-shell particle packed column with UV and evaporative light-scattering detectors is described. Under optimal conditions, a mobile phase containing acetonitrile/n-pentanol at 10 °C gave an excellent resolution among more than 50 TAG peaks. A small-scale method for fat extraction in these milks (particularly of interest for human milk samples) using minimal amounts of sample and reagents was also developed. The proposed extraction protocol and the traditional method were compared, giving similar results, with respect to the total fat and relative TAG contents. Finally, a statistical study based on linear discriminant analysis on the TAG composition of different types of milks (human, cow, sheep, and goat) was carried out to differentiate the samples according to their mammalian origin.
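    The statistical classification step is a standard linear discriminant analysis on relative TAG abundances; a sketch with invented feature values:

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        # Hypothetical relative TAG abundances (rows: samples, cols: TAG peaks)
        X = np.array([[0.12, 0.30, 0.08], [0.14, 0.28, 0.09],   # human
                      [0.25, 0.10, 0.20], [0.27, 0.12, 0.18],   # cow
                      [0.20, 0.15, 0.30], [0.22, 0.16, 0.28]])  # goat
        y = ["human", "human", "cow", "cow", "goat", "goat"]

        lda = LinearDiscriminantAnalysis().fit(X, y)
        print(lda.predict([[0.13, 0.29, 0.08]]))  # -> ['human']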

  9. A Smoothing Algorithm for a New Two-Stage Stochastic Model of Supply Chain Based on Sample Average Approximation

    OpenAIRE

    Liu Yang; Yao Xiong; Xiao-jiao Tong

    2017-01-01

    We construct a new two-stage stochastic model of supply chain with multiple factories and distributors for perishable product. By introducing a second-order stochastic dominance (SSD) constraint, we can describe the preference consistency of the risk taker while minimizing the expected cost of company. To solve this problem, we convert it into a one-stage stochastic model equivalently; then we use sample average approximation (SAA) method to approximate the expected values of the underlying r...

  10. Independent assessment of matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) sample preparation quality: A novel statistical approach for quality scoring.

    Science.gov (United States)

    Kooijman, Pieter C; Kok, Sander J; Weusten, Jos J A M; Honing, Maarten

    2016-05-05

    Preparation of samples according to an optimized method is crucial for accurate determination of polymer sample characteristics by Matrix-Assisted Laser Desorption Ionization (MALDI) analysis. Sample preparation conditions such as matrix choice, cationization agent, deposition technique or even the deposition volume should be chosen to suit the sample of interest. Many sample preparation protocols have been developed and employed, yet finding the optimal sample preparation protocol remains a challenge. Because an objective comparison between the results of diverse protocols is not possible, "gut-feeling" or "good enough" is often decisive in the search for an optimum. This implies that sub-optimal protocols are used, leading to a loss of mass spectral information quality. To address this problem a novel analytical strategy based on MALDI imaging and statistical data processing was developed in which eight parameters were formulated to objectively quantify the quality of sample deposition and optimal MALDI matrix composition and finally sum up to an overall quality score of the sample deposition. These parameters can be established in a fully automated way using commercially available mass spectrometry imaging instruments without any hardware adjustments. With the newly developed analytical strategy the highest quality MALDI spots were selected, resulting in more reproducible and more valuable spectra for PEG in a variety of matrices. Moreover, our method enables an objective comparison of sample preparation protocols for any analyte and opens up new fields of investigation by presenting MALDI performance data in a clear and concise way. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. An Evaluation of Different Training Sample Allocation Schemes for Discrete and Continuous Land Cover Classification Using Decision Tree-Based Algorithms

    Directory of Open Access Journals (Sweden)

    René Roland Colditz

    2015-07-01

    Full Text Available Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for discrete and continuous land cover mapping using classification and regression tree algorithms. In addition to measures of discrete and continuous map accuracy, the correct estimation of the area is another important criterion. A subset of the 30 m national land cover dataset of 2006 (NLCD2006) of the United States was used as the reference set to classify NADIR BRDF-adjusted surface reflectance time series of MODIS at 900 m spatial resolution. Results show that sampling of heterogeneous pixels and sample allocation according to the expected area of each class is best for classification trees. Regression trees for continuous land cover mapping should be trained with random allocation, and predictions should be normalized with a linear scaling function to correctly estimate the total area. Of the tested algorithms, random forest classification yields lower errors than boosted trees of C5.0, and Cubist shows higher accuracies than random forest regression.

  12. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations.

    Directory of Open Access Journals (Sweden)

    Emma Lightfoot

    Full Text Available Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, and differential access to water sources) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or the median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals' homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately; however, it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific
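    The two outlier rules the study recommends are easily stated; a sketch of both, where the cut-off constants k are conventional defaults rather than values from the paper:

        import numpy as np

        def outliers_iqr(d18o, k=1.5):
            """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
            q1, q3 = np.percentile(d18o, [25, 75])
            iqr = q3 - q1
            return (d18o < q1 - k * iqr) | (d18o > q3 + k * iqr)

        def outliers_mad(d18o, k=3.0):
            """Flag values more than k median absolute deviations
            from the median."""
            med = np.median(d18o)
            mad = np.median(np.abs(d18o - med))
            return np.abs(d18o - med) > k * mad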

  13. The Investigation of Construct Validity of Diagnostic and Statistical Manual of Mental Disorder-5 Personality Traits on Iranian sample with Antisocial and Borderline Personality Disorders.

    Science.gov (United States)

    Amini, Mehdi; Pourshahbaz, Abbas; Mohammadkhani, Parvaneh; Ardakani, Mohammad-Reza Khodaie; Lotfi, Mozhgan

    2014-12-01

    The goal of this study was to examine the construct validity of the diagnostic and statistical manual of mental disorder-5 (DSM-5) conceptual model of antisocial and borderline personality disorders (PDs). More specifically, the aim was to determine whether the DSM-5 five-factor structure of pathological personality trait domains replicated in an independently collected sample that differs culturally from the derivation sample. This study was conducted on a sample of 346 individuals with antisocial (n = 122) and borderline (n = 130) PD, and nonclinical subjects (n = 94). Participants were randomly selected from prisoners, out-patient, and in-patient clients, and were recruited from Tehran prisoners and the clinical psychology and psychiatry clinics of Razi and Taleghani Hospitals, Tehran, Iran. The SCID-II-PQ, SCID-II, and DSM-5 Personality Trait Rating Form (Clinician's PTRF) were used for the diagnosis of PDs and the assessment of pathological traits. The data were analyzed by exploratory factor analysis. Factor analysis revealed a 5-factor solution for the DSM-5 personality traits. Results showed that the DSM-5 has adequate construct validity in an Iranian sample with antisocial and borderline PDs. The factors were similar in number to those of other studies, but differed in content. Exploratory factor analysis revealed five homogeneous components of antisocial and borderline PDs that may represent personality, behavioral, and affective features central to the disorders. Furthermore, the present study helps in understanding the adequacy of the DSM-5 dimensional approach to the evaluation of personality pathology, specifically in an Iranian sample.
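
    For readers wanting to reproduce the analysis strategy, a minimal sketch of a five-factor exploratory factor analysis in Python (sklearn's maximum-likelihood factor analysis with varimax rotation is used here as a stand-in for the authors' unspecified EFA software, and the trait-facet matrix is simulated):

        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(1)
        # Hypothetical ratings: 346 participants x 25 pathological-trait facets
        scores = rng.normal(size=(346, 25))

        fa = FactorAnalysis(n_components=5, rotation="varimax", random_state=1)
        fa.fit(scores)
        loadings = fa.components_.T      # facet-by-factor loading matrix
        print(loadings.shape)            # (25, 5)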

  14. The investigation of construct validity of diagnostic and statistical manual of mental disorder-5 personality traits on iranian sample with antisocial and borderline personality disorders

    Directory of Open Access Journals (Sweden)

    Mehdi Amini

    2014-01-01

    Full Text Available Background: The goal of this study was to examine the construct validity of the diagnostic and statistical manual of mental disorder-5 (DSM-5) conceptual model of antisocial and borderline personality disorders (PDs). More specifically, the aim was to determine whether the DSM-5 five-factor structure of pathological personality trait domains replicated in an independently collected sample that differs culturally from the derivation sample. Methods: This study was conducted on a sample of 346 individuals with antisocial (n = 122) and borderline (n = 130) PD, and nonclinical subjects (n = 94). Participants were randomly selected from prisoners, out-patient, and in-patient clients, and were recruited from Tehran prisoners and the clinical psychology and psychiatry clinics of Razi and Taleghani Hospitals, Tehran, Iran. The SCID-II-PQ, SCID-II, and DSM-5 Personality Trait Rating Form (Clinician's PTRF) were used for the diagnosis of PDs and the assessment of pathological traits. The data were analyzed by exploratory factor analysis. Results: Factor analysis revealed a 5-factor solution for the DSM-5 personality traits. Results showed that the DSM-5 has adequate construct validity in an Iranian sample with antisocial and borderline PDs. The factors were similar in number to those of other studies, but differed in content. Conclusions: Exploratory factor analysis revealed five homogeneous components of antisocial and borderline PDs that may represent personality, behavioral, and affective features central to the disorders. Furthermore, the present study helps in understanding the adequacy of the DSM-5 dimensional approach to the evaluation of personality pathology, specifically in an Iranian sample.

  15. Towards the harmonization between National Forest Inventory and Forest Condition Monitoring. Consistency of plot allocation and effect of tree selection methods on sample statistics in Italy.

    Science.gov (United States)

    Gasparini, Patrizia; Di Cosmo, Lucio; Cenni, Enrico; Pompei, Enrico; Ferretti, Marco

    2013-07-01

    In the frame of a process aiming at harmonizing National Forest Inventory (NFI) and ICP Forests Level I Forest Condition Monitoring (FCM) in Italy, we investigated (a) the long-term consistency between FCM sample points (a subsample of the first NFI, 1985, NFI_1) and recent forest area estimates (after the second NFI, 2005, NFI_2) and (b) the effect of tree selection method (tree-based or plot-based) on sample composition and defoliation statistics. The two investigations were carried out on 261 and 252 FCM sites, respectively. Results show that some individual forest categories (larch and stone pine, Norway spruce, other coniferous, beech, temperate oaks and cork oak forests) are over-represented and others (hornbeam and hophornbeam, other deciduous broadleaved and holm oak forests) are under-represented in the FCM sample. This is probably due to a change in forest cover, which has increased by 1,559,200 ha from 1985 to 2005. In the case of a shift from a tree-based to a plot-based selection method, 3,130 (46.7%) of the original 6,703 sample trees will be abandoned, and 1,473 new trees will be selected. The balance between exclusion of former sample trees and inclusion of new ones will be particularly unfavourable for conifers (with only 16.4% of excluded trees replaced by new ones) and less so for deciduous broadleaves (with 63.5% of excluded trees replaced). The total number of tree species surveyed will not be impacted, while the number of trees per species will, and the resulting (plot-based) sample composition will have a much larger frequency of deciduous broadleaved trees. The newly selected trees have, in general, smaller diameters at breast height (DBH) and lower defoliation scores. Given the larger rate of turnover, the deciduous broadleaved part of the sample will be more impacted. Our results suggest that both a revision of the FCM network to account for forest area change and a plot-based approach to permit statistical inference and avoid bias in the tree sample

  16. Statistics Clinic

    Science.gov (United States)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  17. Comparative statistical analysis of carcinogenic and non-carcinogenic effects of uranium in groundwater samples from different regions of Punjab, India.

    Science.gov (United States)

    Saini, Komal; Singh, Parminder; Bajwa, Bikramjit Singh

    2016-12-01

    LED fluorimeter has been used for microanalysis of uranium concentration in groundwater samples collected from six districts of South West (SW), West (W) and North East (NE) Punjab, India. The average value of uranium content in water samples of SW Punjab is observed to be higher than the WHO and USEPA recommended safe limit of 30 µg l⁻¹ as well as the AERB proposed limit of 60 µg l⁻¹, whereas for the W and NE regions of Punjab, the average level of uranium concentration was within the AERB recommended limit of 60 µg l⁻¹. The average value observed in SW Punjab is around 3-4 times the value observed in W Punjab, and more than 17 times the average value observed in the NE region of Punjab. Carcinogenic as well as non-carcinogenic risks due to uranium have been statistically evaluated for each studied district. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Understanding Statistics - Cancer Statistics

    Science.gov (United States)

    Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.

  19. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic.

    Science.gov (United States)

    Bowden, Jack; Del Greco M, Fabiola; Minelli, Cosetta; Davey Smith, George; Sheehan, Nuala A; Thompson, John R

    2016-12-01

    MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the 'NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, which can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. An adaptation of the I2 statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it I2GX. The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of I2GX), the stronger the dilution. When, additionally, all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects.
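
    The proposed statistic is a direct adaptation of the meta-analytic I2, computed from the SNP-exposure estimates and their standard errors. A sketch under that reading (the summary data below are invented for illustration):

        import numpy as np

        def i2_gx(gamma, se):
            # Q statistic of the SNP-exposure estimates, then I2 = (Q - (k-1))/Q;
            # values near 1 suggest NOME approximately holds (little dilution),
            # low values indicate strong dilution of the MR-Egger estimate
            gamma, se = np.asarray(gamma, float), np.asarray(se, float)
            w = 1.0 / se**2
            gamma_bar = np.sum(w * gamma) / np.sum(w)
            q = np.sum(w * (gamma - gamma_bar) ** 2)
            k = len(gamma)
            return max(0.0, (q - (k - 1)) / q)

        # Hypothetical SNP-exposure associations for 10 variants
        gamma = [0.08, 0.12, 0.05, 0.15, 0.09, 0.11, 0.07, 0.13, 0.10, 0.06]
        se    = [0.02, 0.03, 0.02, 0.04, 0.02, 0.03, 0.02, 0.03, 0.02, 0.02]
        print(f"I2GX = {i2_gx(gamma, se):.2f}")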

  20. Using pixel intensity as a self-regulating threshold for deterministic image sampling in Milano Retinex: the T-Rex algorithm

    Science.gov (United States)

    Lecca, Michela; Modena, Carla Maria; Rizzi, Alessandro

    2018-01-01

    Milano Retinexes are spatial color algorithms, part of the Retinex family, usually employed for image enhancement. They modify the color of each pixel taking into account the surrounding colors and their positions, in this way capturing the local spatial color distribution relevant to image enhancement. We present T-Rex (from the words threshold and Retinex), an implementation of Milano Retinex whose main novelty is the use of the pixel intensity as a self-regulating threshold to deterministically sample local color information. The experiments, carried out on real-world pictures, show that T-Rex's image enhancement performance is in line with that of the Milano Retinex family: T-Rex increases the brightness, the contrast, and the flatness of the channel distributions of the input image, making the content of pictures acquired under difficult light conditions more intelligible.
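
    A schematic of the core idea as we read it from the abstract, not the published implementation: the deterministic sampling grid, the single-channel handling, and the use of the brightest surviving sample as a local white reference are all assumptions of this sketch.

        import numpy as np

        def t_rex_like(channel, stride=8):
            # Deterministically sample a coarse pixel grid; for each target pixel,
            # its own intensity acts as the threshold deciding which samples
            # contribute, and the pixel is rescaled by the brightest survivor.
            h, w = channel.shape
            samples = channel[::stride, ::stride].ravel().astype(float)
            out = np.empty((h, w))
            for i in range(h):
                for j in range(w):
                    above = samples[samples >= channel[i, j]]
                    ref = above.max() if above.size else channel[i, j]
                    out[i, j] = channel[i, j] / max(ref, 1e-6)
            return out

        gray = np.clip(np.random.default_rng(0).normal(90, 40, (64, 64)), 1, 255)
        enhanced = t_rex_like(gray)     # values in (0, 1]; brighter, flatter channel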

  1. Statistical searches for microlensing events in large, non-uniformly sampled time-domain surveys: A test using palomar transient factory data

    Energy Technology Data Exchange (ETDEWEB)

    Price-Whelan, Adrian M.; Agüeros, Marcel A. [Department of Astronomy, Columbia University, 550 W 120th Street, New York, NY 10027 (United States); Fournier, Amanda P. [Department of Physics, Broida Hall, University of California, Santa Barbara, CA 93106 (United States); Street, Rachel [Las Cumbres Observatory Global Telescope Network, Inc., 6740 Cortona Drive, Suite 102, Santa Barbara, CA 93117 (United States); Ofek, Eran O. [Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot (Israel); Covey, Kevin R. [Lowell Observatory, 1400 West Mars Hill Road, Flagstaff, AZ 86001 (United States); Levitan, David; Sesar, Branimir [Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125 (United States); Laher, Russ R.; Surace, Jason, E-mail: adrn@astro.columbia.edu [Spitzer Science Center, California Institute of Technology, Mail Stop 314-6, Pasadena, CA 91125 (United States)

    2014-01-20

    Many photometric time-domain surveys are driven by specific goals, such as searches for supernovae or transiting exoplanets, which set the cadence with which fields are re-imaged. In the case of the Palomar Transient Factory (PTF), several sub-surveys are conducted in parallel, leading to non-uniform sampling over its ∼20,000 deg² footprint. While the median 7.26 deg² PTF field has been imaged ∼40 times in the R band, ∼2300 deg² have been observed >100 times. We use PTF data to study the trade-off involved in searching for microlensing events in a survey whose footprint is much larger than that of typical microlensing searches but whose time sampling is far from optimal. To examine the probability that microlensing events can be recovered in these data, we test statistics used on uniformly sampled data to identify variables and transients. We find that the von Neumann ratio performs best for identifying simulated microlensing events in our data. We develop a selection method using this statistic and apply it to data from fields with >10 R-band observations, 1.1 × 10⁹ light curves, uncovering three candidate microlensing events. We lack simultaneous, multi-color photometry to confirm these as microlensing events. However, their number is consistent with predictions for the event rate in the PTF footprint over the survey's three years of operations, as estimated from near-field microlensing models. This work can help constrain all-sky event rate predictions and tests microlensing signal recovery in large data sets, which will be useful to future time-domain surveys, such as that planned with the Large Synoptic Survey Telescope.
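
    The von Neumann ratio the authors select is simple to compute: the mean squared successive difference of the light curve divided by its variance. A toy illustration with simulated data (note that, as in the paper's setting, the statistic ignores the actual time spacing of the non-uniform sampling):

        import numpy as np

        def von_neumann_ratio(mags):
            # Mean squared successive difference over variance: ~2 for pure
            # uncorrelated noise, well below 2 for a smooth correlated signal
            mags = np.asarray(mags, dtype=float)
            msd = np.sum(np.diff(mags) ** 2) / (len(mags) - 1)
            return msd / np.var(mags, ddof=1)

        rng = np.random.default_rng(42)
        t = np.linspace(0, 100, 60)
        noise = rng.normal(0, 0.05, t.size)
        bump = 0.5 * np.exp(-0.5 * ((t - 50) / 8) ** 2)   # toy microlensing-like event
        print(von_neumann_ratio(noise))          # close to 2
        print(von_neumann_ratio(noise + bump))   # much smaller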

  2. Statistical Computing

    Indian Academy of Sciences (India)

    inference and finite population sampling. Sudhakar Kunte. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0.

  3. Comparative statistical analysis of carcinogenic and non-carcinogenic effects of uranium in groundwater samples from different regions of Punjab, India

    International Nuclear Information System (INIS)

    Saini, Komal; Singh, Parminder; Bajwa, Bikramjit Singh

    2016-01-01

    LED fluorimeter has been used for microanalysis of uranium concentration in groundwater samples collected from six districts of South West (SW), West (W) and North East (NE) Punjab, India. The average value of uranium content in water samples of SW Punjab is observed to be higher than the WHO and USEPA recommended safe limit of 30 µg l⁻¹ as well as the AERB proposed limit of 60 µg l⁻¹, whereas for the W and NE regions of Punjab, the average level of uranium concentration was within the AERB recommended limit of 60 µg l⁻¹. The average value observed in SW Punjab is around 3–4 times the value observed in W Punjab, and more than 17 times the average value observed in the NE region of Punjab. Carcinogenic as well as non-carcinogenic risks due to uranium have been statistically evaluated for each studied district. - Highlights: • Uranium levels in groundwater samples have been assessed in different regions of Punjab. • A comparative study of carcinogenic and non-carcinogenic effects of uranium has been done. • Wide variation has been found for different geological regions. • South West Punjab is found to be worst affected by uranium contamination of its water. • For the west and north east regions of Punjab, uranium levels in groundwater lay within recommended safe limits.

  4. ArraySolver: An Algorithm for Colour-Coded Graphical Display and Wilcoxon Signed-Rank Statistics for Comparing Microarray Gene Expression Data

    OpenAIRE

    Khan, Haseeb Ahmad

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for tra...

  5. ArraySolver: an algorithm for colour-coded graphical display and Wilcoxon signed-rank statistics for comparing microarray gene expression data.

    Science.gov (United States)

    Khan, Haseeb Ahmad

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking, and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann-Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, a convenient report format, accurate statistics and the familiar Excel platform.
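
    The underlying test is available off the shelf; a minimal sketch with invented paired expression values (scipy's implementation, not ArraySolver itself):

        import numpy as np
        from scipy.stats import wilcoxon

        # Hypothetical paired log2 expression values for one gene set (n = 8 pairs)
        control   = np.array([7.1, 6.8, 7.4, 7.0, 6.9, 7.2, 7.3, 6.7])
        treatment = np.array([7.9, 7.5, 7.8, 7.6, 7.2, 8.1, 7.7, 7.4])

        stat, p = wilcoxon(treatment, control)    # paired, two-sided by default
        print(f"W = {stat}, p = {p:.4f}")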

  6. Building a stronger child dental health system in Australia: statistical sampling masks the burden of dental disease distribution in Australian children.

    Science.gov (United States)

    Tennant, M; Kruger, E

    2014-01-01

    In Australia, over the past 30 years, the prevalence of dental decay in children has reduced significantly: today 60-70% of all 12-year-olds are caries free, and only 10% of children have more than two decayed teeth. However, many studies continue to report a small but significant subset of children suffering severe levels of decay. The present study applies Monte Carlo simulation to examine, at the national level, 12-year-old decayed, missing or filled teeth counts and to shed light on both the statistical limitations of Australia's reporting to date and the problem of targeting high-risk children. A simulation for 273 000 Australian 12-year-old children found that moving between different levels of geographic clustering produces different statistical effects that drive different conclusions. At the high scale (i.e. state level), the gross averaging of the non-normally distributed disease burden masks the small subset of disease-bearing children. At a much higher acuity of analysis (i.e. local government area), the risk of low numbers in the sample becomes a significant issue. The results clearly highlight, first, the importance of care when examining the existing data and, second, opportunities for far greater levels of targeting of services to children in need. The sustainability (and fairness) of universal coverage systems needs to be examined to ensure they remain highly targeted at disease burden, and not just focused on the children who are easy to reach (and suffer the least disease).
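
    A toy version of the masking effect the study describes, with an invented zero-inflated decay distribution: the national mean looks mild while a small subset carries most of the burden, and small-area means are noisy. All parameters below are illustrative, not the study's.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 273_000
        # Hypothetical zero-inflated burden: ~65% caries-free, a small severe subset
        dmft = np.where(rng.random(n) < 0.65, 0, rng.poisson(1.2, n))
        dmft = dmft + (rng.random(n) < 0.03) * rng.poisson(6, n)

        print("national mean DMFT:", round(dmft.mean(), 2))        # looks mild
        print("share with >2 affected teeth:", round(float((dmft > 2).mean()), 3))

        # Local-area view: small samples make area-level means noisy
        area = rng.integers(0, 2000, n)            # 2000 hypothetical local areas
        means = [dmft[area == a].mean() for a in range(10)]
        print("first 10 area means:", np.round(means, 2))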

  7. Front-face fluorescence spectroscopy combined with second-order multivariate algorithms for the quantification of polyphenols in red wine samples.

    Science.gov (United States)

    Cabrera-Bañegil, Manuel; Hurtado-Sánchez, María Del Carmen; Galeano-Díaz, Teresa; Durán-Merás, Isabel

    2017-04-01

    The potential of front-face fluorescence spectroscopy combined with second-order chemometric methods was investigated for the quantification of the main polyphenols present in wine samples. Parallel factor analysis (PARAFAC) and unfolded partial least squares coupled to residual bilinearization (U-PLS/RBL) were assessed for the quantification of catechin, epicatechin, quercetin, resveratrol, caffeic acid, gallic acid, p-coumaric acid, and vanillic acid in red wines. Excitation-emission matrices of different red wine samples, without pretreatment, were obtained in front-face mode, recording emission between 290 and 450 nm and exciting between 240 and 290 nm for the analysis of epicatechin, catechin, caffeic acid, gallic acid, and vanillic acid, and recording excitation and emission between 300-360 and 330-400 nm, respectively, for the analysis of resveratrol. The U-PLS/RBL algorithm provided the best results, and this methodology was validated against an optimized liquid chromatography procedure coupled to diode-array and fluorimetric detectors, obtaining very good correlation for vanillic acid, caffeic acid, epicatechin and resveratrol. Copyright © 2016 Elsevier Ltd. All rights reserved.
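
    Of the two chemometric methods compared, PARAFAC is the more standard and has open-source implementations; a minimal sketch on a simulated excitation-emission tensor (tensorly is an assumed choice of library, and the rank is illustrative):

        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import parafac

        rng = np.random.default_rng(0)
        # Hypothetical EEM stack: samples x emission channels x excitation channels
        eem = tl.tensor(rng.random((12, 160, 50)))

        # Trilinear decomposition; the rank would be chosen from the number of
        # fluorophores expected to contribute to the signal
        weights, factors = parafac(eem, rank=4, n_iter_max=200)
        scores, emission, excitation = factors    # scores relate to concentrations
        print(scores.shape, emission.shape, excitation.shape)   # (12,4) (160,4) (50,4)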

  8. Combination of artificial neural network and genetic algorithm method for modeling of methylene blue adsorption onto wood sawdust from water samples.

    Science.gov (United States)

    Khajeh, Mostafa; Sarafraz-Yazdi, Ali; Natavan, Zahra Bameri

    2016-03-01

    The aim of this research was to develop a low-cost, environmentally friendly adsorbent from an abundant source to remove methylene blue (MB) from water samples. Sawdust solid-phase extraction coupled with high-performance liquid chromatography was used for the extraction and determination of MB. In this study, an artificial neural network model based on experimental data is constructed to describe the performance of the sawdust solid-phase extraction method under various operating conditions. The pH, time, amount of sawdust, and temperature were the input variables, while the percentage extraction of MB was the output. The optimum operating conditions were then determined by the genetic algorithm method and obtained as follows: 11.5, 22.0 min, 0.3 g, and 26.0°C for pH of the solution, extraction time, amount of adsorbent, and temperature, respectively. Under these optimum conditions, the detection limit and relative standard deviation were 0.067 μg L⁻¹ and <2.4%, respectively. The Langmuir and Freundlich adsorption models were applied to describe the adsorption isotherms for the removal and determination of MB from water samples. © The Author(s) 2013.
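
    A compact sketch of the ANN-plus-GA pattern the authors describe: a neural network is fitted to (simulated) extraction experiments, then a simple genetic algorithm searches the operating conditions for the network's optimum. All data, ranges, and GA settings below are illustrative assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        # Hypothetical experiments: [pH, time (min), sawdust (g), temperature (C)]
        lo = np.array([2.0, 5.0, 0.05, 15.0])
        hi = np.array([12.0, 60.0, 0.50, 45.0])
        X = rng.uniform(lo, hi, size=(60, 4))
        y = 90 - (X[:, 0] - 11.5) ** 2 - 50 * (X[:, 2] - 0.3) ** 2   # toy % extraction

        net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                           random_state=0).fit(X, y)

        # Minimal genetic algorithm searching the operating conditions
        pop = rng.uniform(lo, hi, size=(40, 4))
        for _ in range(50):
            fitness = net.predict(pop)
            parents = pop[np.argsort(fitness)[-20:]]            # selection
            picks = rng.integers(0, 20, (40, 4))
            pop = parents[picks, np.arange(4)]                  # uniform crossover
            pop += rng.normal(0, 0.02, pop.shape) * (hi - lo)   # mutation
            pop = np.clip(pop, lo, hi)
        print("GA optimum:", np.round(pop[np.argmax(net.predict(pop))], 2))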

  9. Statistical inference

    CERN Document Server

    Rohatgi, Vijay K

    2003-01-01

    Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth

  10. Robust sampling-sourced numerical retrieval algorithm for optical energy loss function based on log–log mesh optimization and local monotonicity preserving Steffen spline

    Energy Technology Data Exchange (ETDEWEB)

    Maglevanny, I.I., E-mail: sianko@list.ru [Volgograd State Social Pedagogical University, 27 Lenin Avenue, Volgograd 400131 (Russian Federation); Smolar, V.A. [Volgograd State Technical University, 28 Lenin Avenue, Volgograd 400131 (Russian Federation)

    2016-01-15

    We introduce a new technique for interpolation of the energy-loss function (ELF) in solids sampled by empirical optical spectra. Finding appropriate interpolation methods for ELFs poses several challenges: the sampled ELFs are usually very heterogeneous, can originate from various sources (so that so-called “data gaps” can appear), and significant discontinuities and multiple high outliers can be present. As a result, an interpolation based on those data may not perform well at predicting reasonable physical results. Reliable interpolation tools suitable for ELF applications should therefore satisfy several important demands: accuracy and predictive power, robustness and computational efficiency, and ease of use. We examined the effect of different interpolation schemes on the fitting quality, with emphasis on ELF mesh optimization procedures, and we argue that the optimal fitting should be based on a preliminary log–log scaling transform of the data, by which the non-uniformity of the sampled data distribution may be considerably reduced. The transformed data are then interpolated by a local monotonicity-preserving Steffen spline. The result is a piecewise-smooth fitting curve with continuous first-order derivatives that passes through all data points without spurious oscillations. Local extrema can occur only at grid points where they are given by the data, but not in between two adjacent grid points. The proposed technique is found to give the most accurate results, and its computational time is short. Thus, it is feasible to use this simple method to address practical problems associated with the interaction between a bulk material and a moving electron. A compact C++ implementation of our algorithm is also presented.
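
    While the paper provides a C++ implementation, the pipeline is easy to approximate in Python. SciPy does not ship Steffen's spline, but its PCHIP interpolant is likewise local and monotonicity-preserving, so a faithful-in-spirit sketch (log-log transform, shape-preserving interpolation, back-transform) with invented ELF sample points looks like this:

        import numpy as np
        from scipy.interpolate import PchipInterpolator

        # Hypothetical sampled ELF points (energy in eV vs. Im[-1/eps])
        energy = np.array([1.0, 2.0, 5.0, 10.0, 22.0, 60.0, 150.0, 400.0])
        elf    = np.array([0.02, 0.10, 0.90, 2.30, 1.10, 0.25, 0.04, 0.004])

        # The log-log transform evens out the highly non-uniform sampling; a local,
        # monotonicity-preserving interpolant then avoids spurious oscillations
        spline = PchipInterpolator(np.log(energy), np.log(elf))
        e_query = np.logspace(0.0, np.log10(400.0), 200)
        elf_interp = np.exp(spline(np.log(e_query)))
        print(elf_interp.max())    # the peak stays bounded by the data, as intended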

  11. Robust sampling-sourced numerical retrieval algorithm for optical energy loss function based on log–log mesh optimization and local monotonicity preserving Steffen spline

    International Nuclear Information System (INIS)

    Maglevanny, I.I.; Smolar, V.A.

    2016-01-01

    We introduce a new technique for interpolation of the energy-loss function (ELF) in solids sampled by empirical optical spectra. Finding appropriate interpolation methods for ELFs poses several challenges: the sampled ELFs are usually very heterogeneous, can originate from various sources (so that so-called “data gaps” can appear), and significant discontinuities and multiple high outliers can be present. As a result, an interpolation based on those data may not perform well at predicting reasonable physical results. Reliable interpolation tools suitable for ELF applications should therefore satisfy several important demands: accuracy and predictive power, robustness and computational efficiency, and ease of use. We examined the effect of different interpolation schemes on the fitting quality, with emphasis on ELF mesh optimization procedures, and we argue that the optimal fitting should be based on a preliminary log–log scaling transform of the data, by which the non-uniformity of the sampled data distribution may be considerably reduced. The transformed data are then interpolated by a local monotonicity-preserving Steffen spline. The result is a piecewise-smooth fitting curve with continuous first-order derivatives that passes through all data points without spurious oscillations. Local extrema can occur only at grid points where they are given by the data, but not in between two adjacent grid points. The proposed technique is found to give the most accurate results, and its computational time is short. Thus, it is feasible to use this simple method to address practical problems associated with the interaction between a bulk material and a moving electron. A compact C++ implementation of our algorithm is also presented.

  12. The interprocess NIR sampling as an alternative approach to multivariate statistical process control for identifying sources of product-quality variability.

    Science.gov (United States)

    Marković, Snežana; Kerč, Janez; Horvat, Matej

    2017-03-01

    We present a new approach to identifying sources of variability within a manufacturing process, based on NIR measurements of samples of intermediate material taken after each consecutive unit operation (an interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields results comparable to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases when historical process data are not straightforwardly available. In the presented case, changes in lactose characteristics influenced the performance of the extrusion/spheronization process step. The pellet cores produced using one (less suitable) lactose source were on average larger and more fragile, leading to breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process, which leads to compromised film coating.
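
    A minimal sketch of the MSPC side of the comparison: a PCA model is built on NIR spectra of in-control intermediate material, and samples are scored with Hotelling's T2 against an approximate control limit. The data, component count, and chi-square limit below are illustrative simplifications, not the authors' model.

        import numpy as np
        from scipy.stats import chi2
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        # Hypothetical NIR spectra of intermediate material (40 in-control batches)
        spectra = rng.normal(size=(40, 200)) + np.linspace(0, 1, 200)

        pca = PCA(n_components=3).fit(spectra)
        scores = pca.transform(spectra)
        t2 = np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)   # Hotelling T2

        limit = chi2.ppf(0.99, df=3)          # rough in-control limit
        print((t2 > limit).sum(), "of", len(t2), "training batches exceed the limit")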

  13. Robust statistical methods with R

    CERN Document Server

    Jureckova, Jana

    2005-01-01

    Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of a real parameter, large-sample properties, and goodness-of-fit tests. It...

  14. Evaluation of Algorithms of Anti-HIV Antibody Tests

    Directory of Open Access Journals (Sweden)

    Paranjape R.S

    1997-01-01

    Full Text Available Research question: Can alternate algorithms be used in place of the conventional algorithm for epidemiological studies of HIV infection at lower expense? Objective: To compare the results of HIV sero-prevalence as determined by test algorithms combining three kits with the conventional test algorithm. Study design: Cross-sectional. Participants: 282 truck drivers. Statistical analysis: Sensitivity and specificity analysis and predictive values. Results: Three different algorithms that do not include Western blot (WB) were compared with the conventional algorithm in a truck driver population with 5.6% prevalence of HIV-1 infection. Algorithms with one EIA (Genetic Systems or Biotest) and a rapid test (Immunocomb), or with two EIAs, showed 100% positive predictive value in relation to the conventional algorithm. Using an algorithm with an EIA as the screening test and a rapid test as the confirmatory test was 50 to 70% less expensive than the conventional algorithm per positive serum sample. These algorithms obviate the interpretation of indeterminate results and also give a differential diagnosis of HIV-2 infection. Alternate algorithms are ideally suited for community-based control programmes in developing countries. Application of these algorithms in populations with low prevalence should also be studied in order to evaluate universal applicability.
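
    The predictive values that drive the comparison follow from Bayes' rule. A small sketch with assumed kit characteristics (the 5.6% prevalence is the figure reported above; the sensitivities and specificities are hypothetical, and conditional independence of the two tests is assumed):

        def ppv(sens, spec, prev):
            # Positive predictive value via Bayes' rule
            tp = sens * prev
            fp = (1 - spec) * (1 - prev)
            return tp / (tp + fp)

        # Hypothetical serial algorithm: EIA screen confirmed by a rapid test,
        # with only samples positive on both reported positive
        sens_eia, spec_eia = 0.995, 0.98
        sens_rapid, spec_rapid = 0.99, 0.99
        prev = 0.056                     # prevalence reported for the truck drivers

        sens_serial = sens_eia * sens_rapid
        spec_serial = 1 - (1 - spec_eia) * (1 - spec_rapid)
        print(f"PPV, single EIA:       {ppv(sens_eia, spec_eia, prev):.3f}")
        print(f"PPV, EIA + rapid test: {ppv(sens_serial, spec_serial, prev):.3f}")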

  15. A quantitative structure–activity relationship study on HIV-1 integrase inhibitors using genetic algorithm, artificial neural networks and different statistical methods

    Directory of Open Access Journals (Sweden)

    Ghasem Ghasemi

    2016-09-01

    Full Text Available In this work, a quantitative structure–activity relationship (QSAR) study has been carried out on tricyclic phthalimide analogues acting as HIV-1 integrase inhibitors. Forty compounds were used in this study. Genetic algorithm (GA), artificial neural networks (ANN) and multiple linear regression (MLR) were utilized to construct the non-linear and linear QSAR models; it was revealed that the GA–ANN model was much better than the other models. For this purpose, ab initio geometry optimizations were performed at the B3LYP level with the 6-31G(d) basis set. The Hyperchem, ChemOffice and Gaussian 98W software packages were used for geometry optimization of the molecules and calculation of the quantum chemical descriptors. To include some of the correlation energy, the calculation was done with density functional theory (DFT) using the same basis set and Becke's three-parameter hybrid functional with the LYP correlation functional (B3LYP/6-31G(d)). For the calculations in solution phase, the polarized continuum model (PCM) was used; optimizations at the gas-phase B3LYP/6-31G(d) level were also included for comparison. In the aqueous phase, the root-mean-square errors of the training set and the test set for the GA–ANN model, using the jack-knife method, were 0.1409 and 0.1804, respectively. In the gas phase, the root-mean-square errors of the training set and the test set for the GA–ANN model were 0.1408 and 0.3103, respectively. Also, the R2 values in the aqueous and gas phases were obtained as 0.91 and 0.82, respectively.

  16. Direct Learning of Systematics-Aware Summary Statistics

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of this approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...

  17. How to Make Nothing Out of Something: Analyses of the Impact of Study Sampling and Statistical Interpretation in Misleading Meta-Analytic Conclusions

    Directory of Open Access Journals (Sweden)

    Michael Robert Cunningham

    2016-10-01

    Full Text Available The limited resource model states that self-control is governed by a relatively finite set of inner resources on which people draw when exerting willpower. Once self-control resources have been used up or depleted, they are less available for other self-control tasks, leading to a decrement in subsequent self-control success. The depletion effect has been studied for over 20 years, tested or extended in more than 600 studies, and supported in an independent meta-analysis (Hagger, Wood, Stiff, and Chatzisarantis, 2010). Meta-analyses are supposed to reduce bias in literature reviews. Carter, Kofler, Forster, and McCullough's (2015) meta-analysis, by contrast, included a series of questionable decisions involving sampling, methods, and data analysis. We provide quantitative analyses of key sampling issues: the exclusion of many of the best depletion studies based on idiosyncratic criteria, and the emphasis on mini meta-analyses with low statistical power as opposed to the overall depletion effect. We discuss two key methodological issues: failure to code for research quality, and the quantitative impact of weak studies by novice researchers. We discuss two key data analysis issues: questionable interpretation of the results of trim-and-fill and funnel plot asymmetry test procedures, and the use and misinterpretation of the untested Precision Effect Test (PET) and Precision Effect Estimate with Standard Error (PEESE) procedures. Despite these serious problems, the Carter et al. meta-analysis results actually indicate that there is a real depletion effect – contrary to their title.

  18. Statistical analyses of in-situ and soil-sample measurements for radionuclides in surface soil near the 116-K-2 trench

    International Nuclear Information System (INIS)

    Gilbert, R.O.; Klover, W.J.

    1988-09-01

    Radiation detection surveys are used at the US Department of Energy's Hanford Reservation near Richland, Washington, to determine areas that need posting as radiation zones or to measure dose rates in the field. The relationship between measurements made by sodium iodide (NaI) detectors mounted on the mobile Road Monitor vehicle and those made by hand-held GM P-11 probes and Micro-R meters is of particular interest, because the Road Monitor can survey land areas in much less time than hand-held detectors. Statistical regression methods are used here to develop simple equations to predict GM P-11 probe gross gamma count-per-minute (cpm) and Micro-R meter μR/h measurements on the basis of NaI gross gamma count-per-second (cps) measurements obtained using the Road Monitor. These equations were estimated using data collected near the 116-K-2 Trench in the 100-K area on the Hanford Reservation. Equations are also obtained for estimating upper and lower limits within which the GM P-11 or Micro-R meter measurement corresponding to a given NaI Road Monitor measurement at a new location is expected to fall with high probability. An equation and limits for predicting GM P-11 measurements on the basis of Micro-R meter measurements are also estimated. In addition, we estimate an equation that may be useful for approximating the 90Sr measurement of a surface soil sample on the basis of a spectroscopy measurement for 137Cs on that sample. 3 refs., 16 figs., 44 tabs
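
    The "upper and lower limits within which a new measurement is expected to fall" are prediction intervals around the regression line. A minimal sketch with simulated paired detector readings (all numbers hypothetical):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        # Hypothetical paired readings: NaI Road Monitor (cps) vs GM P-11 (cpm)
        nai = rng.uniform(200, 2000, 30)
        gm = 0.9 * nai + rng.normal(0, 60, 30)

        res = stats.linregress(nai, gm)
        n, x_bar = len(nai), nai.mean()
        sxx = np.sum((nai - x_bar) ** 2)
        resid = gm - (res.intercept + res.slope * nai)
        s = np.sqrt(np.sum(resid ** 2) / (n - 2))     # residual standard error

        x_new = 1200.0
        y_hat = res.intercept + res.slope * x_new
        # 95% prediction interval for a single new GM P-11 reading at x_new
        half = stats.t.ppf(0.975, n - 2) * s * np.sqrt(1 + 1/n + (x_new - x_bar)**2 / sxx)
        print(f"predicted GM P-11: {y_hat:.0f} +/- {half:.0f} cpm")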

  19. The Autism Diagnostic Observation Schedule, Module 4: Application of the Revised Algorithms in an Independent, Well-Defined, Dutch Sample (N = 93)

    Science.gov (United States)

    de Bildt, Annelies; Sytema, Sjoerd; Meffert, Harma; Bastiaansen, Jojanneke A. C. J.

    2016-01-01

    This study examined the discriminative ability of the revised Autism Diagnostic Observation Schedule module 4 algorithm (Hus and Lord in "J Autism Dev Disord" 44(8):1996-2012, 2014) in 93 Dutch males with Autism Spectrum Disorder (ASD), schizophrenia, psychopathy or controls. Discriminative ability of the revised algorithm ASD cut-off…

  20. New Optimization Algorithms in Physics

    CERN Document Server

    Hartmann, Alexander K

    2004-01-01

    Many physicists are not aware of the fact that they can solve their problems by applying optimization algorithms. Because the number of such algorithms is steadily increasing, many of them have not yet been presented comprehensively. This presentation of recently developed algorithms applied in physics, including demonstrations of how they work and related results, aims to encourage their application; the algorithms selected cover concepts and methods ranging from statistical physics to optimization problems emerging in theoretical computer science.