WorldWideScience

Sample records for model predictions statistical

  1. A statistical model for predicting muscle performance

    Science.gov (United States)

    Byerly, Diane Leslie De Caix

    The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing
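
    As a rough illustration of the kind of AR-derived feature described above, the sketch below (Python, synthetic data only, not the study's code) fits a 5th-order AR model to a simulated SEMG-like signal for each repetition, takes the mean magnitude of the AR poles, and checks for a linear trend across repetitions.

        import numpy as np

        def ar_pole_feature(x, order=5):
            """Fit an AR(order) model via the Yule-Walker equations and return the mean magnitude of its poles."""
            x = np.asarray(x, dtype=float)
            x = x - x.mean()
            n = len(x)
            r = np.array([x[:n - k] @ x[k:] / n for k in range(order + 1)])   # autocorrelations r0..r_order
            R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
            a = np.linalg.solve(R, r[1:])                                     # AR coefficients a1..a_order
            poles = np.roots(np.concatenate(([1.0], -a)))                     # poles of 1/(1 - a1 z^-1 - ...)
            return float(np.mean(np.abs(poles)))

        rng = np.random.default_rng(0)
        features = []
        for rep in range(1, 21):                          # 20 simulated repetitions of the exercise
            radius = 0.80 + 0.008 * rep                   # pole radius drifts as "fatigue" accumulates
            a1, a2 = 2 * radius * np.cos(0.3 * np.pi), -radius ** 2
            x = np.zeros(2000)
            for t in range(2, len(x)):                    # synthetic SEMG-like AR(2) signal
                x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal()
            features.append(ar_pole_feature(x))

        slope, intercept = np.polyfit(np.arange(1, 21), features, 1)
        print(f"trend in mean AR-pole magnitude per repetition: {slope:+.4f}")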

  2. Risk prediction model: Statistical and artificial neural network approach

    Science.gov (United States)

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models are increasingly popular and have been used in numerous fields of study to complement and support clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand suitable approaches and the development and validation process of such models. A qualitative review of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields was carried out. The paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From this review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, artificial neural network approaches to developing prediction models were more accurate than statistical approaches; however, only limited published literature discusses which approach is more accurate for risk prediction model development.

  3. Which method predicts recidivism best?: A comparison of statistical, machine learning, and data mining predictive models

    OpenAIRE

    Tollenaar, N.; van der Heijden, P.G.M.

    2012-01-01

    Using criminal population conviction histories of recent offenders, prediction models are developed that predict three types of criminal recidivism: general recidivism, violent recidivism and sexual recidivism. The research question is whether prediction techniques from modern statistics, data mining and machine learning provide an improvement in predictive performance over classical statistical methods, namely logistic regression and linear discriminant analysis. These models are compared ...
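
    The kind of comparison described above can be sketched as follows; this is an illustrative Python example on synthetic data (not the authors' conviction-history data), contrasting classical logistic regression and linear discriminant analysis with a machine-learning model via cross-validated AUC.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-in for conviction-history features and a binary recidivism label.
        X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                                   weights=[0.8, 0.2], random_state=0)

        models = {
            "logistic regression": LogisticRegression(max_iter=1000),
            "linear discriminant": LinearDiscriminantAnalysis(),
            "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
        }
        for name, model in models.items():
            auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
            print(f"{name:>20s}: cross-validated AUC = {auc:.3f}")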

  4. Model output statistics applied to wind power prediction

    Energy Technology Data Exchange (ETDEWEB)

    Joensen, A.; Giebel, G.; Landberg, L. [Risoe National Lab., Roskilde (Denmark); Madsen, H.; Nielsen, H.A. [The Technical Univ. of Denmark, Dept. of Mathematical Modelling, Lyngby (Denmark)

    1999-03-01

    Being able to predict the output of a wind farm online for a day or two in advance has significant advantages for utilities, such as a better ability to schedule fossil-fuelled power plants and a better position on electricity spot markets. In this paper prediction methods based on Numerical Weather Prediction (NWP) models are considered. The spatial resolution used in NWP models implies that these predictions are not valid locally at a specific wind farm. Furthermore, due to the non-stationary nature and complexity of the processes in the atmosphere, and occasional changes of NWP models, the deviation between the predicted and the measured wind will be time dependent. If observational data are available, and if the deviation between the predictions and the observations exhibits systematic behavior, this should be corrected for; if statistical methods are used, this approach is usually referred to as MOS (Model Output Statistics). The influence of atmospheric turbulence intensity, topography, prediction horizon length and auto-correlation of wind speed and power is considered, and to take the time-variations into account, adaptive estimation methods are applied. Three estimation techniques are considered and compared: extended Kalman filtering, recursive least squares and a new modified recursive least squares algorithm. (au) EU-JOULE-3. 11 refs.
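
    A minimal sketch of the MOS idea with one of the adaptive estimators mentioned above, recursive least squares with a forgetting factor; the data, coefficients and forgetting factor below are purely illustrative assumptions, not the paper's setup.

        import numpy as np

        rng = np.random.default_rng(1)

        # Synthetic example: NWP-predicted wind speed with a slowly drifting bias and scale error.
        n = 500
        nwp = rng.gamma(shape=4.0, scale=2.0, size=n)                  # "forecast" wind speed [m/s]
        bias = 1.0 + 0.004 * np.arange(n)                              # time-varying systematic error
        obs = 0.85 * nwp + bias + rng.normal(0.0, 0.6, size=n)         # "measured" wind speed

        # Recursive least squares with forgetting factor lam: obs ~ theta[0] + theta[1] * nwp
        lam = 0.99
        theta = np.zeros(2)
        P = np.eye(2) * 1e3
        corrected = np.empty(n)
        for t in range(n):
            x = np.array([1.0, nwp[t]])
            corrected[t] = theta @ x                  # MOS-corrected forecast before seeing obs[t]
            k = P @ x / (lam + x @ P @ x)             # gain
            theta = theta + k * (obs[t] - x @ theta)  # update regression coefficients
            P = (P - np.outer(k, x) @ P) / lam        # update covariance

        raw_rmse = np.sqrt(np.mean((obs[50:] - nwp[50:]) ** 2))
        mos_rmse = np.sqrt(np.mean((obs[50:] - corrected[50:]) ** 2))
        print(f"raw NWP RMSE: {raw_rmse:.2f}  MOS-corrected RMSE: {mos_rmse:.2f}")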

  5. Statistical models for expert judgement and wear prediction

    International Nuclear Information System (INIS)

    Pulkkinen, U.

    1994-01-01

    This thesis studies the statistical analysis of expert judgements and prediction of wear. The point of view adopted is that of information theory and Bayesian statistics. A general Bayesian framework for analyzing both the expert judgements and wear prediction is presented. Information theoretic interpretations are given for some averaging techniques used in the determination of consensus distributions. Further, information theoretic models are compared with a Bayesian model. The general Bayesian framework is then applied in analyzing expert judgements based on ordinal comparisons. In this context, the value of information lost in the ordinal comparison process is analyzed by applying decision theoretic concepts. As a generalization of the Bayesian framework, stochastic filtering models for wear prediction are formulated. These models utilize the information from condition monitoring measurements in updating the residual life distribution of mechanical components. Finally, the application of stochastic control models in optimizing operational strategies for inspected components is studied. Monte-Carlo simulation methods, such as the Gibbs sampler and the stochastic quasi-gradient method, are applied in the determination of posterior distributions and in the solution of stochastic optimization problems. (orig.) (57 refs., 7 figs., 1 tab.)

  6. Estimating Predictive Variance for Statistical Gas Distribution Modelling

    International Nuclear Information System (INIS)

    Lilienthal, Achim J.; Asadi, Sahar; Reggente, Matteo

    2009-01-01

    Recent publications in statistical gas distribution modelling have proposed algorithms that model the mean and variance of a distribution. This paper argues that estimating the predictive concentration variance is not merely a gradual improvement but rather a significant step forward for the field. This is, first, because such models much better fit the particular structure of gas distributions, which exhibit strong fluctuations with considerable spatial variations as a result of the intermittent character of gas dispersal. Second, because estimating the predictive variance makes it possible to evaluate the model quality in terms of the data likelihood. This offers a solution to the problem of ground truth evaluation, which has always been a critical issue for gas distribution modelling. It also enables solid comparisons of different modelling approaches, and provides the means to learn meta parameters of the model, to determine when the model should be updated or re-initialised, or to suggest new measurement locations based on the current model. We also point out directions of related ongoing or potential future research work.
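
    A toy illustration of the evaluation idea described above, scoring two hypothetical predictive distributions by the average negative log predictive density (NLPD) of held-out data; the data and variance models below are synthetic assumptions, not taken from the paper.

        import numpy as np

        rng = np.random.default_rng(11)

        # Once a model supplies a predictive variance, its quality can be scored by the likelihood
        # it assigns to held-out measurements, e.g. the average NLPD under a Gaussian predictive
        # distribution; here one model uses a single global variance and the other a local one.
        x = rng.uniform(0, 1, 300)
        true_sd = 0.2 + 0.8 * x                                # strongly fluctuating ("intermittent") noise
        y = np.sin(2 * np.pi * x) + rng.normal(0, true_sd)

        mean_pred = np.sin(2 * np.pi * x)                      # same mean prediction for both models
        var_const = np.full_like(x, np.mean(true_sd ** 2))     # model 1: one global variance
        var_local = (0.2 + 0.8 * x) ** 2                       # model 2: location-dependent variance

        def nlpd(y, mu, var):
            return float(np.mean(0.5 * np.log(2 * np.pi * var) + (y - mu) ** 2 / (2 * var)))

        print("NLPD with constant predictive variance:", round(nlpd(y, mean_pred, var_const), 3))
        print("NLPD with local predictive variance:   ", round(nlpd(y, mean_pred, var_local), 3))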

  7. Statistical model based gender prediction for targeted NGS clinical panels

    Directory of Open Access Journals (Sweden)

    Palani Kannan Kandavel

    2017-12-01

    A reference test dataset is used to test the model. The sensitivity of gender prediction is increased compared with the current “genotype composition in ChrX” based approach. In addition, the prediction score given by the model can be used to evaluate the quality of a clinical dataset: a higher prediction score towards the respective gender indicates a higher quality of the sequenced data.

  8. Monthly to seasonal low flow prediction: statistical versus dynamical models

    Science.gov (United States)

    Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke

    2016-04-01

    ... the Alfred Wegener Institute a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO), the idea is to identify regions with stable teleconnections between different global climate information (e.g. sea surface temperature, geopotential height etc.) and streamflow at different gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated showing regions where streamflow and climate variables from previous months are significantly correlated in a 21 (31) year moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We will present current results of the aforementioned approaches with focus on the River Rhine (being one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals the existence of valuable predictability of the low flows at monthly and seasonal time scales, a result that may be useful to water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings for upcoming low flows.
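
    The screening-plus-regression idea can be sketched as follows; the data are synthetic, and the window length, significance level and predictor construction are illustrative assumptions rather than the operational setup described above.

        import numpy as np
        from scipy.stats import pearsonr

        rng = np.random.default_rng(2)

        # Synthetic data: low flow at a gauge and candidate climate predictors
        # (e.g. SST averaged over several regions) observed a few months earlier.
        years = 60
        flow = rng.normal(size=years)
        predictors = rng.normal(size=(years, 10))
        predictors[:, 0] = 0.7 * flow + 0.5 * rng.normal(size=years)   # a genuinely linked "region"
        predictors[:, 1] = 0.5 * flow + 0.8 * rng.normal(size=years)

        # "Stability map" idea: keep predictors significantly correlated with flow
        # in every 21-year moving window, not just over the full record.
        window, alpha, stable = 21, 0.1, []
        for j in range(predictors.shape[1]):
            ok = all(pearsonr(flow[s:s + window], predictors[s:s + window, j])[1] < alpha
                     for s in range(years - window + 1))
            if ok:
                stable.append(j)

        # Multiple regression on the stable predictors gives the forecast model.
        X = np.column_stack([np.ones(years), predictors[:, stable]])
        coef, *_ = np.linalg.lstsq(X, flow, rcond=None)
        print("stable predictor indices:", stable)
        print("regression coefficients:", np.round(coef, 2))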

  9. Error Estimation of An Ensemble Statistical Seasonal Precipitation Prediction Model

    Science.gov (United States)

    Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Gui-Long

    2001-01-01

    This NASA Technical Memorandum describes an optimal ensemble canonical correlation forecasting model for seasonal precipitation. Each individual forecast is based on canonical correlation analysis (CCA) in spectral spaces whose bases are empirical orthogonal functions (EOFs). The optimal weights in the ensemble forecasting crucially depend on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is also made using the spectral method. The error is decomposed onto EOFs of the predictand and decreases linearly according to the correlation between the predictor and predictand. Since the new CCA scheme is derived for continuous fields of predictor and predictand, an area-factor is automatically included. Thus our model is an improvement of the spectral CCA scheme of Barnett and Preisendorfer. The improvements include (1) the use of an area-factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to the seasonal forecasting of the United States (US) precipitation field. The predictor is the sea surface temperature (SST). The US Climate Prediction Center's reconstructed SST is used as the predictor's historical data. The US National Center for Environmental Prediction's optimally interpolated precipitation (1951-2000) is used as the predictand's historical data. Our forecast experiments show that the new ensemble canonical correlation scheme yields reasonable forecasting skill. For example, when using September-October-November SST to predict the next season's December-January-February precipitation, the spatial pattern correlation between the observed and predicted fields is positive in 46 of the 50 years of experiments. The positive correlations are close to or greater than 0.4 in 29 years, which indicates excellent performance of the forecasting model. The forecasting skill can be further enhanced when several predictors are used.
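
    One standard way to weight ensemble members by their mean square error is inverse-MSE weighting, which is optimal for independent, unbiased errors; the sketch below is illustrative only and is not claimed to be the exact weighting derived in the memorandum.

        import numpy as np

        rng = np.random.default_rng(3)

        # Illustrative only: three individual seasonal forecasts of the same predictand,
        # with different (estimated) mean square errors.
        truth = rng.normal(size=200)
        forecasts = np.vstack([truth + rng.normal(0.0, s, size=200) for s in (0.4, 0.7, 1.0)])
        mse = np.mean((forecasts - truth) ** 2, axis=1)

        # Weight each member inversely to its MSE and combine into one ensemble forecast.
        w = (1.0 / mse) / np.sum(1.0 / mse)
        ensemble = w @ forecasts

        print("member MSEs: ", np.round(mse, 3))
        print("weights:     ", np.round(w, 3))
        print("ensemble MSE:", round(float(np.mean((ensemble - truth) ** 2)), 3))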

  10. Webinar of paper 2013, Which method predicts recidivism best? A comparison of statistical, machine learning and data mining predictive models

    NARCIS (Netherlands)

    Tollenaar, N.; Van der Heijden, P.G.M.

    2013-01-01

    Using criminal population conviction history information, prediction models are developed that predict three types of criminal recidivism: general recidivism, violent recidivism and sexual recidivism. The research question is whether prediction techniques from modern statistics, data mining

  11. A two-component rain model for the prediction of attenuation statistics

    Science.gov (United States)

    Crane, R. K.

    1982-01-01

    A two-component rain model has been developed for calculating attenuation statistics. In contrast to most other attenuation prediction models, the two-component model calculates the occurrence probability for volume cells or debris attenuation events. The model performed significantly better than the International Radio Consultative Committee model when used for predictions on earth-satellite paths. It is expected that the model will have applications in modeling the joint statistics required for space diversity system design, the statistics of interference due to rain scatter at attenuating frequencies, and the duration statistics for attenuation events.

  12. A Statistical Cyclone Intensity Prediction (SCIP) model for the Bay of ...

    Indian Academy of Sciences (India)

    ... been proposed. The model is developed applying a multiple linear regression technique. ... Keywords: tropical cyclone; intensity prediction; multiple linear regression; regression coefficient; statistical model. J. Earth Syst. Sci. 117 ... posed a simple empirical model for predicting the intensity of tropical cyclones ...

  13. Modelling prediction of unemployment statistics using web technologies

    Directory of Open Access Journals (Sweden)

    Popescu Mioara

    2017-12-01

    Full Text Available The global diffusion of the Internet involves economic, political and demographic factors that can be predicted in real time. In this article, we demonstrate that, according to data provided by EUROSTAT, the number of people looking for a job in Romania is correlated with specific query terms from Google Trends. Search engine data are used to “predict the present” values of different economic indicators. The obtained results are compared with the classical method of compiling economic indicators, i.e. with official EUROSTAT employment data. In this paper, we demonstrate that the new methods of extracting economic indicators from web technologies are accurate.

  14. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    Science.gov (United States)

    2015-09-30

    ... information on fish school distributions by monitoring the direction of birds returning to the colony or the behavior of other birds at sea through ... active sonar. Toward this goal, fundamental advances in the understanding of fish behavior, especially in aggregations, will be made under conditions ... relevant to the echo statistics problem. OBJECTIVES: To develop new models of behavior of fish aggregations, including the fission/fusion process

  15. Prediction of lacking control power in power plants using statistical models

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Mataji, B.; Stoustrup, Jakob

    2007-01-01

    Prediction of the performance of plants like power plants is of interest, since the plant operator can use these predictions to optimize the plant production. In this paper the focus is on a special case where a combination of high coal moisture content and a high load limits the possible plant load, meaning that the requested plant load cannot be met. The available models are in this case uncertain. Instead, statistical methods are used to predict upper and lower uncertainty bounds on the prediction. Two different methods are used: the first relies on statistics of recent prediction errors; the second uses operating-point-dependent statistics of prediction errors. Using these methods on the previously mentioned case, it can be concluded that the second method can be used to predict the power plant performance, while the first method has problems predicting the uncertain performance.
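
    A schematic of the two uncertainty-bound methods described above, using synthetic load and prediction-error data; the bins, quantiles and error model are assumptions for illustration, not the plant's.

        import numpy as np

        rng = np.random.default_rng(4)

        # Synthetic example: prediction error of achievable plant load, growing at high load
        # (e.g. when high coal moisture limits the achievable load).
        n = 2000
        load = rng.uniform(0.3, 1.0, size=n)                       # operating point (normalised load)
        error = rng.normal(0.0, 0.01 + 0.08 * (load - 0.3), n)     # load-dependent prediction error

        # Method 1: bounds from the statistics of recent prediction errors (one global interval).
        lo_g, hi_g = np.quantile(error[-500:], [0.025, 0.975])

        # Method 2: bounds from operating-point-dependent error statistics (binned by load).
        bins = np.linspace(0.3, 1.0, 8)
        idx = np.digitize(load, bins)
        for b in range(1, len(bins)):
            e = error[idx == b]
            lo_b, hi_b = np.quantile(e, [0.025, 0.975])
            print(f"load bin {bins[b-1]:.2f}-{bins[b]:.2f}: bounds [{lo_b:+.3f}, {hi_b:+.3f}]")
        print(f"global bounds (method 1): [{lo_g:+.3f}, {hi_g:+.3f}]")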

  16. OPR-PPR, a Computer Program for Assessing Data Importance to Model Predictions Using Linear Statistics

    Energy Technology Data Exchange (ETDEWEB)

    Matthew J. Tonkin; Claire R. Tiedeman; D. Matthew Ely; and Mary C. Hill

    2007-08-16

    The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one

  17. OPR-PPR, a Computer Program for Assessing Data Importance to Model Predictions Using Linear Statistics

    Science.gov (United States)

    Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.

    2007-01-01

    The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one
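
    The leverage-style calculation behind an OPR-type statistic can be sketched with plain linear algebra; the sketch below uses random sensitivities and unit weights purely for illustration and is not the OPR-PPR program itself.

        import numpy as np

        rng = np.random.default_rng(5)

        # Given the sensitivities of n observations and of one prediction to p parameters,
        # compute how much the prediction standard deviation would grow if one observation
        # were omitted from the calibration data set (linear theory, values not of the measurement).
        n_obs, n_par = 30, 4
        X = rng.normal(size=(n_obs, n_par))          # observation sensitivities (Jacobian)
        w = np.ones(n_obs)                           # observation weights (1 / error variance)
        z = rng.normal(size=n_par)                   # prediction sensitivities dz/dtheta

        def pred_sd(X, w, z):
            # linear approximation: var(z) = z^T (X^T diag(w) X)^(-1) z (error variance factored out)
            cov = np.linalg.inv(X.T @ (w[:, None] * X))
            return float(np.sqrt(z @ cov @ z))

        base = pred_sd(X, w, z)
        opr = []
        for i in range(n_obs):
            keep = np.arange(n_obs) != i
            opr.append(100.0 * (pred_sd(X[keep], w[keep], z) - base) / base)

        print("most important observations (largest % increase when omitted):",
              np.argsort(opr)[::-1][:3])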

  18. Statistical models for predicting pair dispersion and particle clustering in isotropic turbulence and their applications

    International Nuclear Information System (INIS)

    Zaichik, Leonid I; Alipchenkov, Vladimir M

    2009-01-01

    The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15 1776-87; 2007 Phys. Fluids 19 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.

  19. Sources of nonlinear behavior and Predictability in a realistic atmospheric model: a data modeling statistical approach

    Science.gov (United States)

    Peters, J. M.; Kravtsov, S.

    2011-12-01

    This study quantifies the dependence of nonlinear regimes (manifested in non-Gaussian probability distributions) and spreads of ensemble trajectories in a reduced phase space of a realistic three-layer quasi-geostrophic (QG3) atmospheric model on this model's climate state. To elucidate probabilistic properties of the QG3 trajectories, we compute, in phase planes of leading EOFs of the model, the coefficients of the corresponding Fokker-Planck (FP) equations. These coefficients represent drift vectors (computed from one-day phase space tendencies) and diffusion tensors (computed from one-day lagged covariance matrices of model trajectory displacements), and are based on a long QG3 simulation. We also fit two statistical trajectory models to the reduced phase-space time series spanned by the full QG3 model states. One reduced model is a standard Linear Inverse Model (LIM) fitted to a long QG3 time series. The LIM model is forced by state-independent (additive) noise and has a deterministic operator which represents a non-divergent velocity field in the reduced phase space considered. The other, more advanced model (NSM), is nonlinear, divergent, and is driven by state-dependent noise. The NSM model mimics well the full QG3 model trajectory behavior in the reduced phase space; its corresponding FP model is nearly identical to that based on the full QG3 simulations. By systematic analysis of the differences between the drift vectors and diffusion tensors of the QG3-based, NSM-based, and LIM-based FP models, as well as the PDF evolution simulated by these FP models, we disentangle the contributions of the multiplicative noise and deterministic dynamics to the nonlinear behavior and predictability of the atmospheric states produced by the dynamical QG3 model.

  20. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    Science.gov (United States)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with

  1. Output from Statistical Predictive Models as Input to eLearning Dashboards

    Directory of Open Access Journals (Sweden)

    Marlene A. Smith

    2015-06-01

    Full Text Available We describe how statistical predictive models might play an expanded role in educational analytics by giving students automated, real-time information about what their current performance means for eventual success in eLearning environments. We discuss how an online messaging system might tailor information to individual students using predictive analytics. The proposed system would be data-driven and quantitative; e.g., a message might furnish the probability that a student will successfully complete the certificate requirements of a massive open online course. Repeated messages would prod underperforming students and alert instructors to those in need of intervention. Administrators responsible for accreditation or outcomes assessment would have ready documentation of learning outcomes and actions taken to address unsatisfactory student performance. The article’s brief introduction to statistical predictive models sets the stage for a description of the messaging system. Resources and methods needed to develop and implement the system are discussed.

  2. Improving Prediction Skill of Imperfect Turbulent Models Through Statistical Response and Information Theory

    Science.gov (United States)

    Majda, Andrew J.; Qi, Di

    2016-02-01

    Turbulent dynamical systems with a large phase space and a high degree of instabilities are ubiquitous in climate science and engineering applications. Statistical uncertainty quantification (UQ) of the response to changes in forcing or uncertain initial data in such complex turbulent systems requires the use of imperfect models due to the lack of both physical understanding and the overwhelming computational demands of Monte Carlo simulation with a large-dimensional phase space. Thus, the systematic development of reduced low-order imperfect statistical models for UQ in turbulent dynamical systems is a grand challenge. This paper applies a recent mathematical strategy for calibrating imperfect models in a training phase and accurately predicting the response by combining information theory and linear statistical response theory in a systematic fashion. A systematic hierarchy of simple statistical imperfect closure schemes for UQ for these problems is designed and tested, built through new local and global statistical energy conservation principles combined with statistical equilibrium fidelity. The forty-mode Lorenz 96 (L-96) model, which mimics forced baroclinic turbulence, is utilized as a test bed for the calibration and prediction phases for the hierarchy of computationally cheap imperfect closure models, both in the full phase space and in a reduced three-dimensional subspace containing the most energetic modes. In both phase spaces, the nonlinear response of the true model is captured accurately for the mean and variance by the systematic closure model, while alternative methods based on the fluctuation-dissipation theorem alone are much less accurate. For the reduced-order model for UQ in the three-dimensional subspace of L-96, the systematic low-order imperfect closure models coupled with the training strategy provide the highest predictive skill over other existing methods for general forced response yet have simple design principles based on a

  3. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    International Nuclear Information System (INIS)

    Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t

    2012-01-01

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.

  4. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    Energy Technology Data Exchange (ETDEWEB)

    Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van' t [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)

    2012-03-15

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
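
    A rough sketch of evaluating an L1-penalised (LASSO-type) logistic NTCP-style model by repeated cross-validation, on synthetic data; the stepwise and BMA alternatives from the study are not reproduced here, and the regularisation strength and feature set are arbitrary assumptions.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Synthetic stand-in for dose-volume and clinical predictors of a binary complication.
        X, y = make_classification(n_samples=300, n_features=25, n_informative=5,
                                   weights=[0.7, 0.3], random_state=0)

        cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

        # L1-penalised logistic model: shrinkage performs the variable selection.
        lasso = make_pipeline(StandardScaler(),
                              LogisticRegression(penalty="l1", solver="liblinear", C=0.2))
        # Reference model without selection or shrinkage.
        plain = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

        for name, model in [("L1-penalised", lasso), ("unpenalised", plain)]:
            auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
            print(f"{name:>13s}: mean AUC = {auc.mean():.3f} +/- {auc.std():.3f}")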

  5. Statistical external validation and consensus modeling: a QSPR case study for Koc prediction.

    Science.gov (United States)

    Gramatica, Paola; Giani, Elisa; Papa, Ester

    2007-03-01

    The soil sorption partition coefficient (log K(oc)) of a heterogeneous set of 643 organic non-ionic compounds, with a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied on 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log K(ow) and log S(w). The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable data. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
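
    The leverage-based applicability-domain check mentioned above can be illustrated as follows; the descriptor matrices are synthetic stand-ins, and the warning leverage h* = 3p'/n (p' including the intercept) is the conventional cutoff.

        import numpy as np

        rng = np.random.default_rng(6)

        # Synthetic "training" and "prediction" descriptor matrices in place of real molecular descriptors.
        n_train, n_pred, p = 93, 550, 4
        X_train = rng.normal(size=(n_train, p))
        X_pred = np.vstack([rng.normal(size=(n_pred - 5, p)),
                            rng.normal(loc=4.0, size=(5, p))])        # a few clear outliers

        Xc = np.column_stack([np.ones(n_train), X_train])             # include intercept
        XtX_inv = np.linalg.inv(Xc.T @ Xc)
        h_star = 3.0 * (p + 1) / n_train                               # warning leverage

        Xp = np.column_stack([np.ones(len(X_pred)), X_pred])
        leverage = np.einsum("ij,jk,ik->i", Xp, XtX_inv, Xp)           # h_i = x_i (X'X)^-1 x_i'
        outside = leverage > h_star
        print(f"warning leverage h* = {h_star:.3f}; "
              f"{outside.sum()} of {len(X_pred)} prediction-set chemicals fall outside the domain")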

  6. Statistical Model Predictions for p+p and Pb+Pb Collisions at LHC

    CERN Document Server

    Kraus, I; Oeschler, H; Redlich, K; Wheaton, S

    2009-01-01

    Particle production in p+p and central Pb+Pb collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios on the temperature and the baryon chemical potential is studied in detail, and some of them, which are particularly appropriate to determine the chemical freeze-out point experimentally, are indicated. Considering elementary interactions on the other hand, we focus on strangeness production and its possible suppression. Extrapolating the thermal parameters to LHC energy, we present predictions of the statistical model for particle yields in p+p collisions. We quantify the strangeness suppression by the correlation volume parameter and discuss its influence on particle production. We propose observables that can provide deeper insight into the mechanism of strangeness production and suppression at LHC.

  7. Statistical equivalence of prediction models of the soil sorption coefficient obtained using different log P algorithms.

    Science.gov (United States)

    Olguin, Carlos José Maria; Sampaio, Silvio César; Dos Reis, Ralpho Rinaldo

    2017-10-01

    The soil sorption coefficient normalized to the organic carbon content (Koc) is a physicochemical parameter used in environmental risk assessments and in determining the final fate of chemicals released into the environment. Several models for predicting this parameter have been proposed based on the relationship between log Koc and log P. The difficulty and cost of obtaining experimental log P values led to the development of algorithms to calculate these values, some of which are free to use. However, quantitative structure-property relationship (QSPR) studies did not detail how or why a particular algorithm was chosen. In this study, we evaluated several free algorithms for calculating log P in the modeling of log Koc, using a broad and diverse set of compounds (n = 639) that included several chemical classes. In addition, we propose the adoption of a simple test to verify if there is statistical equivalence between models obtained using different data sets. Our results showed that the ALOGPs, KOWWIN and XLOGP3 algorithms generated the best models for modeling Koc, and these models are statistically equivalent. This finding shows that it is possible to use the different algorithms without compromising statistical quality and predictive capacity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Water quality management using statistical analysis and time-series prediction model

    Science.gov (United States)

    Parmar, Kulwinder Singh; Bhardwaj, Rashmi

    2014-12-01

    This paper deals with water quality management using statistical analysis and a time-series prediction model. The monthly variation of water quality standards has been used to compare the statistical mean, median, mode, standard deviation, kurtosis, skewness and coefficient of variation at the Yamuna River. The model was validated using R-squared, root mean square error, mean absolute percentage error, maximum absolute percentage error, mean absolute error, maximum absolute error, normalized Bayesian information criterion, Ljung-Box analysis, predicted values and confidence limits. Using an autoregressive integrated moving average (ARIMA) model, future values of the water quality parameters have been estimated. It is observed that the predictive model is useful at 95 % confidence limits and that the curve is platykurtic for potential of hydrogen (pH), free ammonia, total Kjeldahl nitrogen, dissolved oxygen and water temperature (WT), and leptokurtic for chemical oxygen demand and biochemical oxygen demand. Also, it is observed that the predicted series is close to the original series, which provides a perfect fit. All parameters except pH and WT cross the prescribed limits of the World Health Organization/United States Environmental Protection Agency, and thus the water is not fit for drinking, agricultural or industrial use.
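
    An illustrative ARIMA forecast with confidence limits, in the spirit of the approach above, using a synthetic monthly series; the model order, seasonal terms and data are assumptions, not the fitted Yamuna River model.

        import numpy as np
        import pandas as pd
        from statsmodels.tsa.arima.model import ARIMA

        rng = np.random.default_rng(7)

        # Synthetic monthly water-quality series with a seasonal cycle, standing in for observed data;
        # in practice the ARIMA order would be selected by information criteria and diagnostics.
        t = np.arange(120)
        series = pd.Series(6 + 1.5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.4, len(t)),
                           index=pd.date_range("2004-01-01", periods=len(t), freq="MS"))

        res = ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 12)).fit()
        fc = res.get_forecast(steps=12)
        print(pd.DataFrame({"forecast": fc.predicted_mean.round(2)})
              .join(fc.conf_int(alpha=0.05).round(2)))   # 95 % confidence limits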

  9. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  10. Statistical Model Selection for Better Prediction and Discovering Science Mechanisms That Affect Reliability

    Directory of Open Access Journals (Sweden)

    Christine M. Anderson-Cook

    2015-08-01

    Full Text Available Understanding the impact of production, environmental exposure and age characteristics on the reliability of a population is frequently based on underlying science and empirical assessment. When there is incomplete science to prescribe which inputs should be included in a model of reliability to predict future trends, statistical model/variable selection techniques can be leveraged on a stockpile or population of units to improve reliability predictions as well as suggest new mechanisms affecting reliability to explore. We describe a five-step process for exploring relationships between available summaries of age, usage and environmental exposure and reliability. The process involves, first, identifying potential candidate inputs and, second, organizing the data for analysis. Third, a variety of models with different combinations of the inputs are estimated, and fourth, flexible metrics are used to compare them. Finally, plots of the predicted relationships are examined to distill leading model contenders into a prioritized list for subject matter experts to understand and compare. The complexity of the model, quality of prediction and cost of future data collection are all factors to be considered by the subject matter experts when selecting a final model.

  11. Spatial statistical modeling of shallow landslides—Validating predictions for different landslide inventories and rainfall events

    Science.gov (United States)

    von Ruette, Jonas; Papritz, Andreas; Lehmann, Peter; Rickli, Christian; Or, Dani

    2011-10-01

    Statistical models that exploit the correlation between landslide occurrence and geomorphic properties are often used to map the spatial occurrence of shallow landslides triggered by heavy rainfalls. In many landslide susceptibility studies, the true predictive power of the statistical model remains unknown because the predictions are not validated with independent data from other events or areas. This study validates statistical susceptibility predictions with independent test data. The spatial incidence of landslides, triggered by an extreme rainfall in a study area, was modeled by logistic regression. The fitted model was then used to generate susceptibility maps for another three study areas, for which event-based landslide inventories were also available. All the study areas lie in the northern foothills of the Swiss Alps. The landslides had been triggered by heavy rainfall either in 2002 or 2005. The validation was designed such that the first validation study area shared the geomorphology and the second the triggering rainfall event with the calibration study area. For the third validation study area, both geomorphology and rainfall were different. All explanatory variables were extracted for the logistic regression analysis from high-resolution digital elevation and surface models (2.5 m grid). The model fitted to the calibration data comprised four explanatory variables: (i) slope angle (effect of gravitational driving forces), (ii) vegetation type (grassland and forest; root reinforcement), (iii) planform curvature (convergent water flow paths), and (iv) contributing area (potential supply of water). The area under the Receiver Operating Characteristic (ROC) curve (AUC) was used to quantify the predictive performance of the logistic regression model. The AUC values were computed for the susceptibility maps of the three validation study areas (validation AUC), the fitted susceptibility map of the calibration study area (apparent AUC: 0.80) and another
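
    A compact sketch of calibrating a logistic-regression susceptibility model on one event and validating it by AUC on independent areas; the predictor effects and data below are synthetic stand-ins, not the Swiss inventories.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        def synthetic_area(n, seed):
            """Stand-in for one study area: slope, vegetation, curvature, contributing area -> landslide."""
            r = np.random.default_rng(seed)
            X = np.column_stack([r.uniform(10, 45, n),      # slope angle [deg]
                                 r.integers(0, 2, n),       # vegetation (0 = forest, 1 = grassland)
                                 r.normal(0, 1, n),         # planform curvature
                                 r.lognormal(5, 1, n)])     # contributing area
            logit = -6 + 0.12 * X[:, 0] + 0.8 * X[:, 1] - 0.5 * X[:, 2] + 0.0005 * X[:, 3]
            y = r.random(n) < 1 / (1 + np.exp(-logit))
            return X, y.astype(int)

        # Calibrate on one rainfall event / area, then validate on independent areas and events.
        X_cal, y_cal = synthetic_area(5000, seed=1)
        model = LogisticRegression(max_iter=1000).fit(X_cal, y_cal)
        print(f"apparent AUC: {roc_auc_score(y_cal, model.predict_proba(X_cal)[:, 1]):.2f}")
        for seed in (2, 3, 4):
            X_val, y_val = synthetic_area(5000, seed=seed)
            auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
            print(f"validation area {seed - 1}: AUC = {auc:.2f}")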

  12. Predictive Model for the Design of Zwitterionic Polymer Brushes: A Statistical Design of Experiments Approach.

    Science.gov (United States)

    Kumar, Ramya; Lahann, Joerg

    2016-07-06

    The performance of polymer interfaces in biology is governed by a wide spectrum of interfacial properties. With the ultimate goal of identifying design parameters for stem cell culture coatings, we developed a statistical model that describes the dependence of brush properties on surface-initiated polymerization (SIP) parameters. Employing a design of experiments (DOE) approach, we identified operating boundaries within which four gel architecture regimes can be realized, including a new regime of associated brushes in thin films. Our statistical model can accurately predict the brush thickness and the degree of intermolecular association of poly[{2-(methacryloyloxy) ethyl} dimethyl-(3-sulfopropyl) ammonium hydroxide] (PMEDSAH), a previously reported synthetic substrate for feeder-free and xeno-free culture of human embryonic stem cells. DOE-based multifunctional predictions offer a powerful quantitative framework for designing polymer interfaces. For example, model predictions can be used to decrease the critical thickness at which the wettability transition occurs by simply increasing the catalyst quantity from 1 to 3 mol %.

  13. Reproducing tailing in breakthrough curves: Are statistical models equally representative and predictive?

    Science.gov (United States)

    Pedretti, Daniele; Bianchi, Marco

    2018-03-01

    Breakthrough curves (BTCs) observed during tracer tests in highly heterogeneous aquifers display strong tailing. Power laws are popular models for both the empirical fitting of these curves, and the prediction of transport using upscaling models based on best-fitted estimated parameters (e.g. the power law slope or exponent). The predictive capacity of power law based upscaling models can, however, be questioned due to the difficulty of linking model parameters with the aquifers' physical properties. This work analyzes two aspects that can limit the use of power laws as effective predictive tools: (a) the implication of statistical subsampling, which often renders power laws indistinguishable from other heavily tailed distributions, such as the logarithmic (LOG); (b) the difficulty of reconciling fitting parameters obtained from models with different formulations, such as the presence of a late-time cutoff in the power law model. Two rigorous and systematic stochastic analyses, one based on benchmark distributions and the other on BTCs obtained from transport simulations, are considered. It is found that a power law model without cutoff (PL) results in best-fitted exponents (αPL) falling in the range of typical experimental values reported in the literature (αPL > 1.5), while a power law model with cutoff (PLCO) results in a nearly constant αCO ≈ 1. In the PLCO model, the cutoff rate (λ) is the parameter that fully reproduces the persistence of the tailing and is shown to be inversely correlated to the LOG scale parameter (i.e. with the skewness of the distribution). The theoretical results are consistent with the fitting analysis of a tracer test performed during the MADE-5 experiment. It is shown that a simple mechanistic upscaling model based on the PLCO formulation is able to predict the ensemble of BTCs from the stochastic transport simulations without the need of any fitted parameters. The model embeds the constant αCO = 1 and relies on a stratified description of the transport mechanisms to estimate λ. The PL fails to
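
    The contrast between PL and PLCO fits can be illustrated on a synthetic late-time tail; the generating parameters, noise level and fitting choices below are arbitrary assumptions, not the paper's analysis.

        import numpy as np
        from scipy.optimize import curve_fit

        rng = np.random.default_rng(10)

        # Synthetic late-time breakthrough-curve tail generated from a power law with cutoff,
        # then fitted by both a pure power law (PL) and a power law with cutoff (PLCO).
        t = np.logspace(0, 3, 60)
        c_true = t ** -1.0 * np.exp(-t / 400.0)
        c_obs = c_true * np.exp(rng.normal(0, 0.1, t.size))     # multiplicative noise

        def log_pl(t, a, alpha):
            return a - alpha * np.log(t)

        def log_plco(t, a, alpha, lam):
            return a - alpha * np.log(t) - lam * t

        p_pl, _ = curve_fit(log_pl, t, np.log(c_obs), p0=[0.0, 1.5])
        p_plco, _ = curve_fit(log_plco, t, np.log(c_obs), p0=[0.0, 1.0, 1e-3])
        print(f"PL fit:   alpha = {p_pl[1]:.2f}")
        print(f"PLCO fit: alpha = {p_plco[1]:.2f}, cutoff rate lambda = {p_plco[2]:.4f}")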

  14. Flow prediction models using macroclimatic variables and multivariate statistical techniques in the Cauca River Valley

    International Nuclear Information System (INIS)

    Carvajal Escobar Yesid; Munoz, Flor Matilde

    2007-01-01

    The project is centred on a review of the state of the art of the ocean-atmospheric phenomena that affect Colombian hydrology, especially the ENSO phenomenon, which has a first-order socioeconomic impact in the country and has not been sufficiently studied; it is therefore important to address this topic by including the macroclimatic variables associated with ENSO in water-planning analyses. The analyses include a review of statistical techniques for testing the consistency of hydrological data, with the objective of building a reliable and homogeneous database of monthly flows of the Cauca River. Multivariate statistical methods, specifically principal component analysis, are then used in the development of models for predicting monthly mean flows of the Cauca River, involving both linear approaches (the autoregressive AR, ARX and ARMAX models) and a nonlinear approach (artificial neural networks).

  15. Development of a Statistical Model for Seasonal Prediction of North Atlantic Hurricane Numbers

    Science.gov (United States)

    Davis, K.; Zeng, X.

    2014-12-01

    Tropical cyclones cause more financial distress to insurance companies than any other natural disaster. From 1970 to 2002, it is estimated that hurricanes caused 44 billion dollars in damage, more than 2.5 times the next costliest catastrophe. These damages do not go without effect. A string of major catastrophes from 1991-1994 caused nine property firms to go bankrupt and put serious financial strain on others. The public was not only affected by the loss of life and property, but also by the increase in tax dollars for disaster relief. Providing better seasonal predictions of North Atlantic hurricane activity farther in advance will help alleviate some of the financial strains these major catastrophes put on the nation. A statistical model was first developed by Bill Gray's team to predict the total number of hurricanes over the North Atlantic in 1984, followed by other statistical methods, dynamic modeling, and hybrid methods in recent years. However, all these methods showed little to no skill with forecasts made by June 1 in recent years. In contrast to the relatively small year-to-year change in seasonal hurricane numbers pre-1980, there have been much greater interannual changes since, especially since the year 2000. For instance, while there were very high hurricane numbers in 2005 and 2010, 2013 was one of the lowest in history. Recognizing these interdecadal changes in the dispersion of hurricane numbers, we have developed a new statistical model to more realistically predict (by June 1 each year) the seasonal hurricane number over the North Atlantic. It is based on the Multivariate ENSO Index (MEI) conditioned by the Atlantic Multidecadal Oscillation (AMO) index, the zonal wind stress and sea surface temperature over the Atlantic. It provides both the deterministic number and the range of hurricane numbers. The details of the model and its performance from 1950-2014 in comparison with other methods will be presented in our presentation.

  16. A statistical prediction model based on sparse representations for single image super-resolution.

    Science.gov (United States)

    Peleg, Tomer; Elad, Michael

    2014-06-01

    We address single image super-resolution using a statistical prediction model based on sparse representations of low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption, which is a common practice in sparsity-based approaches treating this task. Prediction of high resolution patches is obtained via MMSE estimation and the resulting scheme has the useful interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a low- and high-resolution dictionary pair, in terms of computational complexity, numerical criteria, and visual appearance. The suggested approach offers a desirable compromise between low computational complexity and reconstruction quality, when comparing it with state-of-the-art methods for single image super-resolution.

  17. Wind gust estimation by combining numerical weather prediction model and statistical post-processing

    Science.gov (United States)

    Patlakas, Platon; Drakaki, Eleni; Galanis, George; Spyrou, Christos; Kallos, George

    2017-04-01

    The continuous rise of off-shore and near-shore activities as well as the development of structures, such as wind farms and various offshore platforms, requires the employment of state-of-the-art risk assessment techniques. Such analysis is used to set the safety standards and can be characterized as a climatologically oriented approach. Nevertheless, reliable operational support is also needed in order to minimize cost drawbacks and human danger during the construction and functioning stages as well as during maintenance activities. One of the most important parameters for this kind of analysis is the wind speed intensity and variability. A critical measure associated with this variability is the presence and magnitude of wind gusts as estimated at the reference level of 10 m. The latter can be attributed to different processes that range from boundary-layer turbulence and convective activity to mountain waves and wake phenomena. The purpose of this work is the development of a wind gust forecasting methodology combining a Numerical Weather Prediction model and a dynamical statistical tool based on Kalman filtering. To this end, the Wind Gust Estimate parameterization method was implemented within the framework of the atmospheric model SKIRON/Dust. The new modeling tool combines the atmospheric model with a statistical local adaptation methodology based on Kalman filters. This has been tested over the offshore west coastline of the United States. The main purpose is to provide a useful tool for wind analysis and prediction and applications related to offshore wind energy (power prediction, operation and maintenance). The results have been evaluated using observational data from NOAA's buoy network. The predicted output was found to show good behavior, which is further improved after the local adjustment post-processing.
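
    A minimal sketch of Kalman-filter post-processing of gust forecasts, tracking a slowly varying forecast bias; the noise variances and synthetic series are illustrative assumptions, not the SKIRON/Dust configuration.

        import numpy as np

        rng = np.random.default_rng(9)

        # Synthetic example: raw model wind-gust forecasts with a slowly varying systematic error,
        # corrected online by a scalar Kalman filter that tracks the forecast bias.
        n = 400
        gust_true = 8 + 4 * np.abs(np.sin(np.arange(n) / 15)) + rng.normal(0, 0.5, n)
        gust_fc = gust_true + 2.0 + 1.5 * np.sin(np.arange(n) / 60) + rng.normal(0, 1.0, n)

        q, r = 0.01, 1.0          # process and observation error variances (tuning parameters)
        bias, p = 0.0, 1.0
        corrected = np.empty(n)
        for t in range(n):
            corrected[t] = gust_fc[t] - bias          # correct with the current bias estimate
            p += q                                    # predict step for the bias state
            k = p / (p + r)                           # Kalman gain
            innov = gust_fc[t] - gust_true[t] - bias  # observed forecast error minus estimated bias
            bias += k * innov                         # update step
            p *= (1 - k)

        print("raw RMSE:      ", round(float(np.sqrt(np.mean((gust_fc - gust_true) ** 2))), 2))
        print("corrected RMSE:", round(float(np.sqrt(np.mean((corrected - gust_true) ** 2))), 2))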

  18. Research Pearls: The Significance of Statistics and Perils of Pooling. Part 2: Predictive Modeling.

    Science.gov (United States)

    Hohmann, Erik; Wetzler, Merrick J; D'Agostino, Ralph B

    2017-07-01

    The focus of predictive modeling or predictive analytics is to use statistical techniques to predict outcomes and/or the results of an intervention or observation for patients that are conditional on a specific set of measurements taken on the patients prior to the outcomes occurring. Statistical methods to estimate these models include using such techniques as Bayesian methods; data mining methods, such as machine learning; and classical statistical models of regression such as logistic (for binary outcomes), linear (for continuous outcomes), and survival (Cox proportional hazards) for time-to-event outcomes. A Bayesian approach incorporates a prior estimate that the outcome of interest is true, which is made prior to data collection, and then this prior probability is updated to reflect the information provided by the data. In principle, data mining uses specific algorithms to identify patterns in data sets and allows a researcher to make predictions about outcomes. Regression models describe the relations between 2 or more variables where the primary difference among methods concerns the form of the outcome variable, whether it is measured as a binary variable (i.e., success/failure), continuous measure (i.e., pain score at 6 months postop), or time to event (i.e., time to surgical revision). The outcome variable is the variable of interest, and the predictor variable(s) are used to predict outcomes. The predictor variable is also referred to as the independent variable and is assumed to be something the researcher can modify in order to see its impact on the outcome (i.e., using one of several possible surgical approaches). Survival analysis investigates the time until an event occurs. This can be an event such as failure of a medical device or death. It allows the inclusion of censored data, meaning that not all patients need to have the event (i.e., die) prior to the study's completion. Copyright © 2017 Arthroscopy Association of North America. Published by

  19. Development of statistical prediction models for Changma precipitation: An ensemble approach

    Science.gov (United States)

    Kim, Jin-Yong; Seo, Kyong-Hwan; Son, Jun-Hyeok; Ha, Kyung-Ja

    2017-05-01

    An ensemble statistical forecast scheme with a one-month lead is developed to predict year-to-year variations of Changma rainfall over the Korean peninsula. Spring sea surface temperature (SST) anomalies over the North Atlantic, the North Pacific and the tropical Pacific Ocean have been proposed as useful predictors in a previous study. Through a forward-stepwise regression method, four additional springtime predictors are selected: the northern Indian Ocean (NIO) SST, the North Atlantic SST change (NAC), the snow cover anomaly over the Eurasian continent (EUSC), and the western North Pacific outgoing longwave radiation anomaly (WNP (OLR)). Using these, three new prediction models are developed. A simple arithmetic ensemble mean produces much improved forecast skill compared to the original prediction model of Lee and Seo (2013). Skill scores measured by temporal correlation and MSSS (mean square error skill score) are improved by about 9% and 17%, respectively. The GMSS (Gerrity skill score) and hit rate based on a tercile prediction validation scheme are also enhanced by about 19% and 13%, respectively. The reversed NIO, reversed WNP (OLR), and reversed NAC are all related to the enhancement of a cyclonic circulation anomaly to the south or southwest of the Korean peninsula, which induces southeasterly moisture flux into the peninsula and increased Changma precipitation. The EUSC predictor induces an enhancement of the Okhotsk Sea high downstream and thus a strengthening of the Changma front.
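
    The following sketch illustrates, under stated assumptions, the core idea of averaging several regression-based prediction models arithmetically and verifying the result with a temporal correlation score. The predictor names mirror those in the abstract but the data are synthetic placeholders, not the actual SST, OLR or snow-cover indices.

```python
# Minimal sketch: several linear regressions built from different predictor
# subsets are combined by a simple arithmetic ensemble mean. Data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_years = 40
predictors = {name: rng.standard_normal(n_years)
              for name in ["NIO_SST", "NAC", "EUSC", "WNP_OLR"]}
rainfall = (0.6 * predictors["NIO_SST"] - 0.4 * predictors["WNP_OLR"]
            + 0.3 * predictors["EUSC"] + 0.5 * rng.standard_normal(n_years))

subsets = [["NIO_SST", "NAC"], ["EUSC", "WNP_OLR"], ["NIO_SST", "EUSC", "WNP_OLR"]]
member_predictions = []
for cols in subsets:
    X = np.column_stack([predictors[c] for c in cols])
    fit = LinearRegression().fit(X[:30], rainfall[:30])   # train on the first 30 years
    member_predictions.append(fit.predict(X[30:]))         # hindcast the remaining years

ensemble_mean = np.mean(member_predictions, axis=0)         # arithmetic ensemble mean
print("temporal correlation:", np.corrcoef(ensemble_mean, rainfall[30:])[0, 1])
```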

  20. Prediction of climate change in Brunei Darussalam using statistical downscaling model

    Science.gov (United States)

    Hasan, Dk. Siti Nurul Ain binti Pg. Ali; Ratnayake, Uditha; Shams, Shahriar; Nayan, Zuliana Binti Hj; Rahman, Ena Kartina Abdul

    2017-06-01

    Climate is changing, and evidence suggests that the impact of climate change will influence our everyday lives, including agriculture, the built environment, energy management, food security and water resources. Brunei Darussalam, located within the heart of Borneo, will be affected in terms of both precipitation and temperature. Therefore, it is crucial to comprehend and assess how important climate indicators like temperature and precipitation are expected to vary in the future in order to minimise the impact. This study assesses the application of a statistical downscaling model (SDSM) for downscaling General Circulation Model (GCM) results for maximum and minimum temperatures along with precipitation in Brunei Darussalam. It investigates future climate changes based on numerous scenarios using Hadley Centre Coupled Model, version 3 (HadCM3), Canadian Earth System Model (CanESM2) and third-generation Coupled Global Climate Model (CGCM3) outputs. The SDSM outputs were improved with the implementation of bias correction and by using a monthly sub-model instead of an annual sub-model. The outcomes of this assessment show that the monthly sub-model performed better than the annual sub-model. The study indicates satisfactory applicability for generating maximum temperatures, minimum temperatures and precipitation for the future periods 2017-2046 and 2047-2076. All considered models and scenarios were consistent in predicting an increasing trend in maximum temperature, an increasing trend in minimum temperature and a decreasing trend in precipitation. The maximum overall trend of Tmax was observed for CanESM2 under the Representative Concentration Pathway (RCP) 8.5 scenario, an increase of 0.014 °C per year; accordingly, by 2076 the highest predicted increase in average maximum temperature is 1.4 °C. The same model predicts an increasing trend in Tmin of 0.004 °C per year, while the highest trend is seen under the CGCM3-A2 scenario, which is 0.009

  1. Predicting contaminant concentration at a pumping well with method combining statistical and mathematical models

    Science.gov (United States)

    Lim, J.; Bae, G.; Kaown, D.; Lee, K.

    2008-12-01

    In predicting contamination of groundwater, statistical and mathematical modeling methods are used in combination, with the aim of taking advantage of each method. With a mathematical model based on the backward transport equation, probabilistic capture zones of pumping wells are delineated. These capture zones are then used as buffer zones for each pumping well in a statistical regression model. A Tobit regression model is used to investigate the influence of land use on contaminant concentration at a pumping well. Using probabilistic capture zones as buffer zones instead of circular zones allows the flow and transport regime near pumping wells, as well as the different types of land use, to be considered in the regression model. The method is applied to a small agricultural basin in Chuncheon, Korea, which is occupied by vegetation fields, orchards and small barns; accordingly, chemical fertilizers and manures are frequently applied to the land surface for agricultural purposes. The areas of each land use type (vegetation fields, orchards, and small barns) within the probabilistic capture zones, together with land slope and elevation, are used as explanatory variables, and nitrate concentrations observed at the pumping wells are used as the dependent variable. The proposed method gives better prediction of nitrate concentration than the general regression model using circular buffer zones. It is also expected that the proposed method can be effectively used to relate the loading mass of fertilizer to its concentration in groundwater at pumping wells and, further, to suggest an allowable loading mass of fertilizer that keeps groundwater quality within regulatory limits.
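
    A minimal sketch of a left-censored Tobit regression of the kind referred to above, fitted by maximum likelihood with SciPy, is shown below. The detection limit, the predictor set and the synthetic data are illustrative assumptions only.

```python
# Hedged sketch: Tobit (censored) regression fitted by maximum likelihood.
# Detection limit, predictors and data are assumptions for illustration.
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(params, X, y, limit):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mu = X @ beta
    censored = y <= limit
    ll = np.where(
        censored,
        stats.norm.logcdf((limit - mu) / sigma),               # P(latent value <= limit)
        stats.norm.logpdf((y - mu) / sigma) - np.log(sigma),   # density of the observed value
    )
    return -ll.sum()

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.random(n), rng.random(n)])   # intercept, land-use share, slope
y_latent = X @ np.array([1.0, 3.0, -1.5]) + 0.8 * rng.standard_normal(n)
limit = 0.5
y = np.maximum(y_latent, limit)                 # concentrations below the limit are censored

start = np.zeros(X.shape[1] + 1)
fit = optimize.minimize(tobit_negloglik, start, args=(X, y, limit), method="BFGS")
print("estimated coefficients:", fit.x[:-1], "sigma:", np.exp(fit.x[-1]))
```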

  2. Statistics-based model for prediction of chemical biosynthesis yield from Saccharomyces cerevisiae

    Directory of Open Access Journals (Sweden)

    Leonard Effendi

    2011-06-01

    Background: The robustness of Saccharomyces cerevisiae in facilitating industrial-scale production of ethanol extends its utilization as a platform to synthesize other metabolites. Metabolic engineering strategies, typically via pathway overexpression and deletion, continue to play a key role for optimizing the conversion efficiency of substrates into the desired products. However, chemical production titer or yield remains difficult to predict based on reaction stoichiometry and mass balance. We sampled a large set of data on chemical production from S. cerevisiae, and developed a statistics-based model to calculate production yield using input variables that represent the number of enzymatic steps in the key biosynthetic pathway of interest, metabolic modifications, cultivation modes, nutrition and oxygen availability. Results: Based on the production data of about 40 chemicals produced from S. cerevisiae, metabolic engineering methods, nutrient supplementation, and fermentation conditions described therein, we generated mathematical models with numerical and categorical variables to predict production yield. Statistically, the models showed that: 1. Chemical production from central metabolic precursors decreased exponentially with increasing number of enzymatic steps for biosynthesis (>30% loss of yield per enzymatic step, P-value = 0); 2. Categorical variables of gene overexpression and knockout improved product yield by 2- to 4-fold (P-value Saccharomyces cerevisiae has historically evolved for robust alcohol fermentation. Conclusions: We generated simple mathematical models for first-order approximation of chemical production yield from S. cerevisiae. These linear models provide empirical insights into the effects of strain engineering and cultivation conditions on biosynthetic efficiency. These models may not only provide guidelines for metabolic engineers to synthesize desired products, but also be useful to compare the

  3. Predictive data-derived Bayesian statistic-transport model and simulator of sunken oil mass

    Science.gov (United States)

    Echavarria Gregory, Maria Angelica

    Sunken oil is difficult to locate because remote sensing techniques cannot as yet provide views of sunken oil over large areas. Moreover, the oil may re-suspend and sink with changes in salinity, sediment load, and temperature, making deterministic fate models difficult to deploy and calibrate when even the presence of sunken oil is difficult to assess. For these reasons, together with the expense of field data collection, there is a need for a statistical technique integrating limited data collection with stochastic transport modeling. Predictive Bayesian modeling techniques have been developed and demonstrated for exploiting limited information for decision support in many other applications. These techniques are here brought to a multi-modal Lagrangian modeling framework, representing a near-real-time approach to locating and tracking sunken oil, driven by intrinsic physical properties of field data collected following a spill after oil has begun collecting on a relatively flat bay bottom. Methods include (1) development of the conceptual predictive Bayesian model and multi-modal Gaussian computational approach based on theory and literature review; (2) development of an object-oriented programming and combinatorial structure capable of managing data, integration and computation over an uncertain and highly dimensional parameter space; (3) creating a new bi-dimensional approach of the method of images to account for curved shoreline boundaries; (4) confirmation of model capability for locating sunken oil patches using available (partial) real field data and capability for temporal projections near curved boundaries using simulated field data; and (5) development of a stand-alone open-source computer application with graphical user interface capable of calibrating instantaneous oil spill scenarios, obtaining sets of maps of relative probability profiles at different prediction times and user-selected geographic areas and resolution, and capable of performing post

  4. A Unified Statistical Rain-Attenuation Model for Communication Link Fade Predictions and Optimal Stochastic Fade Control Design Using a Location-Dependent Rain-Statistic Database

    Science.gov (United States)

    Manning, Robert M.

    1990-01-01

    A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics data base is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.

  5. Energy Production Calculations with Field Flow Models and Windspeed Predictions with Statistical Methods

    Science.gov (United States)

    Rüstemoǧlu, Sevinç; Barutçu, Burak; Sibel Menteş, Ş.

    2010-05-01

    wind prediction. Statistical methods' predictions as time series are included and the similarity rates are compared for each method. The algorithms, implemented in MATLAB, gave the similarity results for each model. Neural networks were found to be the most successful method for prediction among the three statistical models: the wind speed similarity rate between the original measurements and the prediction set, which covers a one-year period between 2003 and 2004, is 94.7%. For wind direction, the similarity rate is 81.61%. A high noise margin and the ability to learn the characteristics of the signal are important advantages of neural networks for wind speed and direction predictions that are consistent with the measurements.

  6. Level density of the sd-nuclei-Statistical shell-model predictions

    Science.gov (United States)

    Karampagia, S.; Senkov, R. A.; Zelevinsky, V.

    2018-03-01

    Accurate knowledge of the nuclear level density is important both from a theoretical viewpoint as a powerful instrument for studying nuclear structure and for numerous applications. For example, astrophysical reactions responsible for the nucleosynthesis in the universe can be understood only if we know the nuclear level density. We use the configuration-interaction nuclear shell model to predict nuclear level density for all nuclei in the sd-shell, both total and for individual spins (only with positive parity). To avoid the diagonalization in large model spaces we use the moments method based on statistical properties of nuclear many-body systems. In the cases where the diagonalization is possible, the results of the moments method practically coincide with those from the shell-model calculations. Using the computed level densities, we fit the parameters of the Constant Temperature phenomenological model, which can be used by practitioners in their studies of nuclear reactions at excitation energies appropriate for the sd-shell nuclei.

  7. Extracting climate memory using Fractional Integrated Statistical Model: A new perspective on climate prediction

    Science.gov (United States)

    Yuan, Naiming; Fu, Zuntao; Liu, Shida

    2014-01-01

    Long-term memory (LTM) in climate variability is studied by means of fractional integral techniques. Using a recently developed model, the Fractional Integral Statistical Model (FISM), we propose a new method with which the long-lasting influence of historical climate states on the present can be estimated quantitatively and further extracted as climate memory signals. To show the usability of this method, two examples, the Northern Hemisphere monthly Temperature Anomalies (NHTA) and the Pacific Decadal Oscillation index (PDO), are analyzed in this study. We find that the climate memory signals can indeed be extracted and that the whole variation can be further decomposed into two parts: the cumulative climate memory (CCM) and the weather-scale excitation (WSE). The stronger the LTM, the larger the proportion of the whole variation that the climate memory signals account for. With the climate memory signals extracted, one can at least determine on what basis the considered time series will continue to change. Therefore, this report provides a new perspective on climate prediction. PMID:25300777
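
    The sketch below is a generic, hedged illustration of the fractional-integration idea underlying this approach: the binomial expansion of (1 - B)^d either imposes or removes long-term memory in a series. It is a textbook fractional-differencing example, not the FISM decomposition into CCM and WSE used in the paper.

```python
# Generic fractional-differencing sketch: weights of (1 - B)^d integrate
# (d < 0) or difference (d > 0) a series. Data and d are illustrative.
import numpy as np

def frac_diff_weights(d, n):
    """Weights of (1 - B)^d up to lag n-1."""
    w = np.zeros(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

rng = np.random.default_rng(10)
n, d = 2000, 0.3                                   # d in (0, 0.5): stationary long memory
noise = rng.standard_normal(n)

# fractionally integrate white noise -> series with long-term memory
w_int = frac_diff_weights(-d, n)
memory_series = np.array([w_int[:t + 1][::-1] @ noise[:t + 1] for t in range(n)])

# applying (1 - B)^d recovers the memoryless excitation
w_diff = frac_diff_weights(d, n)
recovered = np.array([w_diff[:t + 1][::-1] @ memory_series[:t + 1] for t in range(n)])
print("lag-1 autocorrelation with memory:",
      np.corrcoef(memory_series[1:], memory_series[:-1])[0, 1])
print("lag-1 autocorrelation after differencing:",
      np.corrcoef(recovered[1:], recovered[:-1])[0, 1])
```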

  8. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    CERN Document Server

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  9. A statistical model for predicting the retrieval rate of separated instruments and clinical decision-making

    Directory of Open Access Journals (Sweden)

    Chen Lin

    2015-12-01

    Conclusion: A statistical model relating to root canal curvature and depth of separated instruments was established to evaluate the retrieval rate of separated instruments, and the result of this formulation may provide clues for clinical decision-making.

  10. Predicting the fibre diameter of melt blown nonwovens: comparison of physical, statistical and artificial neural network models

    Science.gov (United States)

    Chen, Ting; Li, Liqing; Huang, Xiubao

    2005-06-01

    Physical, statistical and artificial neural network (ANN) models are established for predicting the fibre diameter of melt blown nonwovens from the processing parameters. The results show that the ANN model yields a very accurate prediction (average error of 0.013%), and a reasonably good ANN model can be achieved with relatively few data points. Because the physical model is based on the inherent physical principles of the phenomena of interest, it can yield reasonably good prediction results when experimental data are not available and the entire physical procedure is of interest. This area of research has great potential in the field of computer assisted design in melt blowing technology.

  11. Comparison of statistical and theoretical habitat models for conservation planning: the benefit of ensemble prediction

    Science.gov (United States)

    D. Todd Jones-Farrand; Todd M. Fearer; Wayne E. Thogmartin; Frank R. Thompson; Mark D. Nelson; John M. Tirpak

    2011-01-01

    Selection of a modeling approach is an important step in the conservation planning process, but little guidance is available. We compared two statistical and three theoretical habitat modeling approaches representing those currently being used for avian conservation planning at landscape and regional scales: hierarchical spatial count (HSC), classification and...

  12. Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network.

    Science.gov (United States)

    Yu, Ying; Wang, Yirui; Gao, Shangce; Tang, Zheng

    2017-01-01

    With the impact of globalization, the tourism economy has also developed rapidly. Increasing interest in more advanced forecasting methods leads us to innovate. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model) is proposed for tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model) to exclude the long-term linear trend, and then train a dendritic neural network on the residual data to make a short-term prediction. As the results in this paper show, the SA-D model can achieve considerably better predictive performance. In order to demonstrate the effectiveness of the SA-D model, we also use the data that other authors used with other models and compare the results. This likewise showed that the SA-D model achieves good predictive performance in terms of the normalized mean square error, absolute percentage error, and correlation coefficient.
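
    The sketch below illustrates, under stated assumptions, the hybrid workflow described above: a seasonal ARIMA model captures the long-term and seasonal structure, and a small neural network is trained on the residuals. It uses SARIMAX from statsmodels and scikit-learn's MLPRegressor in place of a dendritic neural network, with a synthetic monthly series, so it shows the workflow rather than the SA-D model itself.

```python
# Hedged sketch of a SARIMA + neural-network hybrid on a synthetic monthly series.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
t = np.arange(240)                                   # 20 years of monthly "arrivals"
series = 100 + 0.3 * t + 15 * np.sin(2 * np.pi * t / 12) + 5 * rng.standard_normal(240)

sarima = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
residuals = series - sarima.fittedvalues
residuals = residuals[13:]            # drop start-up values affected by differencing

# train the network to predict a residual from the previous 12 residuals
lag = 12
X = np.array([residuals[i - lag:i] for i in range(lag, len(residuals))])
y = residuals[lag:]
nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# one-step-ahead hybrid forecast = SARIMA forecast + network's residual correction
next_linear = sarima.forecast(1)[0]
next_residual = nn.predict(residuals[-lag:].reshape(1, -1))[0]
print("hybrid forecast for next month:", next_linear + next_residual)
```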

  13. A statistical intercomparison of temperature and precipitation predicted by four general circulation models with historical data

    International Nuclear Information System (INIS)

    Grotch, S.L.

    1991-01-01

    This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections. 11 refs.; 10 figs

  14. A statistical intercomparison of temperature and precipitation predicted by four general circulation models with historical data

    International Nuclear Information System (INIS)

    Grotch, S.L.

    1990-01-01

    This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections

  15. Predicting the lung compliance of mechanically ventilated patients via statistical modeling

    International Nuclear Information System (INIS)

    Ganzert, Steven; Kramer, Stefan; Guttmann, Josef

    2012-01-01

    To avoid ventilator associated lung injury (VALI) during mechanical ventilation, the ventilator is adjusted with reference to the volume distensibility or ‘compliance’ of the lung. For lung-protective ventilation, the lung should be inflated at its maximum compliance, i.e. when during inspiration a maximal intrapulmonary volume change is achieved by a minimal change of pressure. To accomplish this, one of the main parameters is the adjusted positive end-expiratory pressure (PEEP). As changing the ventilator settings usually produces an effect on patient's lung mechanics with a considerable time delay, the prediction of the compliance change associated with a planned change of PEEP could assist the physician at the bedside. This study introduces a machine learning approach to predict the nonlinear lung compliance for the individual patient by Gaussian processes, a probabilistic modeling technique. Experiments are based on time series data obtained from patients suffering from acute respiratory distress syndrome (ARDS). With a high hit ratio of up to 93%, the learned models could predict whether an increase/decrease of PEEP would lead to an increase/decrease of the compliance. However, the prediction of the complete pressure–volume relation for an individual patient has to be improved. We conclude that the approach is well suitable for the given problem domain but that an individualized feature selection should be applied for a precise prediction of individual pressure–volume curves. (paper)
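
    As a minimal sketch of the probabilistic modelling technique named above, the example below fits a Gaussian process regressor (scikit-learn) to synthetic PEEP/compliance pairs. The kernel choice and data are assumptions for illustration, not the feature set used in the study.

```python
# Hedged sketch: Gaussian process regression on synthetic PEEP/compliance data.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(5)
peep = np.sort(rng.uniform(5, 20, 30)).reshape(-1, 1)            # PEEP in cmH2O
# synthetic compliance curve with a broad maximum around PEEP ~ 12 cmH2O
compliance = 45 - 0.4 * (peep.ravel() - 12.0) ** 2 + rng.normal(0, 1.5, 30)

kernel = 1.0 * RBF(length_scale=3.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(peep, compliance)

peep_grid = np.linspace(5, 20, 50).reshape(-1, 1)
mean, std = gp.predict(peep_grid, return_std=True)               # predictive mean and uncertainty
print("PEEP with highest predicted compliance:", peep_grid[np.argmax(mean), 0])
```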

  16. Modelling short- and long-term statistical learning of music as a process of predictive entropy reduction

    DEFF Research Database (Denmark)

    Hansen, Niels Christian; Loui, Psyche; Vuust, Peter

    Statistical learning underlies the generation of expectations with different degrees of uncertainty. In music, uncertainty applies to expectations for pitches in a melody. This uncertainty can be quantified by Shannon entropy from distributions of expectedness ratings for multiple continuations...... of each melody, as obtained with the probe-tone paradigm. We hypothesised that statistical learning of music can be modelled as a process of entropy reduction. Specifically, implicit learning of statistical regularities allows reduction in the relative entropy (i.e. symmetrised Kullback-Leibler Divergence...... of musical training, and within-participant decreases in entropy after short-term statistical learning of novel music. Thus, whereas inexperienced listeners make high-entropy predictions, following the Principle of Maximum Entropy, statistical learning over varying timescales enables listeners to generate...

  17. A new statistical scission-point model fed with microscopic ingredients to predict fission fragments distributions

    International Nuclear Information System (INIS)

    Heinrich, S.

    2006-01-01

    The nuclear fission process is a very complex phenomenon and, even nowadays, no realistic models describing the overall process are available. The work presented here deals with a theoretical description of fission fragment distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilkins scission-point model. Our purpose was to test whether this statistical model, applied at the scission point and fed with new results from modern microscopic calculations, allows a quantitative description of the fission fragment distributions. We calculate the surface energy available at the scission point as a function of the fragment deformations. This surface is obtained from a Hartree-Fock-Bogoliubov microscopic calculation, which guarantees a realistic description of the dependence of the potential on the deformation of each fragment. The statistical balance is described by the level densities of the fragments. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between the fragments at the scission point, is discussed by comparison with scission configurations obtained from fully dynamical microscopic calculations. The comparison between our results and experimental data is very satisfying and allows us to discuss the successes and limitations of our approach. We finally propose ideas to improve the model, in particular by applying dynamical corrections. (author)

  18. Statistical Model for Content Extraction

    DEFF Research Database (Denmark)

    Qureshi, Pir Abdul Rasool; Memon, Nasrullah

    2011-01-01

    We present a statistical model for content extraction from HTML documents. The model operates on Document Object Model (DOM) tree of the corresponding HTML document. It evaluates each tree node and associated statistical features to predict significance of the node towards overall content of the ...... also describe the significance of the model in the domain of counterterrorism and open source intelligence....

  19. Drivers and seasonal predictability of extreme wind speeds in the ECMWF System 4 and a statistical model

    Science.gov (United States)

    Walz, M. A.; Donat, M.; Leckebusch, G. C.

    2017-12-01

    As extreme wind speeds are responsible for large socio-economic losses in Europe, a skillful prediction would be of great benefit for disaster prevention as well as for the actuarial community. Here we evaluate patterns of large-scale atmospheric variability and the seasonal predictability of extreme wind speeds (e.g. >95th percentile) in the European domain in the dynamical seasonal forecast system ECMWF System 4, and compare to the predictability based on a statistical prediction model. The dominant patterns of atmospheric variability show distinct differences between reanalysis and ECMWF System 4, with most patterns in System 4 extended downstream in comparison to ERA-Interim. The dissimilar manifestations of the patterns within the two models lead to substantially different drivers associated with the occurrence of extreme winds in the respective model. While the ECMWF System 4 is shown to provide some predictive power over Scandinavia and the eastern Atlantic, only very few grid cells in the European domain have significant correlations for extreme wind speeds in System 4 compared to ERA-Interim. In contrast, a statistical model predicts extreme wind speeds during boreal winter in better agreement with the observations. Our results suggest that System 4 does not seem to capture the potential predictability of extreme winds that exists in the real world, and therefore fails to provide reliable seasonal predictions for lead months 2-4. This is likely related to the unrealistic representation of large-scale patterns of atmospheric variability. Hence our study points to potential improvements of dynamical prediction skill by improving the simulation of large-scale atmospheric dynamics.

  20. Statistical surrogate models for prediction of high-consequence climate change.

    Energy Technology Data Exchange (ETDEWEB)

    Constantine, Paul; Field, Richard V., Jr.; Boslough, Mark Bruce Elrick

    2011-09-01

    In safety engineering, performance metrics are defined using probabilistic risk assessments focused on the low-probability, high-consequence tail of the distribution of possible events, as opposed to best estimates based on central tendencies. We frame the climate change problem and its associated risks in a similar manner. To properly explore the tails of the distribution requires extensive sampling, which is not possible with existing coupled atmospheric models due to the high computational cost of each simulation. We therefore propose the use of specialized statistical surrogate models (SSMs) for the purpose of exploring the probability law of various climate variables of interest. A SSM is different than a deterministic surrogate model in that it represents each climate variable of interest as a space/time random field. The SSM can be calibrated to available spatial and temporal data from existing climate databases, e.g., the Program for Climate Model Diagnosis and Intercomparison (PCMDI), or to a collection of outputs from a General Circulation Model (GCM), e.g., the Community Earth System Model (CESM) and its predecessors. Because of its reduced size and complexity, the realization of a large number of independent model outputs from a SSM becomes computationally straightforward, so that quantifying the risk associated with low-probability, high-consequence climate events becomes feasible. A Bayesian framework is developed to provide quantitative measures of confidence, via Bayesian credible intervals, in the use of the proposed approach to assess these risks.

  1. Geo-Based Statistical Models for Vulnerability Prediction of Highway Network Segments

    Directory of Open Access Journals (Sweden)

    Keren Pollak

    2014-04-01

    This study describes four statistical models (Poisson, Negative Binomial, Zero-Inflated Poisson, and Zero-Inflated Negative Binomial) which were devised in order to examine traffic accidents and to identify the model that best estimates probabilities for future risk assessment at interurban road sections. The study was conducted on four sets of fixed-length sections of the road network: 500, 750, 1000, and 1500 m. The contribution of transportation and spatial parameters as predictors of road accident rates was evaluated for all four data sets separately. In addition, the Empirical Bayes method was applied. This method uses historical accident information and accounts for the regression-to-the-mean phenomenon so as to improve model results. The study was performed using Geographic Information System (GIS) software. Other analyses, such as statistical analyses combined with spatial parameters, interactions, and examination of other geographical areas, were also performed. The results showed that the short road section data sets of 500 and 750 m yielded the most stable models. This allows focused treatment on short sections of the road network as a way to save resources (enforcement; education and information; finance) and potentially gain maximum benefit at minimum investment. It was found that the significant parameters affecting accident rates are the curvature of the road section, the region, and the traffic volume. An interaction between the region and traffic volume was also found.
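
    The count-regression setting above can be illustrated, under stated assumptions, with Poisson and Negative Binomial GLMs in statsmodels; the zero-inflated variants are omitted for brevity. The segment attributes and data below are synthetic stand-ins for the curvature, region and traffic-volume covariates of the study.

```python
# Hedged sketch: accident counts on road segments modelled with Poisson and
# Negative Binomial GLMs. Covariates and data are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 500
curvature = rng.random(n)
log_aadt = rng.normal(8.0, 0.5, n)                       # log of traffic volume
X = sm.add_constant(np.column_stack([curvature, log_aadt]))
mu = np.exp(-6.0 + 1.2 * curvature + 0.6 * log_aadt)      # synthetic "true" accident rate
accidents = rng.poisson(mu)

poisson_fit = sm.GLM(accidents, X, family=sm.families.Poisson()).fit()
negbin_fit = sm.GLM(accidents, X, family=sm.families.NegativeBinomial()).fit()
print(poisson_fit.params)                                 # coefficients on the log scale
print("AIC Poisson vs NegBin:", poisson_fit.aic, negbin_fit.aic)
```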

  2. Development and application of a statistical methodology to evaluate the predictive accuracy of building energy baseline models

    Energy Technology Data Exchange (ETDEWEB)

    Granderson, Jessica [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.; Price, Phillip N. [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.

    2014-03-01

    This paper documents the development and application of a general statistical methodology to assess the accuracy of baseline energy models, focusing on its application to Measurement and Verification (M&V) of whole-building energy savings. The methodology complements the principles addressed in resources such as ASHRAE Guideline 14 and the International Performance Measurement and Verification Protocol. It requires fitting a baseline model to data from a "training period" and using the model to predict total electricity consumption during a subsequent "prediction period." We illustrate the methodology by evaluating five baseline models using data from 29 buildings. The training period and prediction period were varied, and model predictions of daily, weekly, and monthly energy consumption were compared to meter data to determine model accuracy. Several metrics were used to characterize the accuracy of the predictions, and in some cases the best-performing model as judged by one metric was not the best performer when judged by another metric.
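
    As an illustrative sketch only, the example below fits a simple temperature-dependent baseline to a training period and scores its prediction-period forecasts with two accuracy metrics commonly used in M&V work, the normalized mean bias error (NMBE) and CV(RMSE). The linear baseline and synthetic data are assumptions; the paper itself evaluates five more elaborate baseline models on real buildings.

```python
# Hedged sketch: train-period baseline fit, prediction-period scoring with
# NMBE and CV(RMSE). Baseline form and data are illustrative assumptions.
import numpy as np

def nmbe(actual, predicted):
    return 100.0 * np.sum(actual - predicted) / (len(actual) * np.mean(actual))

def cv_rmse(actual, predicted):
    return 100.0 * np.sqrt(np.mean((actual - predicted) ** 2)) / np.mean(actual)

rng = np.random.default_rng(2)
temp = rng.uniform(0, 30, 730)                                        # two years of daily mean temperature
load = 300 + 12 * np.maximum(temp - 18, 0) + rng.normal(0, 20, 730)   # daily kWh with a cooling signal

train, test = slice(0, 365), slice(365, 730)
A = np.column_stack([np.ones(365), np.maximum(temp[train] - 18, 0)])
coef, *_ = np.linalg.lstsq(A, load[train], rcond=None)                # baseline fitted to the training period

A_test = np.column_stack([np.ones(365), np.maximum(temp[test] - 18, 0)])
pred = A_test @ coef
print("NMBE (%):", round(nmbe(load[test], pred), 2))
print("CV(RMSE) (%):", round(cv_rmse(load[test], pred), 2))
```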

  3. Prediction of hydrate formation temperature by both statistical models and artificial neural network approaches

    International Nuclear Information System (INIS)

    Zahedi, Gholamreza; Karami, Zohre; Yaghoobi, Hamed

    2009-01-01

    In this study, various estimation methods have been reviewed for the hydrate formation temperature (HFT) and two procedures have been presented. In the first method, two general correlations have been proposed for HFT. One of the correlations has 11 parameters, and the second one has 18 parameters. In order to obtain the constants in the proposed equations, 203 experimental data points have been collected from the literature. The Engineering Equation Solver (EES) and Statistical Package for the Social Sciences (SPSS) software packages have been employed for statistical analysis of the data. The accuracy of the obtained correlations has also been demonstrated by comparison with experimental data and some recent commonly used correlations. In the second method, HFT is estimated by an artificial neural network (ANN) approach. In this case, various architectures have been checked using 70% of the experimental data for training the ANN. Among the various architectures, a multilayer perceptron (MLP) network with the trainlm training algorithm was found to be the best. Comparing the obtained ANN model results with the remaining 30% of unseen data confirms the excellent estimation performance of the ANN. It was found that the ANN is more accurate than traditional methods and even than our two proposed correlations for HFT estimation.

  4. Vitamin D and ferritin correlation with chronic neck pain using standard statistics and a novel artificial neural network prediction model.

    Science.gov (United States)

    Eloqayli, Haytham; Al-Yousef, Ali; Jaradat, Raid

    2018-02-15

    Despite the high prevalence of chronic neck pain, there is limited consensus about the primary etiology, risk factors, diagnostic criteria and therapeutic outcome. Here, we aimed to determine whether ferritin and vitamin D are modifiable risk factors associated with chronic neck pain, using standard statistics and an artificial neural network (ANN). Fifty-four patients with chronic neck pain treated between February 2016 and August 2016 in King Abdullah University Hospital and 54 age-matched controls undergoing outpatient or minor procedures were enrolled. Demographic parameters, height, weight and single measurements of serum vitamin D, vitamin B12, ferritin, calcium, phosphorus and zinc were obtained for patients and controls. The statistical analysis reveals that patients with chronic neck pain have significantly lower serum vitamin D and ferritin. A multilayer feed-forward neural network with back propagation (MFFNN) prediction model was developed and designed with vitamin D and ferritin as input variables and CNP as output. The ANN model output results show that 92 out of 108 samples were correctly classified, an 85% classification accuracy. Although iron and vitamin D deficiency cannot be isolated as the sole risk factors of chronic neck pain, they should be considered as two modifiable risk factors. The high prevalence of chronic neck pain, hypovitaminosis D and low ferritin amongst women is of concern. Bioinformatics predictions with artificial neural networks can be of future benefit in classification and prediction models for chronic neck pain. We hope this initial work will encourage a future larger cohort study addressing vitamin D and iron correction as modifiable factors and the application of artificial intelligence models in clinical practice.

  5. Statistical Model for Predicting Weather Through Lunar Phase - Meteorological Phenomena Relationships in Makassar City

    OpenAIRE

    Hasanah, Nur

    2012-01-01

    Some scientists have presented evidence of a relationship between lunar phase and meteorological phenomena in their own areas. Here we carry out a similar statistical analysis for the case of Makassar City, Indonesia. We used official meteorological data, such as rainfall, cloud cover, and temperature, covering a period of 28 years from January 1984 to December 2011. The statistical analyses were done using discriminant analysis and persistence. Further, we tested the result with cross validation ...

  6. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    Science.gov (United States)

    2017-09-01

    represented by the dispersion of the discrete forecast estimates. The computational intractability of Epstein's complete... that scales well with complicated systems, the posterior densities are often analytically intractable (G13). To this end, MCMC methods provide a... the advantages of intuitive, common-sense Bayesian statistical conclusions detailed by Casella (2008), Gelman
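
    As a generic, hedged illustration of the ensemble model output statistics (EMOS) family named in the title (not the hierarchical Bayesian/MCMC formulation of the thesis), the sketch below fits a Gaussian predictive distribution whose mean is linear in the ensemble mean and whose variance is linear in the ensemble variance, by maximizing the likelihood with SciPy on synthetic data.

```python
# Hedged sketch: nonhomogeneous Gaussian regression (a simple EMOS variant)
# fitted by maximum likelihood. All data are synthetic.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(9)
n, m = 400, 20
truth = rng.normal(15, 4, n)                                 # verifying observations
ensemble = truth[:, None] + 1.0 + rng.normal(0, 2, (n, m))   # biased, underdispersive ensemble
ens_mean, ens_var = ensemble.mean(axis=1), ensemble.var(axis=1)

def negloglik(params):
    a, b, c, d = params
    mu = a + b * ens_mean
    var = np.exp(c) + np.exp(d) * ens_var                    # keep variance positive
    return -np.sum(stats.norm.logpdf(truth, loc=mu, scale=np.sqrt(var)))

fit = optimize.minimize(negloglik, x0=[0.0, 1.0, 0.0, 0.0], method="Nelder-Mead")
a, b, c, d = fit.x
print("calibrated mean = %.2f + %.2f * ensemble mean" % (a, b))
print("calibrated variance = %.2f + %.2f * ensemble variance" % (np.exp(c), np.exp(d)))
```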

  7. Development of a design space and predictive statistical model for capsule filling of low-fill-weight inhalation products.

    Science.gov (United States)

    Faulhammer, E; Llusa, M; Wahl, P R; Paudel, A; Lawrence, S; Biserni, S; Calzolari, V; Khinast, J G

    2016-01-01

    The objectives of this study were to develop a predictive statistical model for low-fill-weight capsule filling of inhalation products with dosator nozzles via the quality by design (QbD) approach and, based on that, to create refined models that include quadratic terms for significant parameters. Various controllable process parameters and uncontrolled material attributes of 12 powders were initially screened using a linear model with partial least squares (PLS) regression to determine their effect on the critical quality attributes (CQA; fill weight and weight variability). After identifying critical material attributes (CMAs) and critical process parameters (CPPs) that influenced the CQA, model refinement was performed to study if interactions or quadratic terms influence the model. Based on the assessment of the effects of the CPPs and CMAs on fill weight and weight variability for low-fill-weight inhalation products, we developed an excellent linear predictive model for fill weight (R² = 0.96, Q² = 0.96 for powders with good flow properties and R² = 0.94, Q² = 0.93 for cohesive powders) and a model that provides a good approximation of the fill weight variability for each powder group. We validated the model, established a design space for the performance of different types of inhalation grade lactose on low-fill-weight capsule filling and successfully used the CMAs and CPPs to predict fill weight of powders that were not included in the development set.
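
    A minimal sketch of the screening step described above is given below: a partial least squares (PLS) regression relating process parameters and material attributes to fill weight, implemented with scikit-learn on synthetic data. The variable names are illustrative assumptions, not the actual CPPs and CMAs of the study.

```python
# Hedged sketch: PLS regression of fill weight on process/material variables.
# All variable names and data are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 120
X = np.column_stack([
    rng.uniform(1.5, 4.0, n),     # dosing-chamber length (assumed parameter)
    rng.uniform(5, 60, n),        # powder-bed height (assumed parameter)
    rng.uniform(20, 45, n),       # bulk density of the lactose grade (assumed attribute)
])
fill_weight = 2.0 * X[:, 0] + 0.05 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.5, n)

pls = PLSRegression(n_components=2)
r2_cv = cross_val_score(pls, X, fill_weight, cv=5, scoring="r2")
print("cross-validated R^2 (a rough analogue of Q^2):", r2_cv.mean())
```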

  8. Multilevel statistical models

    CERN Document Server

    Goldstein, Harvey

    2011-01-01

    This book provides a clear introduction to this important area of statistics. The author provides wide coverage of different kinds of multilevel models, and of how to interpret different statistical methodologies and algorithms applied to such models. This 4th edition reflects the growth of interest in this area and is updated to include new chapters on multilevel models with mixed response types, smoothing and multilevel data, models with correlated random effects and modeling with variance.

  9. Statistical-Based Forecasting of Avalanche Prediction

    OpenAIRE

    K. Srinivasan; Girish Semwal; T. Sunil

    1999-01-01

    This paper describes the study carried out to predict a few meteorological parameters of the next day using the parameters observed on the previous day through statistical methods. A multiple linear regression model was formulated for a hill station, Patsio, situated between Manali and Leh, for two winter months (December and January) separately. Twelve meteorological parameters were predicted using 18 predictors observed on the previous day. Ten years of data have been used for the computation of regressi...

  10. Report: Physics Constrained Stochastic Statistical Models for Extended Range Environmental Prediction

    Science.gov (United States)

    2013-09-30

    picture of ENSO-driven autoregressive models for North Pacific SST variability, providing evidence that intermittent processes, such as variability of... intermittent aspects (i) and (ii) are achieved by developing a simple stochastic parameterization for the unresolved details of synoptic-scale... stochastic parameterization of synoptic-scale activity to build a stochastic skeleton model for the MJO; this is the first low-order model of the MJO which

  11. Statistical modelling coupled with LC-MS analysis to predict human upper intestinal absorption of phytochemical mixtures.

    Science.gov (United States)

    Selby-Pham, Sophie N B; Howell, Kate S; Dunshea, Frank R; Ludbey, Joel; Lutz, Adrian; Bennett, Louise

    2018-04-15

    A diet rich in phytochemicals confers benefits for health by reducing the risk of chronic diseases via regulation of oxidative stress and inflammation (OSI). For optimal protective bio-efficacy, the time required for phytochemicals and their metabolites to reach maximal plasma concentrations (Tmax) should be synchronised with the time of increased OSI. A statistical model has been reported to predict Tmax of individual phytochemicals based on molecular mass and lipophilicity. We report the application of the model for predicting the absorption profile of an uncharacterised phytochemical mixture, herein referred to as the 'functional fingerprint'. First, chemical profiles of phytochemical extracts were acquired using liquid chromatography mass spectrometry (LC-MS), then the molecular features for respective components were used to predict their plasma absorption maximum, based on molecular mass and lipophilicity. This method of 'functional fingerprinting' of plant extracts represents a novel tool for understanding and optimising the health efficacy of plant extracts. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Physics Constrained Stochastic-Statistical Models for Extended Range Environmental Prediction

    Science.gov (United States)

    2014-09-30

    incorporated: multiple variables (wind, geopotential height, water vapor, and, as a proxy for convective activity, outgoing longwave radiation); multiple... a possible limitation in the water vapor formulation that will require further attention in the future. Finally, a simple interpretation is given... observational data and climate model data (Stechmann, Majda). 5. Identification of outgoing longwave radiation (OLR) satellite data as a measure of

  13. Sampling, Probability Models and Statistical Reasoning Statistical ...

    Indian Academy of Sciences (India)

    Mohan Delampady and V R Padmawar. Sampling, Probability Models and Statistical Reasoning: Statistical Inference. General Article, Resonance – Journal of Science Education, Volume 1, Issue 5, May 1996, pp 49-58.

  15. Prediction of the pre-morbid 3D anatomy of the proximal humerus based on statistical shape modelling.

    Science.gov (United States)

    Poltaretskyi, S; Chaoui, J; Mayya, M; Hamitouche, C; Bercik, M J; Boileau, P; Walch, G

    2017-07-01

    Restoring the pre-morbid anatomy of the proximal humerus is a goal of anatomical shoulder arthroplasty, but reliance is placed on the surgeon's experience and on anatomical estimations. The purpose of this study was to present a novel method, 'Statistical Shape Modelling', which accurately predicts the pre-morbid proximal humeral anatomy and calculates the 3D geometric parameters needed to restore normal anatomy in patients with severe degenerative osteoarthritis or a fracture of the proximal humerus. From a database of 57 humeral CT scans 3D humeral reconstructions were manually created. The reconstructions were used to construct a statistical shape model (SSM), which was then tested on a second set of 52 scans. For each humerus in the second set, 3D reconstructions of four diaphyseal segments of varying lengths were created. These reconstructions were chosen to mimic severe osteoarthritis, a fracture of the surgical neck of the humerus and a proximal humeral fracture with diaphyseal extension. The SSM was then applied to the diaphyseal segments to see how well it predicted proximal morphology, using the actual proximal humeral morphology for comparison. With the metaphysis included, mimicking osteoarthritis, the errors of prediction for retroversion, inclination, height, radius of curvature and posterior and medial offset of the head of the humerus were 2.9° (± 2.3°), 4.0° (± 3.3°), 1.0 mm (± 0.8 mm), 0.8 mm (± 0.6 mm), 0.7 mm (± 0.5 mm) and 1.0 mm (± 0.7 mm), respectively. With the metaphysis excluded, mimicking a fracture of the surgical neck, the errors of prediction for retroversion, inclination, height, radius of curvature and posterior and medial offset of the head of the humerus were 3.8° (± 2.9°), 3.9° (± 3.4°), 2.4 mm (± 1.9 mm), 1.3 mm (± 0.9 mm), 0.8 mm (± 0.5 mm) and 0.9 mm (± 0.6 mm), respectively. This study reports a novel, computerised method that accurately predicts the pre-morbid proximal humeral anatomy even in challenging

  16. SIMPLIFIED PREDICTIVE MODELS FOR CO₂ SEQUESTRATION PERFORMANCE ASSESSMENT RESEARCH TOPICAL REPORT ON TASK #3 STATISTICAL LEARNING BASED MODELS

    Energy Technology Data Exchange (ETDEWEB)

    Mishra, Srikanta; Schuetter, Jared

    2014-11-01

    We compare two approaches for building a statistical proxy model (metamodel) for CO₂ geologic sequestration from the results of full-physics compositional simulations. The first approach involves a classical Box-Behnken or Augmented Pairs experimental design with a quadratic polynomial response surface. The second approach used a space-filling maximin Latin Hypercube sampling or maximum entropy design with the choice of five different meta-modeling techniques: quadratic polynomial, kriging with constant and quadratic trend terms, multivariate adaptive regression splines (MARS) and additivity and variance stabilization (AVAS). Simulation results for CO₂ injection into a reservoir-caprock system with 9 design variables (and 97 samples) were used to generate the data for developing the proxy models. The fitted models were validated using an independent data set and a cross-validation approach for three different performance metrics: total storage efficiency, CO₂ plume radius and average reservoir pressure. The Box-Behnken–quadratic polynomial metamodel performed the best, followed closely by the maximin LHS–kriging metamodel.
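
    The second approach can be sketched, under stated assumptions, as a space-filling Latin Hypercube design followed by a kriging-style surrogate (Gaussian process regression). The 'simulator' below is a cheap synthetic function standing in for the compositional reservoir model, and only three of the nine design variables are mimicked; scipy.stats.qmc requires SciPy 1.7 or later.

```python
# Hedged sketch: Latin Hypercube design + Gaussian process surrogate for a
# cheap stand-in "simulator". Design variables and ranges are assumptions.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def fake_simulator(x):
    """Stand-in for an expensive full-physics run (e.g. storage efficiency)."""
    perm, poro, rate = x.T
    return 0.3 * perm + 0.5 * poro - 0.2 * rate + 0.1 * perm * poro

sampler = qmc.LatinHypercube(d=3, seed=0)
design = qmc.scale(sampler.random(n=97), [0, 0, 0], [1, 1, 1])    # 97 samples, as in the study
responses = fake_simulator(design)

surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
surrogate.fit(design, responses)

# the cheap surrogate can now be evaluated anywhere in the design space
new_points = qmc.scale(qmc.LatinHypercube(d=3, seed=1).random(20), [0, 0, 0], [1, 1, 1])
print("surrogate predictions:", surrogate.predict(new_points)[:5])
```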

  17. Diffeomorphic Statistical Deformation Models

    DEFF Research Database (Denmark)

    Hansen, Michael Sass; Hansen, Mads/Fogtman; Larsen, Rasmus

    2007-01-01

    In this paper we present a new method for constructing diffeomorphic statistical deformation models in arbitrary dimensional images with a nonlinear generative model and a linear parameter space. Our deformation model is a modified version of the diffeomorphic model introduced by Cootes et al....... The modifications ensure that no boundary restriction has to be enforced on the parameter space to prevent folds or tears in the deformation field. For straightforward statistical analysis, principal component analysis and sparse methods, we assume that the parameters for a class of deformations lie on a linear...

  18. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Hansen, Kurt Schaldemose; Larsen, Gunner Chr.

    2005-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... (PDF) of turbulence driven short-term extreme wind shear events, conditioned on the mean wind speed, for an arbitrary recurrence period. The model is based on an asymptotic expansion, and only a few and easily accessible parameters are needed as input. The model of the extreme PDF is supplemented...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...

  19. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Larsen, Gunner Chr.; Hansen, Kurt Schaldemose

    2004-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... (PDF) of turbulence driven short-term extreme wind shear events, conditioned on the mean wind speed, for an arbitrary recurrence period. The model is based on an asymptotic expansion, and only a few and easily accessible parameters are needed as input. The model of the extreme PDF is supplemented...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...

  20. Wind power prediction models

    Science.gov (United States)

    Levy, R.; Mcginness, H.

    1976-01-01

    Investigations were performed to predict the power available from the wind at the Goldstone, California, antenna site complex. The background for power prediction was derived from a statistical evaluation of available wind speed data records at this location and at nearby locations similarly situated within the Mojave desert. In addition to a model for power prediction over relatively long periods of time, an interim simulation model that produces sample wind speeds is described. The interim model furnishes uncorrelated sample speeds at hourly intervals that reproduce the statistical wind distribution at Goldstone. A stochastic simulation model to provide speed samples representative of both the statistical speed distributions and correlations is also discussed.
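
    A small, hedged sketch of the kind of stochastic simulation described above is shown below: an AR(1) process supplies the hour-to-hour correlation and a probability integral transform maps it onto a Weibull wind-speed distribution. The Weibull parameters, autocorrelation and rotor area are illustrative guesses, not the Goldstone statistics.

```python
# Hedged sketch: correlated hourly wind-speed samples with a Weibull marginal,
# plus the available wind power they imply. Parameters are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_hours = 24 * 365
phi = 0.9                                       # assumed lag-1 autocorrelation
shape, scale = 2.0, 7.0                         # assumed Weibull k and c (m/s)

# correlated standard-normal series (AR(1))
z = np.empty(n_hours)
z[0] = rng.standard_normal()
for t in range(1, n_hours):
    z[t] = phi * z[t - 1] + np.sqrt(1 - phi**2) * rng.standard_normal()

# map to the Weibull marginal while keeping the rank correlation
speeds = stats.weibull_min.ppf(stats.norm.cdf(z), c=shape, scale=scale)

power = 0.5 * 1.225 * 40.0 * speeds**3 / 1000    # kW through an assumed 40 m^2 rotor, before losses
print("mean speed (m/s):", speeds.mean(), " mean available power (kW):", power.mean())
```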

  1. Modelling short- and long-term statistical learning of music as a process of predictive entropy reduction

    DEFF Research Database (Denmark)

    Hansen, Niels Christian; Loui, Psyche; Vuust, Peter

    ) between listeners’ prior expectancy profiles and probability distributions of a musical style or of stimuli used in short-term experiments. Five previous probe-tone experiments with musicians and non-musicians were revisited. In Experiments 1-2 participants rated expectedness for tonal melodies......Statistical learning underlies the generation of expectations with different degrees of uncertainty. In music, uncertainty applies to expectations for pitches in a melody. This uncertainty can be quantified by Shannon entropy from distributions of expectedness ratings for multiple continuations...... of each melody, as obtained with the probe-tone paradigm. We hypothesised that statistical learning of music can be modelled as a process of entropy reduction. Specifically, implicit learning of statistical regularities allows reduction in the relative entropy (i.e. symmetrised Kullback-Leibler Divergence...

  2. A Robust Statistical Model to Predict the Future Value of the Milk Production of Dairy Cows Using Herd Recording Data

    DEFF Research Database (Denmark)

    Græsbøll, Kaare; Kirkeby, Carsten Thure; Nielsen, Søren Saxmose

    2017-01-01

    The future value of an individual dairy cow depends greatly on its projected milk yield. In developed countries with developed dairy industry infrastructures, facilities exist to record individual cow production and reproduction outcomes consistently and accurately. Accurate prediction...... of the future value of a dairy cow requires further detailed knowledge of the costs associated with feed, management practices, production systems, and disease. Here, we present a method to predict the future value of the milk production of a dairy cow based on herd recording data only. The method consists...... of several steps to evaluate lifetime milk production and individual cow somatic cell counts and to finally predict the average production for each day that the cow is alive. Herd recording data from 610 Danish Holstein herds were used to train and test a model predicting milk production (including factors...

  3. A statistical forecast model using the time-scale decomposition technique to predict rainfall during flood period over the middle and lower reaches of the Yangtze River Valley

    Science.gov (United States)

    Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao

    2018-04-01

    In this paper, a statistical forecast model using the time-scale decomposition method is established to make seasonal predictions of the rainfall during the flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). This method decomposes the rainfall over the MLYRV into three time-scale components, namely, the interannual component with periods shorter than 8 years, the interdecadal component with periods from 8 to 30 years, and the interdecadal component with periods longer than 30 years. Then, predictors are selected for the three time-scale components of FPR through correlation analysis. Finally, a statistical forecast model is established using the multiple linear regression technique to predict each of the three time-scale components of the FPR. The results show that this forecast model can capture the interannual and interdecadal variation of FPR. The hindcast of FPR over the 14 years from 2001 to 2014 shows that the FPR can be predicted successfully in 11 out of the 14 years. This forecast model performs better than a model using the traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skill and application value in the operational prediction of FPR over the MLYRV.

  4. Predicting radiotherapy outcomes using statistical learning techniques

    Energy Technology Data Exchange (ETDEWEB)

    El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O [Washington University, Saint Louis, MO (United States); Lindsay, Patricia E; Hope, Andrew J [Department of Radiation Oncology, Princess Margaret Hospital, Toronto, ON (Canada)

    2009-09-21

    Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among ...
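
    As a rough illustration of the comparison described above, and not the authors' modified kernel, the sketch below scores a radial-basis-function SVM against a logistic regression with leave-one-out cross-validation on synthetic, nonlinearly separable data using scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, nonlinearly separable data standing in for dose/clinical features and toxicity labels.
X, y = make_moons(n_samples=120, noise=0.3, random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "SVM (RBF kernel)":    make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
}

loo = LeaveOneOut()
for name, model in models.items():
    # Leave-one-out accuracy, mirroring the leave-one-out testing mentioned in the abstract.
    acc = cross_val_score(model, X, y, cv=loo).mean()
    print(f"{name}: leave-one-out accuracy = {acc:.3f}")
```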

  5. Predicting radiotherapy outcomes using statistical learning techniques

    Science.gov (United States)

    El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.

    2009-09-01

    Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model ...

  6. Predicting radiotherapy outcomes using statistical learning techniques

    International Nuclear Information System (INIS)

    El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O; Lindsay, Patricia E; Hope, Andrew J

    2009-01-01

    Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model ...

  7. Crop Yield Predictions - High Resolution Statistical Model for Intra-season Forecasts Applied to Corn in the US

    Science.gov (United States)

    Cai, Y.

    2017-12-01

    Accurately forecasting crop yields has broad implications for economic trading, food production monitoring, and global food security. However, the variation of environmental variables presents challenges to modeling yields accurately, especially when the lack of highly accurate measurements creates difficulties in building models that can succeed across space and time. In 2016, we developed a sequence of machine-learning based models forecasting end-of-season corn yields for the US at both the county and national levels. We combined machine learning algorithms in a hierarchical way, and used an understanding of physiological processes in temporal feature selection, to achieve high precision in our intra-season forecasts, including in very anomalous seasons. During the live run, we predicted the national corn yield within 1.40% of the final USDA number as early as August. In backtesting over the 2000-2015 period, our model predicts the national yield within 2.69% of the actual yield on average by mid-August. At the county level, our model predicts 77% of the variation in final yield using data through the beginning of August and improves to 80% by the beginning of October, with the percentage of counties predicted within 10% of the average yield increasing from 68% to 73%. Further, the lowest errors are in the most significant producing regions, resulting in very high precision national-level forecasts. In addition, we identify the changes of important variables throughout the season, specifically early-season land surface temperature, and mid-season land surface temperature and vegetation index. For the 2017 season, we feed 2016 data to the training set, together with additional geospatial data sources, aiming to make the current model even more precise. We will show how our 2017 US corn yield forecasts converge over time, which factors affect the yield the most, as well as present our plans for 2018 model adjustments.

  8. Use of statistical models based on radiographic measurements to predict oviposition date and clutch size in rock iguanas (Cyclura nubila)

    International Nuclear Information System (INIS)

    Alberts, A.C.

    1995-01-01

    The ability to noninvasively estimate clutch size and predict oviposition date in reptiles can be useful not only to veterinary clinicians but also to managers of captive collections and field researchers. Measurements of egg size and shape, as well as position of the clutch within the coelomic cavity, were taken from diagnostic radiographs of 20 female Cuban rock iguanas, Cyclura nubila, 81 to 18 days prior to laying. Combined with data on maternal body size, these variables were entered into multiple regression models to predict clutch size and timing of egg laying. The model for clutch size was accurate to 0.53 ± 0.08 eggs, while the model for oviposition date was accurate to 6.22 ± 0.81 days. Equations were generated that should be applicable to this and other large Cyclura species. © 1995 Wiley-Liss, Inc
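
    A hedged sketch of this kind of multiple regression is shown below; the radiographic measurements, maternal body size values, and clutch sizes are invented for illustration and are not the study's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical predictor matrix (invented values):
# columns = mean egg width (mm), mean egg length (mm), maternal snout-vent length (cm)
X = np.array([
    [32.1, 52.0, 38.5],
    [30.4, 50.2, 36.0],
    [33.8, 54.1, 40.2],
    [31.5, 51.3, 37.4],
    [34.2, 55.0, 41.0],
    [29.8, 49.5, 35.1],
])
clutch_size = np.array([14, 11, 17, 13, 18, 10])  # invented clutch sizes

# Ordinary multiple linear regression of clutch size on the radiographic and maternal measurements.
model = LinearRegression().fit(X, clutch_size)

new_female = np.array([[32.5, 52.8, 39.0]])
print("predicted clutch size:", model.predict(new_female)[0].round(1))
```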

  9. Forecasting species distributions with geo-spatial data: R objects that predict from averages of competing statistical models or data mining methods

    Science.gov (United States)

    Salas, L. A.; Veloz, S.; Ballard, G.

    2011-12-01

    Most forecasting approaches based on statistical models and data mining methods share a set of characteristics: all are constructed from training sets and validated against test sets using methods to avoid over-fitting on the training data; standard validation methods are used (e.g., AUC values for binary response data); some form of model averaging is applied when predicting new values from a set of competing models; measurements of error of predictions and goodness-of-fit of each competing model are reported and made spatially explicit. Many packages exist in R to fit statistical models and for data mining, but few include algorithms for forecasting and there are no model-averaging methods. However, results from these packages are commonly reported in R objects (S4 classes) that usually extend from other objects, and so they share methods in common (e.g., "predict", "aic"). Here we illustrate an approach that takes advantage of the abovementioned commonalities to develop a "framework" using objects that fit competing models with algorithms for forecasting and include model-averaging methods. These objects can be easily extended to incorporate new kinds of statistical and data mining methods. We illustrate this approach with three types of objects and show how to interact with them to produce weighted averages from competing models, and some tabular and graphic outputs. These objects have been compiled into an R package ("RavianForecasting" - http://data.prbo.org/apps/ravian). We encourage others to use and contribute toward the development of these types of forecasting objects, or to develop alternatives with similar flexibility. We show how these can be easily extended to incorporate new statistical methods, new outputs, new methods to weigh averages, and new methods to validate the models.
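
    The model-averaging step can be sketched in Python as well (the original work is an R package; this is not its API). Two hypothetical competing models are fitted to synthetic presence/absence data and their probability predictions are combined with AUC-based weights; in practice the weights would come from a separate validation set rather than the final test data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic presence/absence data standing in for species occurrence with environmental covariates.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {"glm": LogisticRegression(max_iter=1000),
          "gbm": GradientBoostingClassifier(random_state=0)}

preds, weights = {}, {}
for name, m in models.items():
    m.fit(X_train, y_train)
    preds[name] = m.predict_proba(X_test)[:, 1]
    weights[name] = roc_auc_score(y_test, preds[name])  # AUC used here as the averaging weight

w = np.array(list(weights.values()))
w = w / w.sum()
ensemble = sum(wi * p for wi, p in zip(w, preds.values()))
print("AUC of the weighted model average:", round(roc_auc_score(y_test, ensemble), 3))
```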

  10. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features

    Energy Technology Data Exchange (ETDEWEB)

    Grimm, Lars J., E-mail: Lars.grimm@duke.edu; Ghate, Sujata V.; Yoon, Sora C.; Kim, Connie [Department of Radiology, Duke University Medical Center, Box 3808, Durham, North Carolina 27710 (United States); Kuzmiak, Cherie M. [Department of Radiology, University of North Carolina School of Medicine, 2006 Old Clinic, CB No. 7510, Chapel Hill, North Carolina 27599 (United States); Mazurowski, Maciej A. [Duke University Medical Center, Box 2731 Medical Center, Durham, North Carolina 27710 (United States)

    2014-03-15

    Purpose: The purpose of this study is to explore Breast Imaging-Reporting and Data System (BI-RADS) features as predictors of individual errors made by trainees when detecting masses in mammograms. Methods: Ten radiology trainees and three expert breast imagers reviewed 100 mammograms comprising bilateral medial lateral oblique and craniocaudal views on a research workstation. The cases consisted of normal and biopsy-proven benign and malignant masses. For cases with actionable abnormalities, the experts recorded breast (density and axillary lymph nodes) and mass (shape, margin, and density) features according to the BI-RADS lexicon, as well as the abnormality location (depth and clock face). For each trainee, a user-specific multivariate model was constructed to predict the trainee's likelihood of error based on BI-RADS features. The performance of the models was assessed using the area under the receiver operating characteristic curve (AUC). Results: Despite the variability in errors between different trainees, the individual models were able to predict the likelihood of error for the trainees with a mean AUC of 0.611 (range: 0.502–0.739, 95% Confidence Interval: 0.543–0.680, p < 0.002). Conclusions: Patterns in detection errors for mammographic masses made by radiology trainees can be modeled using BI-RADS features. These findings may have potential implications for the development of future educational materials that are personalized to individual trainees.
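
    A generic sketch of such a user-specific model, with invented BI-RADS-style features and error labels rather than the study's data: a logistic regression is fitted per trainee and its cross-validated AUC is reported.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n_cases = 100
# Hypothetical indicator encoding of BI-RADS features (density, shape, margin, ...) per case.
X = rng.integers(0, 2, size=(n_cases, 8)).astype(float)
# 1 = the trainee missed the mass on this case, 0 = detected it (synthetic labels).
y = (X[:, 0] + X[:, 3] + rng.normal(0, 0.8, n_cases) > 1.2).astype(int)

model = LogisticRegression(max_iter=1000)
# Cross-validated probabilities avoid scoring the model on its own training cases.
p_error = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
print("per-trainee AUC:", round(roc_auc_score(y, p_error), 3))
```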

  11. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features.

    Science.gov (United States)

    Grimm, Lars J; Ghate, Sujata V; Yoon, Sora C; Kuzmiak, Cherie M; Kim, Connie; Mazurowski, Maciej A

    2014-03-01

    The purpose of this study is to explore Breast Imaging-Reporting and Data System (BI-RADS) features as predictors of individual errors made by trainees when detecting masses in mammograms. Ten radiology trainees and three expert breast imagers reviewed 100 mammograms comprising bilateral medial lateral oblique and craniocaudal views on a research workstation. The cases consisted of normal and biopsy-proven benign and malignant masses. For cases with actionable abnormalities, the experts recorded breast (density and axillary lymph nodes) and mass (shape, margin, and density) features according to the BI-RADS lexicon, as well as the abnormality location (depth and clock face). For each trainee, a user-specific multivariate model was constructed to predict the trainee's likelihood of error based on BI-RADS features. The performance of the models was assessed using the area under the receiver operating characteristic curve (AUC). Despite the variability in errors between different trainees, the individual models were able to predict the likelihood of error for the trainees with a mean AUC of 0.611 (range: 0.502-0.739, 95% Confidence Interval: 0.543-0.680, p < 0.002). Patterns in detection errors for mammographic masses made by radiology trainees can be modeled using BI-RADS features. These findings may have potential implications for the development of future educational materials that are personalized to individual trainees.

  12. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features

    International Nuclear Information System (INIS)

    Grimm, Lars J.; Ghate, Sujata V.; Yoon, Sora C.; Kim, Connie; Kuzmiak, Cherie M.; Mazurowski, Maciej A.

    2014-01-01

    Purpose: The purpose of this study is to explore Breast Imaging-Reporting and Data System (BI-RADS) features as predictors of individual errors made by trainees when detecting masses in mammograms. Methods: Ten radiology trainees and three expert breast imagers reviewed 100 mammograms comprising bilateral medial lateral oblique and craniocaudal views on a research workstation. The cases consisted of normal and biopsy-proven benign and malignant masses. For cases with actionable abnormalities, the experts recorded breast (density and axillary lymph nodes) and mass (shape, margin, and density) features according to the BI-RADS lexicon, as well as the abnormality location (depth and clock face). For each trainee, a user-specific multivariate model was constructed to predict the trainee's likelihood of error based on BI-RADS features. The performance of the models was assessed using the area under the receiver operating characteristic curve (AUC). Results: Despite the variability in errors between different trainees, the individual models were able to predict the likelihood of error for the trainees with a mean AUC of 0.611 (range: 0.502–0.739, 95% Confidence Interval: 0.543–0.680, p < 0.002). Conclusions: Patterns in detection errors for mammographic masses made by radiology trainees can be modeled using BI-RADS features. These findings may have potential implications for the development of future educational materials that are personalized to individual trainees.

  13. QSPR Models for Predicting Log Pliver Values for Volatile Organic Compounds Combining Statistical Methods and Domain Knowledge

    Directory of Open Access Journals (Sweden)

    Mónica F. Díaz

    2012-12-01

    Full Text Available Volatile organic compounds (VOCs) are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Therefore, it is important to model blood-to-liver partition coefficients (log Pliver) for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR) models for the prediction of log Pliver, where we also propose a hybrid approach for the selection of the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge. This makes it possible to obtain a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. Results show high prediction accuracy compared to previous log Pliver models, and the descriptor selection approach provides a means to get a small set of descriptors that is in agreement with theoretical understanding of the target property.

  14. Sugar and acid content of Citrus prediction modeling using FT-IR fingerprinting in combination with multivariate statistical analysis.

    Science.gov (United States)

    Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung

    2016-01-01

    A high-throughput screening system was established for Citrus lines with higher sugar and acid contents using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm⁻¹, 1300-1500 cm⁻¹, and 1500-1700 cm⁻¹. Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. Quantitative predictive modeling of sugar and acid contents of Citrus fruits was established using partial least squares regression algorithms on the FT-IR spectra. The regression coefficients (R²) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that, by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be detected early with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.
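
    A hedged sketch of the quantitative prediction step, using scikit-learn's PLS regression on synthetic spectra; the band positions, coefficients, and sugar contents below are invented for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_samples, n_wavenumbers = 60, 300
spectra = rng.standard_normal((n_samples, n_wavenumbers))        # stand-in FT-IR spectra
true_coeffs = np.zeros(n_wavenumbers)
true_coeffs[50:70] = 0.4                                          # a few informative bands (assumed)
sugar = spectra @ true_coeffs + rng.normal(0, 0.2, n_samples)     # synthetic sugar content

X_train, X_test, y_train, y_test = train_test_split(spectra, sugar, random_state=0)
pls = PLSRegression(n_components=5).fit(X_train, y_train)
print("R^2 on held-out samples:", round(r2_score(y_test, pls.predict(X_test).ravel()), 3))
```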

  15. A Robust Statistical Model to Predict the Future Value of the Milk Production of Dairy Cows Using Herd Recording Data

    DEFF Research Database (Denmark)

    Græsbøll, Kaare; Kirkeby, Carsten Thure; Nielsen, Søren Saxmose

    2017-01-01

    The future value of an individual dairy cow depends greatly on its projected milk yield. In developed countries with developed dairy industry infrastructures, facilities exist to record individual cow production and reproduction outcomes consistently and accurately. Accurate prediction ... of the future value of a dairy cow requires further detailed knowledge of the costs associated with feed, management practices, production systems, and disease. Here, we present a method to predict the future value of the milk production of a dairy cow based on herd recording data only. The method consists ... of somatic cell count. We conclude that estimates of future average production can be used on a day-to-day basis to rank cows for culling, or can be implemented in simulation models of within-herd disease spread to make operational decisions, such as culling versus treatment. An advantage of the approach ...

  16. Statistical aspects of radiogenomics: can radiogenomics models be used to aid prediction of outcomes in cancer patients?

    Science.gov (United States)

    Ren, Boya; Mazurowski, Maciej A.

    2017-03-01

    Radiogenomics is a new direction in cancer research that aims at identifying the relationship between tumor genomics and its appearance in imaging (i.e. its radiophenotype). Recent years brought multiple radiogenomic discoveries in brain, breast, lung, and other cancers. With the development of this new field we believe it is important to investigate in which settings radiogenomics could be useful, in order to better direct research effort. One of the general applications of radiogenomics is to generate imaging-based models for prediction of outcomes, by modeling the relationship between imaging and genomics and the relationship between genomics and outcomes. We believe that this is an important potential application of radiogenomics as it could advance imaging-based precision medicine. We present a preliminary simulation study evaluating whether such an approach results in improved models. We investigate different settings in terms of the strength of the radiogenomic relationship, the prognostic power of the imaging and genomic descriptors, and the availability and quality of data. Our experiments indicated that the following parameters affect the usefulness of the radiogenomic approach: the predictive power of genomic and imaging features, the strength of the radiogenomic relationship, and the number of cases and follow-up time for the genomic data. Overall, we found that there are some situations in which the radiogenomic approach is beneficial, but only when the radiogenomic relationship is strong and only a small number of imaging cases with outcome data is available.

  17. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    Science.gov (United States)

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.
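
    The extreme-value step can be sketched as follows; this is a generic generalized extreme value (GEV) fit to invented annual-maximum rain rates using SciPy, not the report's formalism, and the attenuation mapping itself is omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical annual-maximum one-minute rain rates (mm/h), as might be digested from weather records.
annual_max_rain_rate = rng.gumbel(loc=60.0, scale=15.0, size=40)

# Fit a generalized extreme value distribution to the annual maxima.
shape, loc, scale = stats.genextreme.fit(annual_max_rain_rate)

# Rain rate exceeded, on average, once in 100 years (0.01 annual exceedance probability).
r100 = stats.genextreme.ppf(0.99, shape, loc=loc, scale=scale)
print(f"100-year rain rate: {r100:.1f} mm/h")
# Such location-dependent rain-rate statistics would then feed a separate attenuation model
# for the 20 GHz downlink / 30 GHz uplink budgets mentioned in the record.
```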

  18. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    Science.gov (United States)

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to

  19. Statistical basis for predicting technological progress.

    Directory of Open Access Journals (Sweden)

    Béla Nagy

    Full Text Available Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation.
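
    The hindcasting comparison can be illustrated with a toy example: a synthetic technology whose production grows exponentially and whose cost follows a noisy learning curve, with Wright's law (cost versus cumulative production) and a generalized Moore's law (cost versus calendar time) each fitted on early years and scored on later years. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(4)
years = np.arange(1980, 2010)
cum_production = np.exp(0.12 * (years - years[0])) * 1e3                 # exponentially growing production
cost = 50.0 * cum_production ** -0.3 * np.exp(rng.normal(0, 0.05, years.size))  # noisy learning curve

def hindcast_error(x, cost, split=20):
    """Fit log(cost) linearly against a predictor on early data; report RMS error on later data."""
    a, b = np.polyfit(x[:split], np.log(cost[:split]), 1)
    pred = a * x[split:] + b
    return np.sqrt(np.mean((pred - np.log(cost[split:])) ** 2))

print("Wright's law (cost vs cumulative production):",
      round(hindcast_error(np.log(cum_production), cost), 3))
print("Moore's law  (cost vs calendar time):        ",
      round(hindcast_error(years.astype(float), cost), 3))
```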

  20. Statistical Basis for Predicting Technological Progress

    Science.gov (United States)

    Nagy, Béla; Farmer, J. Doyne; Bui, Quan M.; Trancik, Jessika E.

    2013-01-01

    Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation. PMID:23468837

  1. Statistical basis for predicting technological progress.

    Science.gov (United States)

    Nagy, Béla; Farmer, J Doyne; Bui, Quan M; Trancik, Jessika E

    2013-01-01

    Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation.

  2. Prediction of velocity distributions in rod bundle axial flow, with a statistical model (K-epsilon) of turbulence

    International Nuclear Information System (INIS)

    Silva Junior, H.C. da.

    1978-12-01

    Reactor fuel elements generally consist of rod bundles with the coolant flowing axially through the region between the rods. The reliability of the thermohydraulic design of such elements depends on a detailed description of the velocity field. A two-equation statistical model (K-epsilon) of turbulence is applied to compute main and secondary flow fields, wall shear stress distributions and friction factors of steady, fully developed turbulent flows, with incompressible, temperature-independent fluid flowing axially through triangular or square arrays of rod bundles. The numerical procedure uses the vorticity and the stream function to describe the velocity field. Comparison with experimental and analytical data of several investigators is presented. Results are in good agreement. (Author) [pt

  3. The statistical evaluation and comparison of ADMS-Urban model for the prediction of nitrogen dioxide with air quality monitoring network.

    Science.gov (United States)

    Dėdelė, Audrius; Miškinytė, Auksė

    2015-09-01

    In many countries, road traffic is one of the main sources of air pollution associated with adverse effects on human health and the environment. Nitrogen dioxide (NO2) is considered to be a measure of traffic-related air pollution, with concentrations tending to be higher near highways, along busy roads, and in city centers, and exceedances are mainly observed at measurement stations located close to traffic. In order to assess the air quality in a city and the impact of air pollution on public health, air quality models are used. However, before a model can be used for these purposes, it is important to evaluate the accuracy of dispersion modelling, one of the most widely used methods. Monitoring and dispersion modelling are two components of an air quality monitoring system (AQMS), and a statistical comparison between them was made in this research. The evaluation of the Atmospheric Dispersion Modelling System (ADMS-Urban) was made by comparing monthly modelled NO2 concentrations with the data of continuous air quality monitoring stations in Kaunas city. The statistical measures of model performance were calculated for annual and monthly concentrations of NO2 for each monitoring station site. The spatial analysis was made using geographic information systems (GIS). The calculation of statistical parameters indicated a good performance of the ADMS-Urban model for the prediction of NO2. The results of this study showed that the agreement between modelled values and observations was better for traffic monitoring stations than for the background and residential stations.
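
    A sketch of the kind of statistical measures of model performance used in such model-versus-monitor comparisons, applied to invented monthly NO2 values; the exact measures reported in the study may differ.

```python
import numpy as np

def evaluation_statistics(observed, modelled):
    """Common model-performance measures for paired observed/modelled concentrations."""
    o, m = np.asarray(observed, float), np.asarray(modelled, float)
    bias = np.mean(m - o)
    rmse = np.sqrt(np.mean((m - o) ** 2))
    r = np.corrcoef(o, m)[0, 1]
    fac2 = np.mean((m / o > 0.5) & (m / o < 2.0))  # fraction of predictions within a factor of two
    return {"mean bias": bias, "RMSE": rmse, "correlation": r, "FAC2": fac2}

# Hypothetical monthly mean NO2 concentrations (ug/m3) at one monitoring station.
observed = [28, 31, 25, 22, 18, 16, 15, 17, 21, 26, 30, 33]
modelled = [26, 29, 27, 20, 17, 14, 16, 18, 20, 24, 31, 35]
for name, value in evaluation_statistics(observed, modelled).items():
    print(f"{name}: {value:.2f}")
```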

  4. Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

    Energy Technology Data Exchange (ETDEWEB)

    Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)

    2016-08-31

    Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computational expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale *sequential* data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.

  5. Predicting Statistical Distributions of Footbridge Vibrations

    DEFF Research Database (Denmark)

    Pedersen, Lars; Frier, Christian

    2009-01-01

    The paper considers vibration response of footbridges to pedestrian loading. Employing Newmark and Monte Carlo simulation methods, a statistical distribution of bridge vibration levels is calculated modelling walking parameters such as step frequency and stride length as random variables...
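
    A toy Monte Carlo sketch of the idea: step frequency, stride length, and pedestrian weight are drawn as random variables and pushed through a deliberately crude single-degree-of-freedom resonant-response formula (all parameter values are assumptions), yielding a statistical distribution of peak accelerations rather than a single deterministic value. The paper's Newmark time integration is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(9)
n_sim = 10_000

# Random walking parameters (assumed distributions, not the paper's values).
step_freq = rng.normal(1.9, 0.2, n_sim)    # Hz
stride    = rng.normal(0.75, 0.07, n_sim)  # m
weight    = rng.normal(750, 100, n_sim)    # N

# Grossly simplified single-degree-of-freedom bridge (assumed modal properties).
f_bridge, zeta, modal_mass, span = 2.0, 0.005, 40e3, 50.0
dlf = 0.4 * np.exp(-0.5 * ((step_freq - f_bridge) / 0.15) ** 2)  # crude dynamic load factor near resonance
n_steps = span / stride
buildup = 1 - np.exp(-2 * np.pi * zeta * n_steps)                # limited resonance build-up during crossing
accel = buildup * dlf * weight / (2 * zeta * modal_mass)         # peak acceleration (m/s^2)

print("mean peak acceleration:", round(accel.mean(), 3), "m/s^2")
print("95th percentile:       ", round(np.percentile(accel, 95), 3), "m/s^2")
```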

  6. Statistical Modelling of Wind Profiles - Data Analysis and Modelling

    DEFF Research Database (Denmark)

    Jónsson, Tryggvi; Pinson, Pierre

    The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles.

  7. A statistical model of the international spread of wild poliovirus in Africa used to predict and prevent outbreaks.

    Directory of Open Access Journals (Sweden)

    Kathleen M O'Reilly

    2011-10-01

    Full Text Available Outbreaks of poliomyelitis in African countries that were previously free of wild-type poliovirus cost the Global Polio Eradication Initiative US$850 million during 2003-2009, and have limited the ability of the program to focus on endemic countries. A quantitative understanding of the factors that predict the distribution and timing of outbreaks will enable their prevention and facilitate the completion of global eradication. Children with poliomyelitis in Africa from 1 January 2003 to 31 December 2010 were identified through routine surveillance of cases of acute flaccid paralysis, and separate outbreaks associated with importation of wild-type poliovirus were defined using the genetic relatedness of these viruses in the VP1/2A region. Potential explanatory variables were examined for their association with the number, size, and duration of poliomyelitis outbreaks in 6-mo periods using multivariable regression analysis. The predictive ability of 6-mo-ahead forecasts of poliomyelitis outbreaks in each country based on the regression model was assessed. A total of 142 genetically distinct outbreaks of poliomyelitis were recorded in 25 African countries, resulting in 1-228 cases (median of two cases). The estimated number of people arriving from infected countries and <5-y childhood mortality were independently associated with the number of outbreaks. Immunisation coverage based on the reported vaccination history of children with non-polio acute flaccid paralysis was associated with the duration and size of each outbreak, as well as the number of outbreaks. Six-month-ahead forecasts of the number of outbreaks in a country or region changed over time and had a predictive ability of 82%. Outbreaks of poliomyelitis resulted primarily from continued transmission in Nigeria and the poor immunisation status of populations in neighbouring countries. From 1 January 2010 to 30 June 2011, reduced transmission in Nigeria and increased incidence in reinfected ...

  8. Methods of statistical model estimation

    CERN Document Server

    Hilbe, Joseph

    2013-01-01

    Methods of Statistical Model Estimation examines the most important and popular methods used to estimate parameters for statistical models and provide informative model summary statistics. Designed for R users, the book is also ideal for anyone wanting to better understand the algorithms used for statistical model fitting. The text presents algorithms for the estimation of a variety of regression procedures using maximum likelihood estimation, iteratively reweighted least squares regression, the EM algorithm, and MCMC sampling. Fully developed, working R code is constructed for each method. Th

  9. Exclusion statistics and integrable models

    International Nuclear Information System (INIS)

    Mashkevich, S.

    1998-01-01

    The definition of exclusion statistics that was given by Haldane admits a 'statistical interaction' between distinguishable particles (multispecies statistics). For such statistics, thermodynamic quantities can be evaluated exactly; explicit expressions are presented here for cluster coefficients. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models of the Calogero-Sutherland type. The interesting questions of generalizing this correspondence to the higher-dimensional and the multispecies cases remain essentially open; however, our results provide some hints as to searches for the models in question

  10. Comparative study on the predictability of statistical models (RSM and ANN) on the behavior of optimized buccoadhesive wafers containing Loratadine and their in vivo assessment.

    Science.gov (United States)

    Chakraborty, Prithviraj; Parcha, Versha; Chakraborty, Debarupa D; Ghosh, Amitava

    2016-01-01

    A buccoadhesive wafer dosage form containing Loratadine was formulated using a Formulation by Design (FbD) approach, incorporating sodium alginate and lactose monohydrate as independent variables and employing the solvent casting method. The wafers were statistically optimized using Response Surface Methodology (RSM) and an Artificial Neural Network (ANN) algorithm for predicting the physicochemical and physico-mechanical properties of the wafers as responses. The wafers were examined morphologically using SEM. Quick disintegration of the samples was examined employing Optical Contact Angle (OCA) measurements. The comparison of the predictability of RSM and ANN showed a higher prognostic capacity of the RSM model over the ANN model in forecasting the mechanical and physicochemical properties of the wafers. The in vivo assessment of the optimized buccoadhesive wafer exhibited a marked increase in bioavailability, justifying the administration of Loratadine through the buccal route, bypassing hepatic first-pass metabolism.

  11. [Statistical prediction methods in violence risk assessment and its application].

    Science.gov (United States)

    Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song

    2013-06-01

    How to improve violence risk assessment is an urgent global problem. As a necessary part of risk assessment, statistical methods have a remarkable impact. In this study, the prediction methods used in violence risk assessment are reviewed from a statistical point of view. The application of logistic regression as an example of a multivariate statistical model, the decision tree as an example of a data mining technique, and neural networks as an example of artificial intelligence technology are all reviewed. This study provides data intended to contribute to further research on violence risk assessment.

  12. Sensometrics: Thurstonian and Statistical Models

    DEFF Research Database (Denmark)

    Christensen, Rune Haubo Bojesen

    This thesis is concerned with the development and bridging of Thurstonian and statistical models for sensory discrimination testing as applied in the scientific discipline of sensometrics. In sensory discrimination testing sensory differences between products are detected and quantified by the use ... of generalized linear mixed models, cumulative link models and cumulative link mixed models. The relation between the Wald, likelihood and score statistics is expanded upon using the shape of the (profile) likelihood function as common reference.

  13. Statistical model to predict dry sliding wear behaviour of Aluminium-Jute bast ash particulate composite produced by stir-casting

    Directory of Open Access Journals (Sweden)

    Gambo Anthony VICTOR

    2017-06-01

    Full Text Available A model to predict the dry sliding wear behaviour of Aluminium-Jute bast ash particulate composites produced by the double stir-casting method was developed in terms of the weight fraction of jute bast ash (JBA). Experiments were designed on the basis of the Design of Experiments (DOE) technique. A 2^k factorial design, where k is the number of variables, with a central composite second-order rotatable design was used to improve the reliability of results and to reduce the size of experimentation without loss of accuracy. The factors considered in this study were sliding velocity, sliding distance, normal load and mass fraction of JBA reinforcement in the matrix. The developed regression model was validated by the statistical software MINITAB-R14 and statistical tools such as analysis of variance (ANOVA). It was found that the developed regression model could be effectively used to predict the wear rate at the 95% confidence level. The wear rate of the cast Al-JBAp composite decreased with an increase in the mass fraction of JBA and increased with an increase of the sliding velocity, sliding distance and normal load acting on the composite specimen.
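
    A hedged sketch of fitting a second-order response surface of the kind produced by a central composite design, using invented factor settings and wear rates rather than the experimental data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
# Hypothetical design matrix: sliding velocity (m/s), sliding distance (m), load (N), JBA fraction (%).
X = rng.uniform([1, 500, 10, 0], [4, 2000, 50, 12], size=(30, 4))
wear_rate = (0.002 * X[:, 0] + 1e-5 * X[:, 1] + 0.001 * X[:, 2]
             - 0.003 * X[:, 3] + rng.normal(0, 0.005, 30))       # synthetic response

# Second-order (quadratic plus interaction) response surface, as fitted from central composite designs.
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly.fit_transform(X), wear_rate)
print("R^2 of the fitted response surface:", round(model.score(poly.transform(X), wear_rate), 3))
```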

  14. Statistical modeling for degradation data

    CERN Document Server

    Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru

    2017-01-01

    This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.

  15. Statistical modelling with quantile functions

    CERN Document Server

    Gilchrist, Warren

    2000-01-01

    Galton used quantiles more than a hundred years ago in describing data. Tukey and Parzen used them in the 60s and 70s in describing populations. Since then, the authors of many papers, both theoretical and practical, have used various aspects of quantiles in their work. Until now, however, no one had put all the ideas together to form what turns out to be a general approach to statistics. Statistical Modelling with Quantile Functions does just that. It systematically examines the entire process of statistical modelling, starting with using the quantile function to define continuous distributions. The author shows that by using this approach, it becomes possible to develop complex distributional models from simple components. A modelling kit can be developed that applies to the whole model - deterministic and stochastic components - and this kit operates by adding, multiplying, and transforming distributions rather than data. Statistical Modelling with Quantile Functions adds a new dimension to the practice of stati...

  16. SU-F-BRB-10: A Statistical Voxel Based Normal Organ Dose Prediction Model for Coplanar and Non-Coplanar Prostate Radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Tran, A; Yu, V; Nguyen, D; Woods, K; Low, D; Sheng, K [UCLA, Los Angeles, CA (United States)

    2015-06-15

    Purpose: Knowledge learned from previous plans can be used to guide future treatment planning. Existing knowledge-based treatment planning methods study the correlation between organ geometry and dose volume histogram (DVH), which is a lossy representation of the complete dose distribution. A statistical voxel dose learning (SVDL) model was developed that includes the complete dose volume information. Its accuracy of predicting volumetric-modulated arc therapy (VMAT) and non-coplanar 4π radiotherapy was quantified. SVDL provided more isotropic dose gradients and may improve knowledge-based planning. Methods: 12 prostate SBRT patients originally treated using two full-arc VMAT techniques were re-planned with 4π using 20 intensity-modulated non-coplanar fields to a prescription dose of 40 Gy. The bladder and rectum voxels were binned based on their distances to the PTV. The dose distribution in each bin was resampled by convolving to a Gaussian kernel, resulting in 1000 data points in each bin that predicted the statistical dose information of a voxel with unknown dose in a new patient without triaging information that may be collectively important to a particular patient. We used this method to predict the DVHs, mean and max doses in a leave-one-out cross validation (LOOCV) test and compared its performance against lossy estimators including mean, median, mode, Poisson and Rayleigh of the voxelized dose distributions. Results: SVDL predicted the bladder and rectum doses more accurately than other estimators, giving mean percentile errors ranging from 13.35–19.46%, 4.81–19.47%, 22.49–28.69%, 23.35–30.5%, 21.05–53.93% for predicting mean, max dose, V20, V35, and V40 respectively, to OARs in both planning techniques. The prediction errors were generally lower for 4π than VMAT. Conclusion: By employing all dose volume information in the SVDL model, the OAR doses were more accurately predicted. 4π plans are better suited for knowledge-based planning than
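
    A rough sketch of the binning-and-smoothing idea on invented data; the distances, doses, bin width, and kernel width are all assumptions, and the real model predicts full dose distributions per voxel (and hence DVHs), not just means.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical training data: for each OAR voxel, its distance to the PTV surface (mm) and its dose (Gy).
distances = rng.uniform(0, 50, 5000)
doses = 40 * np.exp(-distances / 15) + rng.normal(0, 2, 5000)

bin_edges = np.arange(0, 55, 5)
smoothed_bins = []
for lo_edge, hi_edge in zip(bin_edges[:-1], bin_edges[1:]):
    bin_doses = doses[(distances >= lo_edge) & (distances < hi_edge)]
    # Resample the bin's dose distribution and smooth it with a Gaussian kernel (sigma assumed 1 Gy),
    # giving 1000 points that summarise the statistical dose information for that distance bin.
    resampled = rng.choice(bin_doses, 1000) + rng.normal(0, 1.0, 1000)
    smoothed_bins.append(resampled)

def predict_mean_dose(distance_mm):
    """Predict the expected dose of a new-patient voxel from its distance to the PTV alone."""
    idx = min(int(distance_mm // 5), len(smoothed_bins) - 1)
    return smoothed_bins[idx].mean()

print("predicted mean dose 12 mm from the PTV:", round(predict_mean_dose(12), 1), "Gy")
```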

  17. Statistical validation of stochastic models

    Energy Technology Data Exchange (ETDEWEB)

    Hunter, N.F. [Los Alamos National Lab., NM (United States). Engineering Science and Analysis Div.; Barney, P.; Paez, T.L. [Sandia National Labs., Albuquerque, NM (United States). Experimental Structural Dynamics Dept.; Ferregut, C.; Perez, L. [Univ. of Texas, El Paso, TX (United States). Dept. of Civil Engineering

    1996-12-31

    It is common practice in structural dynamics to develop mathematical models for system behavior, and the authors are now capable of developing stochastic models, i.e., models whose parameters are random variables. Such models have random characteristics that are meant to simulate the randomness in characteristics of experimentally observed systems. This paper suggests a formal statistical procedure for the validation of mathematical models of stochastic systems when data taken during operation of the stochastic system are available. The statistical characteristics of the experimental system are obtained using the bootstrap, a technique for the statistical analysis of non-Gaussian data. The authors propose a procedure to determine whether or not a mathematical model is an acceptable model of a stochastic system with regard to user-specified measures of system behavior. A numerical example is presented to demonstrate the application of the technique.
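
    A minimal sketch of a bootstrap-based acceptance check of this kind, on invented data: the experimental mean response is bootstrapped to obtain an interval free of Gaussian assumptions, and the stochastic model is accepted for that measure if its mean falls inside. The actual procedure and acceptance criteria in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical peak-response measurements from repeated tests of the physical (stochastic) system.
experiment = rng.normal(loc=5.0, scale=0.6, size=25)
# Responses produced by Monte Carlo runs of the candidate stochastic mathematical model.
model_runs = rng.normal(loc=5.2, scale=0.7, size=200)

# Bootstrap the experimental mean to get its sampling distribution without Gaussian assumptions.
boot_means = np.array([rng.choice(experiment, experiment.size, replace=True).mean()
                       for _ in range(5000)])
lo_ci, hi_ci = np.percentile(boot_means, [2.5, 97.5])

model_mean = model_runs.mean()
print(f"bootstrap 95% interval for the experimental mean: [{lo_ci:.2f}, {hi_ci:.2f}]")
print("model accepted for this measure:", lo_ci <= model_mean <= hi_ci)
```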

  18. The DGAV risk calculator: development and validation of statistical models for a web-based instrument predicting complications of colorectal cancer surgery.

    Science.gov (United States)

    Crispin, Alexander; Klinger, Carsten; Rieger, Anna; Strahwald, Brigitte; Lehmann, Kai; Buhr, Heinz-Johannes; Mansmann, Ulrich

    2017-10-01

    The purpose of this study is to provide a web-based calculator predicting complication probabilities of patients undergoing colorectal cancer (CRC) surgery in Germany. Analyses were based on records of first-time CRC surgery between 2010 and February 2017, documented in the database of the Study, Documentation, and Quality Center (StuDoQ) of the Deutsche Gesellschaft für Allgemein- und Viszeralchirurgie (DGAV), a registry of CRC surgery in hospitals throughout Germany, covering demography, medical history, tumor features, comorbidity, behavioral risk factors, surgical procedures, and outcomes. Using logistic ridge regression, separate models were developed in learning samples of 6729 colon and 4381 rectum cancer patients and evaluated in validation samples of sizes 2407 and 1287. Discrimination was assessed using c statistics. Calibration was examined graphically by plotting observed versus predicted complication probabilities and numerically using Brier scores. We report validation results regarding 15 outcomes such as any major complication, surgical site infection, anastomotic leakage, bladder voiding disturbance after rectal surgery, abdominal wall dehiscence, various internistic complications, 30-day readmission, 30-day reoperation rate, and 30-day mortality. When applied to the validation samples, c statistics ranged between 0.60 for anastomosis leakage and 0.85 for mortality after rectum cancer surgery. Brier scores ranged from 0.003 to 0.127. While most models showed satisfactory discrimination and calibration, this does not preclude overly optimistic or pessimistic individual predictions. To avoid misinterpretation, one has to understand the basic principles of risk calculation and risk communication. An e-learning tool outlining the appropriate use of the risk calculator is provided.
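
    A generic sketch of the modelling and validation ingredients named above (L2-penalised logistic regression, c statistic, Brier score) on synthetic data; this is not the DGAV calculator's code or its predictor set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n_patients = 4000
# Hypothetical predictors standing in for demography, comorbidity, tumor features, procedure, etc.
X = rng.standard_normal((n_patients, 12))
logit = -2.2 + X @ rng.normal(0, 0.4, 12)
complication = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_learn, X_valid, y_learn, y_valid = train_test_split(X, complication, test_size=0.25, random_state=0)

# L2-penalised (ridge) logistic regression, analogous in spirit to the calculator's approach.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=2000).fit(X_learn, y_learn)
p = model.predict_proba(X_valid)[:, 1]
print("c statistic (AUC):", round(roc_auc_score(y_valid, p), 3))   # discrimination
print("Brier score:      ", round(brier_score_loss(y_valid, p), 3))  # calibration / overall accuracy
```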

  19. Extreme events: dynamics, statistics and prediction

    Directory of Open Access Journals (Sweden)

    M. Ghil

    2011-05-01

    Full Text Available We review work on extreme events, their causes and consequences, by a group of European and American researchers involved in a three-year project on these topics. The review covers theoretical aspects of time series analysis and of extreme value theory, as well as of the deterministic modeling of extreme events, via continuous and discrete dynamic models. The applications include climatic, seismic and socio-economic events, along with their prediction.

    Two important results refer to (i) the complementarity of spectral analysis of a time series in terms of the continuous and the discrete part of its power spectrum; and (ii) the need for coupled modeling of natural and socio-economic systems. Both these results have implications for the study and prediction of natural hazards and their human impacts.

  20. Wind speed prediction using statistical regression and neural network

    Indian Academy of Sciences (India)

    Four different statistical techniques, viz., curve fitting, Auto Regressive Integrated Moving Average (ARIMA) modelling, extrapolation with a periodic function, and Artificial Neural Networks (ANN), are employed to predict wind speed. These methods require wind speeds of previous hours as input. It has been found that wind speed can ...
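
    One of the listed techniques, ARIMA, can be sketched as follows on a synthetic hourly wind-speed series; the model order and the series parameters are assumptions, not those of the paper.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
hours = np.arange(24 * 30)
# Synthetic hourly wind speed (m/s) with a daily cycle plus noise.
wind = 6 + 2 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.8, hours.size)

train, test = wind[:-24], wind[-24:]
model = ARIMA(train, order=(2, 0, 1)).fit()      # fit on all but the last day
forecast = model.forecast(steps=24)              # forecast the next 24 hours
rmse = np.sqrt(np.mean((forecast - test) ** 2))
print("RMSE of the 24-hour-ahead ARIMA forecast:", round(rmse, 2), "m/s")
```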

  1. Modeling for influenza vaccines and adjuvants profile for safety prediction system using gene expression profiling and statistical tools

    Science.gov (United States)

    Sasaki, Eita; Momose, Haruka; Hiradate, Yuki; Furuhata, Keiko; Takai, Mamiko; Asanuma, Hideki; Ishii, Ken J.

    2018-01-01

    Historically, vaccine safety assessments have been conducted by animal testing (e.g., quality control tests and adjuvant development). However, classical evaluation methods do not provide sufficient information to make treatment decisions. We previously identified biomarker genes as novel safety markers. Here, we developed a practical safety assessment system used to evaluate the intramuscular, intraperitoneal, and nasal inoculation routes to provide robust and comprehensive safety data. Influenza vaccines were used as model vaccines. A toxicity reference vaccine (RE) and poly I:C-adjuvanted hemagglutinin split vaccine were used as toxicity controls, while a non-adjuvanted hemagglutinin split vaccine and AddaVax (squalene-based oil-in-water nano-emulsion with a formulation similar to MF59)-adjuvanted hemagglutinin split vaccine were used as safety controls. Body weight changes, number of white blood cells, and lung biomarker gene expression profiles were determined in mice. In addition, vaccines were inoculated into mice by three different administration routes. Logistic regression analyses were carried out to determine the expression changes of each biomarker. The results showed that the regression equations clearly classified each vaccine according to its toxic potential and inoculation amount by biomarker expression levels. Interestingly, lung biomarker expression was nearly equivalent for the various inoculation routes. The results of the present safety evaluation were confirmed by the approximation rate for the toxicity control. This method may contribute to toxicity evaluation such as quality control tests and adjuvant development. PMID:29408882

  2. Statistical Models for Social Networks

    NARCIS (Netherlands)

    Snijders, Tom A. B.; Cook, KS; Massey, DS

    2011-01-01

    Statistical models for social networks as dependent variables must represent the typical network dependencies between tie variables such as reciprocity, homophily, transitivity, etc. This review first treats models for single (cross-sectionally observed) networks and then for network dynamics. For

  3. Bootstrap prediction and Bayesian prediction under misspecified models

    OpenAIRE

    Fushiki, Tadayoshi

    2005-01-01

    We consider a statistical prediction problem under misspecified models. In a sense, Bayesian prediction is an optimal prediction method when an assumed model is true. Bootstrap prediction is obtained by applying Breiman's `bagging' method to a plug-in prediction. Bootstrap prediction can be considered to be an approximation to the Bayesian prediction under the assumption that the model is true. However, in applications, there are frequently deviations from the assumed model. In this paper, bo...

  4. Statistical analysis and ANN modeling for predicting hydrological extremes under climate change scenarios: the example of a small Mediterranean agro-watershed.

    Science.gov (United States)

    Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P

    2015-05-01

    The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. Part 2: Theoretical development of a dynamic model and application to rain fade durations and tolerable control delays for fade countermeasures

    Science.gov (United States)

    Manning, Robert M.

    1987-01-01

    A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analogous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.
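
    The report derives fade statistics analytically from the forward Kolmogorov equation; as a rough numerical counterpart, the sketch below simulates a generic first-order (Markov) stochastic process for attenuation and extracts empirical fade durations above a threshold. The Ornstein-Uhlenbeck form for the log-attenuation and all parameter values are illustrative assumptions, not the model developed in this report.

        import numpy as np

        rng = np.random.default_rng(1)

        # Generic first-order Markov model: Ornstein-Uhlenbeck process for log-attenuation.
        # dX = -theta*(X - mu) dt + sigma dW,  A(t) = exp(X(t)) in dB  -- illustrative only.
        theta, mu, sigma = 0.5, 0.0, 0.6        # assumed relaxation rate, mean, volatility
        dt, n_steps = 0.01, 200_000             # time step and number of steps (arbitrary units)

        x = np.empty(n_steps)
        x[0] = mu
        for k in range(1, n_steps):
            x[k] = x[k-1] - theta * (x[k-1] - mu) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        attenuation_db = np.exp(x)

        # Empirical fade durations: contiguous runs above a 3 dB threshold.
        above = attenuation_db > 3.0
        edges = np.diff(above.astype(int))
        starts, ends = np.where(edges == 1)[0] + 1, np.where(edges == -1)[0] + 1
        if above[0]:
            starts = np.r_[0, starts]
        if above[-1]:
            ends = np.r_[ends, n_steps]
        durations = (ends - starts) * dt
        print(f"fraction of time above 3 dB: {above.mean():.3f}")
        print(f"mean fade duration: {durations.mean():.2f} time units ({durations.size} fades)")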

  6. A Statistical Programme Assignment Model

    DEFF Research Database (Denmark)

    Rosholm, Michael; Staghøj, Jonas; Svarer, Michael

    When treatment effects of active labour market programmes are heterogeneous in an observable way  across the population, the allocation of the unemployed into different programmes becomes a particularly  important issue. In this paper, we present a statistical model designed to improve the present...

  7. Textual information access statistical models

    CERN Document Server

    Gaussier, Eric

    2013-01-01

    This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications aiming at facilitating information access: information extraction and retrieval; text classification and clustering; opinion mining; and comprehension aids (automatic summarization, machine translation, visualization). In order to give the reader as complete a description as possible, the focus is placed on the probability models used in the applications.

  8. Simple statistical model for branched aggregates

    DEFF Research Database (Denmark)

    Lemarchand, Claire; Hansen, Jesper Schmidt

    2015-01-01

    We propose a statistical model that can reproduce the size distribution of any branched aggregate, including amylopectin, dendrimers, molecular clusters of monoalcohols, and asphaltene nanoaggregates. It is based on the conditional probability for one molecule to form a new bond with a molecule, given that it already has bonds with others. The model is applied here to asphaltene nanoaggregates observed in molecular dynamics simulations of Cooee bitumen. The variation with temperature of the probabilities deduced from this model is discussed in terms of statistical mechanics arguments. The relevance of the statistical model in the case of asphaltene nanoaggregates is checked by comparing the predicted value of the probability for one molecule to have exactly i bonds with the same probability directly measured in the molecular dynamics simulations. The agreement is satisfactory...

  9. Improved model for statistical alignment

    Energy Technology Data Exchange (ETDEWEB)

    Miklos, I.; Toroczkai, Z. (Zoltan)

    2001-01-01

    The statistical approach to molecular sequence evolution involves the stochastic modeling of the substitution, insertion and deletion processes. Substitution has been modeled in a reliable way for more than three decades by using finite Markov-processes. Insertion and deletion, however, seem to be more difficult to model, and the recent approaches cannot acceptably deal with multiple insertions and deletions. A new method based on a generating function approach is introduced to describe the multiple insertion process. The presented algorithm computes the approximate joint probability of two sequences in O(l³) running time, where l is the geometric mean of the sequence lengths.

  10. Statistical modeling of geopressured geothermal reservoirs

    Science.gov (United States)

    Ansari, Esmail; Hughes, Richard; White, Christopher D.

    2017-06-01

    Identifying attractive candidate reservoirs for producing geothermal energy requires predictive models. In this work, inspectional analysis and statistical modeling are used to create simple predictive models for a line drive design. Inspectional analysis on the partial differential equations governing this design yields a minimum number of fifteen dimensionless groups required to describe the physics of the system. These dimensionless groups are explained and confirmed using models with similar dimensionless groups but different dimensional parameters. This study models dimensionless production temperature and thermal recovery factor as the responses of a numerical model. These responses are obtained by a Box-Behnken experimental design. An uncertainty plot is used to segment the dimensionless time and develop a model for each segment. The important dimensionless numbers for each segment of the dimensionless time are identified using the Boosting method. These selected numbers are used in the regression models. The developed models are reduced to have a minimum number of predictors and interactions. The reduced final models are then presented and assessed using testing runs. Finally, applications of these models are offered. The presented workflow is generic and can be used to translate the output of a numerical simulator into simple predictive models in other research areas involving numerical simulation.

  11. Confidence scores for prediction models

    DEFF Research Database (Denmark)

    Gerds, Thomas Alexander; van de Wiel, MA

    2011-01-01

    In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for training and validation......, then rival strategies can still be compared based on repeated bootstraps of the same data. Often, however, the overall performance of rival strategies is similar and it is thus difficult to decide for one model. Here, we investigate the variability of the prediction models that results when the same...... to distinguish rival prediction models with similar prediction performances. Furthermore, on the subject level a confidence score may provide useful supplementary information for new patients who want to base a medical decision on predicted risk. The ideas are illustrated and discussed using data from cancer...

  12. Prediction models in complex terrain

    DEFF Research Database (Denmark)

    Marti, I.; Nielsen, Torben Skov; Madsen, Henrik

    2001-01-01

    The objective of the work is to investigate the performance of HIRLAM in complex terrain when used as input to energy production forecasting models, and to develop a statistical model to adapt HIRLAM prediction to the wind farm. The features of the terrain, especially the topography, influence the performance of HIRLAM in particular with respect to wind predictions. To estimate the performance of the model two spatial resolutions (0.5 Deg. and 0.2 Deg.) and different sets of HIRLAM variables were used to predict wind speed and energy production. The predictions of energy production for the wind farms are calculated using on-line measurements of power production as well as HIRLAM predictions as input thus taking advantage of the auto-correlation, which is present in the power production for shorter prediction horizons. Statistical models are used to describe the relationship between observed energy production...

  13. Statistical lung model for microdosimetry

    International Nuclear Information System (INIS)

    Fisher, D.R.; Hadley, R.T.

    1984-03-01

    To calculate the microdosimetry of plutonium in the lung, a mathematical description is needed of lung tissue microstructure that defines source-site parameters. Beagle lungs were expanded using a glutaraldehyde fixative at 30 cm water pressure. Tissue specimens, five microns thick, were stained with hematoxylin and eosin then studied using an image analyzer. Measurements were made along horizontal lines through the magnified tissue image. The distribution of air space and tissue chord lengths and locations of epithelial cell nuclei were recorded from about 10,000 line scans. The distribution parameters constituted a model of lung microstructure for predicting the paths of random alpha particle tracks in the lung and the probability of traversing biologically sensitive sites. This lung model may be used in conjunction with established deposition and retention models for determining the microdosimetry in the pulmonary lung for a wide variety of inhaled radioactive materials

  14. Statistical models of petrol engines vehicles dynamics

    Science.gov (United States)

    Ilie, C. O.; Marinescu, M.; Alexa, O.; Vilău, R.; Grosu, D.

    2017-10-01

    This paper focuses on statistical models of vehicle dynamics. A one-year testing program was designed and performed, using many cars of the same type with gasoline engines and different mileages. Experimental data were collected from onboard sensors and from the engine test stand, and a database containing data from 64 tests was created. Several mathematical models were developed from this database using the system identification method. Each model is a SISO or MISO linear predictive ARMAX (AutoRegressive-Moving-Average with eXogenous inputs) model, representing a difference equation with constant coefficients. For each dependency, such as engine torque as output with engine load and intake manifold pressure as inputs, 64 equations were estimated, giving strings of 64 values for each type of model. The final models were obtained using the average values of the coefficients, and their accuracy was assessed.
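
    A minimal sketch of this kind of input-output identification is given below; for simplicity it assumes an ARX rather than a full ARMAX structure and uses synthetic signals, so the orders, coefficients and variable names are illustrative and not those estimated in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 2000

        # Synthetic input-output data: two exogenous inputs (e.g. load and manifold pressure)
        u1 = rng.normal(size=n)
        u2 = rng.normal(size=n)
        y = np.zeros(n)
        for t in range(2, n):
            # "True" difference equation, used only to generate the data
            y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + 0.8 * u1[t-1] + 0.4 * u2[t-1] + 0.1 * rng.normal()

        # ARX(2,1) identification by ordinary least squares:
        # y[t] = a1*y[t-1] + a2*y[t-2] + b1*u1[t-1] + b2*u2[t-1] + e[t]
        Phi = np.column_stack([y[1:-1], y[:-2], u1[1:-1], u2[1:-1]])
        target = y[2:]
        coeffs, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        print("estimated [a1, a2, b1, b2]:", np.round(coeffs, 3))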

  15. Statistical shape and appearance models of bones.

    Science.gov (United States)

    Sarkalkan, Nazli; Weinans, Harrie; Zadpoor, Amir A

    2014-03-01

    When applied to bones, statistical shape models (SSM) and statistical appearance models (SAM) respectively describe the mean shape and mean density distribution of bones within a certain population as well as the main modes of variations of shape and density distribution from their mean values. The availability of this quantitative information regarding the detailed anatomy of bones provides new opportunities for diagnosis, evaluation, and treatment of skeletal diseases. The potential of SSM and SAM has been recently recognized within the bone research community. For example, these models have been applied for studying the effects of bone shape on the etiology of osteoarthritis, improving the accuracy of clinical osteoporotic fracture prediction techniques, design of orthopedic implants, and surgery planning. This paper reviews the main concepts, methods, and applications of SSM and SAM as applied to bone. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Statistical Approaches for Spatiotemporal Prediction of Low Flows

    Science.gov (United States)

    Fangmann, A.; Haberlandt, U.

    2017-12-01

    An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. The basis for all of them is multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be

  17. Statistical prediction of parametric roll using FORM

    DEFF Research Database (Denmark)

    Jensen, Jørgen Juncher; Choi, Ju-hyuck; Nielsen, Ulrik Dam

    2017-01-01

    Previous research has shown that the First Order Reliability Method (FORM) can be an efficient method for estimation of outcrossing rates and extreme value statistics for stationary stochastic processes. This is so also for bifurcation type of processes like parametric roll of ships. The present...

  18. Predicting recreational water quality advisories: A comparison of statistical methods

    Science.gov (United States)

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to "nowcast" the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.
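
    The two best-performing approaches named above can be sketched as follows with scikit-learn on synthetic surrogate data; the adaptive LASSO is implemented here via the common two-step reweighting trick (initial ridge fit, then LASSO on rescaled features), which is one reasonable reading rather than the authors' exact procedure.

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import Ridge, LassoCV
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import mean_squared_error

        # Surrogate for log FIB concentration vs. environmental predictors
        X, y = make_regression(n_samples=600, n_features=15, n_informative=5,
                               noise=10.0, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Random forest nowcast model
        rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

        # Adaptive LASSO via feature reweighting: weight_j = |beta_ridge_j|
        beta0 = Ridge(alpha=1.0).fit(X_tr, y_tr).coef_
        weights = np.abs(beta0) + 1e-6
        lasso = LassoCV(cv=5, random_state=0).fit(X_tr * weights, y_tr)

        rmse_rf = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
        rmse_al = mean_squared_error(y_te, lasso.predict(X_te * weights)) ** 0.5
        print(f"RMSE random forest: {rmse_rf:.2f}")
        print(f"RMSE adaptive LASSO: {rmse_al:.2f}")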

  19. Spatial statistics for predicting flow through a rock fracture

    International Nuclear Information System (INIS)

    Coakley, K.J.

    1989-03-01

    Fluid flow through a single rock fracture depends on the shape of the space between the upper and lower pieces of rock which define the fracture. In this thesis, the normalized flow through a fracture, i.e. the equivalent permeability of a fracture, is predicted in terms of spatial statistics computed from the arrangement of voids, i.e. open spaces, and contact areas within the fracture. Patterns of voids and contact areas, with complexity typical of experimental data, are simulated by clipping a correlated Gaussian process defined on an N by N pixel square region. The voids have constant aperture; the distance between the upper and lower surfaces which define the fracture is either zero or a constant. Local flow is assumed to be proportional to local aperture cubed times local pressure gradient. The flow through a pattern of voids and contact areas is solved using a finite-difference method. After solving for the flow through simulated 10 by 10 by 30 pixel patterns of voids and contact areas, a model to predict equivalent permeability is developed. The first model is for patterns with 80% voids where all voids have the same aperture. The equivalent permeability of a pattern is predicted in terms of spatial statistics computed from the arrangement of voids and contact areas within the pattern. Four spatial statistics are examined. The change point statistic measures how often adjacent pixels alternate from void to contact area (or vice versa) in the rows of the patterns which are parallel to the overall flow direction. 37 refs., 66 figs., 41 tabs
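
    A small sketch of the pattern-generation step and the change point statistic described above, using smoothed white noise as the correlated Gaussian process; the grid size, correlation length and 80% void fraction threshold are illustrative choices.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        rng = np.random.default_rng(0)
        N = 64

        # Correlated Gaussian field: smoothed white noise (correlation length ~ sigma pixels)
        field = gaussian_filter(rng.standard_normal((N, N)), sigma=3.0)

        # Clip the field so that ~80% of pixels are voids (True) and ~20% contact areas
        threshold = np.quantile(field, 0.20)
        voids = field > threshold
        print("void fraction:", voids.mean())

        # Change point statistic: how often adjacent pixels alternate between void and
        # contact area along rows parallel to the overall flow direction
        changes = np.sum(voids[:, 1:] != voids[:, :-1])
        change_point_stat = changes / (N * (N - 1))
        print("change point statistic:", round(float(change_point_stat), 4))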

  20. Time series prediction: statistical and neural techniques

    Science.gov (United States)

    Zahirniak, Daniel R.; DeSimio, Martin P.

    1996-03-01

    In this paper we compare the performance of nonlinear neural network techniques to those of linear filtering techniques in the prediction of time series. Specifically, we compare the results of using the nonlinear systems, known as multilayer perceptron and radial basis function neural networks, with the results obtained using the conventional linear Wiener filter, Kalman filter and Widrow-Hoff adaptive filter in predicting future values of stationary and non- stationary time series. Our results indicate the performance of each type of system is heavily dependent upon the form of the time series being predicted and the size of the system used. In particular, the linear filters perform adequately for linear or near linear processes while the nonlinear systems perform better for nonlinear processes. Since the linear systems take much less time to be developed, they should be tried prior to using the nonlinear systems when the linearity properties of the time series process are unknown.

  1. Statistical Prediction of Laminar-turbulent Transition

    National Research Council Canada - National Science Library

    Rubinstein, Robert

    2000-01-01

    ... on representative stability theories including the resonant triad model and the parabolized stability equations. The first type of model can describe the effect of initial phase differences among disturbance modes on transition location...

  2. Uncertainty propagation for statistical impact prediction of space debris

    Science.gov (United States)

    Hoogendoorn, R.; Mooij, E.; Geul, J.

    2018-01-01

    Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.

  3. Statistical Analysis by Statistical Physics Model for the STOCK Markets

    Science.gov (United States)

    Wang, Tiansong; Wang, Jun; Fan, Bingli

    A new stochastic stock price model of stock markets based on the contact process of the statistical physics systems is presented in this paper, where the contact model is a continuous time Markov process, one interpretation of this model is as a model for the spread of an infection. Through this model, the statistical properties of Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE) are studied. In the present paper, the data of SSE Composite Index and the data of SZSE Component Index are analyzed, and the corresponding simulation is made by the computer computation. Further, we investigate the statistical properties, fat-tail phenomena, the power-law distributions, and the long memory of returns for these indices. The techniques of skewness-kurtosis test, Kolmogorov-Smirnov test, and R/S analysis are applied to study the fluctuation characters of the stock price returns.

  4. Graphics and statistics for cardiology: clinical prediction rules.

    Science.gov (United States)

    Woodward, Mark; Tunstall-Pedoe, Hugh; Peters, Sanne Ae

    2017-04-01

    Graphs and tables are indispensable aids to quantitative research. When developing a clinical prediction rule that is based on a cardiovascular risk score, there are many visual displays that can assist in developing the underlying statistical model, testing the assumptions made in this model, evaluating and presenting the resultant score. All too often, researchers in this field follow formulaic recipes without exploring the issues of model selection and data presentation in a meaningful and thoughtful way. Some ideas on how to use visual displays to make wise decisions and present results that will both inform and attract the reader are given. Ideas are developed, and results tested, using subsets of the data that were used to develop the ASSIGN cardiovascular risk score, as used in Scotland. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  5. Predicting weak lensing statistics from halo mass reconstructions - Final Paper

    Energy Technology Data Exchange (ETDEWEB)

    Everett, Spencer [SLAC National Accelerator Lab., Menlo Park, CA (United States)

    2015-08-20

    As dark matter does not absorb or emit light, its distribution in the universe must be inferred through indirect effects such as the gravitational lensing of distant galaxies. While most sources are only weakly lensed, the systematic alignment of background galaxies around a foreground lens can constrain the mass of the lens which is largely in the form of dark matter. In this paper, I have implemented a framework to reconstruct all of the mass along lines of sight using a best-case dark matter halo model in which the halo mass is known. This framework is then used to make predictions of the weak lensing of 3,240 generated source galaxies through a 324 arcmin² field of the Millennium Simulation. The lensed source ellipticities are characterized by the ellipticity-ellipticity and galaxy-mass correlation functions and compared to the same statistic for the intrinsic and ray-traced ellipticities. In the ellipticity-ellipticity correlation function, I find that the framework systematically underpredicts the shear power by an average factor of 2.2 and fails to capture correlation from dark matter structure at scales larger than 1 arcminute. The model-predicted galaxy-mass correlation function is in agreement with the ray-traced statistic from scales 0.2 to 0.7 arcminutes, but systematically underpredicts shear power at scales larger than 0.7 arcminutes by an average factor of 1.2. Optimization of the framework code has reduced the mean CPU time per lensing prediction by 70% to 24 ± 5 ms. Physical and computational shortcomings of the framework are discussed, as well as potential improvements for upcoming work.

  6. Reliable probabilities through statistical post-processing of ensemble predictions

    Science.gov (United States)

    Van Schaeybroeck, Bert; Vannitsem, Stéphane

    2013-04-01

    We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. We enforce climatological reliability in the sense that the total variability of the prediction is equal to the variability of the observations. Second, we impose ensemble reliability such that the spread around the ensemble mean of the observation coincides with the one of the ensemble members. In general the attractors of the model and reality are inhomogeneous. Therefore ensemble spread displays a variability not taken into account in standard post-processing methods. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz 1996). The forecasts become more reliable at short lead times as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability ECMWF. 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression, Nonlin. Processes Geophys., 18, 147.

  7. Precipitation Prediction in North Africa Based on Statistical Downscaling

    Science.gov (United States)

    Molina, J. M.; Zaitchik, B.

    2013-12-01

    Although Global Climate Models (GCM) outputs should not be used directly to predict precipitation variability and change at the local scale, GCM projections of large-scale features in ocean and atmosphere can be applied to infer future statistical properties of climate at finer resolutions through empirical statistical downscaling techniques. A number of such downscaling methods have been proposed in the literature, and although all of them have advantages and limitations depending on the specific downscaling problem, most of them have been developed and tested in developed countries. In this research, we explore the use of statistical downscaling to generate future local precipitation scenarios in different locations in Northern Africa, where available data is sparse and missing values are frequently observed in the historical records. The presence of arid and semiarid regions in North African countries and the persistence of long periods with no rain pose challenges to the downscaling exercise since normality assumptions may be a serious limitation in the application of traditional linear regression methods. In our work, the development of monthly statistical relationships between the local precipitation and the large-scale predictors considers common Empirical Orthogonal Functions (EOFs) from different NCAR/Reanalysis climate fields (e.g., Sea Level Pressure (SLP) and Global Precipitation). GCM/CMIP5 data is considered in the predictor data set to analyze the future local precipitation. Both parametric (e.g., Generalized Linear Models (GLM)) and nonparametric (e,g,, Bootstrapping) approaches are considered in the regression analysis, and different spatial windows in the predictor fields are tested in the prediction experiments. In the latter, seasonal spatial cross-covariance between predictant and predictors is estimated by means of a teleconnections algorithm which was implemented to define the regions in the predictor domain that better captures the

  8. Statistical tests of simple earthquake cycle models

    Science.gov (United States)

    Devries, Phoebe M. R.; Evans, Eileen

    2016-01-01

    A central goal of observing and modeling the earthquake cycle is to forecast when a particular fault may generate an earthquake: a fault late in its earthquake cycle may be more likely to generate an earthquake than a fault early in its earthquake cycle. Models that can explain geodetic observations throughout the entire earthquake cycle may be required to gain a more complete understanding of relevant physics and phenomenology. Previous efforts to develop unified earthquake models for strike-slip faults have largely focused on explaining both preseismic and postseismic geodetic observations available across a few faults in California, Turkey, and Tibet. An alternative approach leverages the global distribution of geodetic and geologic slip rate estimates on strike-slip faults worldwide. Here we use the Kolmogorov-Smirnov test for similarity of distributions to infer, in a statistically rigorous manner, viscoelastic earthquake cycle models that are inconsistent with 15 sets of observations across major strike-slip faults. We reject a large subset of two-layer models incorporating Burgers rheologies at a significance level of α = 0.05 (those with long-term Maxwell viscosities ηM ~ 4.6 × 10^20 Pa s) but cannot reject models on the basis of transient Kelvin viscosity ηK. Finally, we examine the implications of these results for the predicted earthquake cycle timing of the 15 faults considered and compare these predictions to the geologic and historical record.
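
    The statistical machinery here is the two-sample Kolmogorov-Smirnov test; the following minimal sketch shows how such a comparison between observed and model-predicted quantities might look with SciPy, using synthetic numbers rather than the study's fault data.

        import numpy as np
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(0)

        # Synthetic stand-ins: observed ratios vs. ratios predicted by a candidate model
        observed = rng.lognormal(mean=0.0, sigma=0.3, size=15)
        predicted = rng.lognormal(mean=0.4, sigma=0.3, size=15)

        stat, p_value = ks_2samp(observed, predicted)
        reject = p_value < 0.05
        print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}, reject at alpha = 0.05: {reject}")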

  9. Cultural Resource Predictive Modeling

    Science.gov (United States)

    2017-10-01

    refining formal, inductive predictive models is the quality of the archaeological and environmental data. To build models efficiently, relevant...geomorphology, and historic information. Lessons Learned: The original model was focused on the identification of prehistoric resources. This...system but uses predictive modeling informally. For example, there is no probability for buried archaeological deposits on the Burton Mesa, but there is

  10. Quantifying the Influence of Scaling Metrics and Hydrogeological Data in the Statistical Characterization of Model Predictions in Well-Catchment Regions

    Science.gov (United States)

    de Barros, Felipe; Guadagnini, Alberto; Fernàndez-Garcia, Daniel; Riva, Monica; Sanchez-Vila, Xavier

    2013-04-01

    In this work, we evaluate the value of hydrogeological information on the assessment of the risk of contamination of a pumping well operating in a heterogeneous aquifer. Our aim is to statistically characterize the mass fraction of the contaminant recovered at the well and its corresponding arrival time. We do so by investigating the role of the key length scales that characterize and control the well region of influence and its probabilistic delineation with respect to the contaminant source location. The impact of augmenting hydrogeological data on the reduction of uncertainty associated with the environmental scenario is also analyzed. Results show that the way of obtaining a robust characterization of the target predictions depends on the length scale considered. For the sampling scheme considered in our simulations, the relevance of conditioning on the probability distributions of the solute mass fraction recovered at the well and the associated travel times is affected by the location of the contaminant source zone within the probabilistic well catchment. With respect to the statistical characterization of the travel time associated with the recovery of a given mass fraction, the worth of augmenting the hydrogeological data tends to diminish with decreasing solute residence time within the well catchment.

  11. Predictive modeling of complications.

    Science.gov (United States)

    Osorio, Joseph A; Scheer, Justin K; Ames, Christopher P

    2016-09-01

    Predictive analytic algorithms are designed to identify patterns in the data that allow for accurate predictions without the need for a hypothesis. Therefore, predictive modeling can provide detailed and patient-specific information that can be readily applied when discussing the risks of surgery with a patient. There are few studies using predictive modeling techniques in the adult spine surgery literature. These types of studies represent the beginning of the use of predictive analytics in spine surgery outcomes. We will discuss the advancements in the field of spine surgery with respect to predictive analytics, the controversies surrounding the technique, and the future directions.

  12. Using machine learning, neural networks and statistics to predict bankruptcy

    NARCIS (Netherlands)

    Pompe, P.P.M.; Feelders, A.J.; Feelders, A.J.

    1997-01-01

    Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear

  13. Statistical mechanics of helical wormlike chain model

    Science.gov (United States)

    Liu, Ya; Pérez, Toni; Li, Wei; Gunton, J. D.; Green, Amanda

    2011-02-01

    We investigate the statistical mechanics of polymers with bending and torsional elasticity described by the helical wormlike model. Noticing that the energy function is factorizable, we provide a numerical method to solve the model using a transfer matrix formulation. The tangent-tangent and binormal-binormal correlation functions have been calculated and displayed rich profiles which are sensitive to the combination of the temperature and the equilibrium torsion. Their behaviors indicate that there is no finite temperature Lifshitz point between the disordered and helical phases. The asymptotic behavior at low temperature has been investigated theoretically and the predictions fit the numerical results very well. Our analysis could be used to understand the statics of dsDNA and other chiral polymers.

  14. Statistical Mechanics of Helical Wormlike Model

    Science.gov (United States)

    Liu, Ya; Perez, Toni; Li, Wei; Gunton, James; Green, Amanda

    2011-03-01

    The bending and torsional elasticities are crucial in determining the static and dynamic properties of biopolymers such as dsDNA and sickle hemoglobin. We investigate the statistical mechanics of stiff polymers described by the helical wormlike model. We provide a numerical method to solve the model using a transfer matrix formulation. The correlation functions have been calculated and display rich profiles which are sensitive to the combination of the temperature and the equilibrium torsion. The asymptotic behavior at low temperature has been investigated theoretically and the predictions fit the numerical results very well. Our analysis could be used to understand the statics of dsDNA and other chiral polymers. This work is supported by grants from the NSF and Mathers Foundation.

  15. Statistical characterization of pitting corrosion process and life prediction

    International Nuclear Information System (INIS)

    Sheikh, A.K.; Younas, M.

    1995-01-01

    In order to prevent corrosion failures of machines and structures, it is desirable to know in advance when corrosion damage will take place so that appropriate measures can be taken to mitigate it. Corrosion predictions are needed both at the development and at the operational stage of machines and structures. There are several forms of corrosion process through which varying degrees of damage can occur. Under certain conditions these corrosion processes act alone, and under other conditions several of them may occur simultaneously. Certain types of machine elements and structures, such as gears, bearings, tubes, pipelines, containers, and storage tanks, are particularly prone to pitting corrosion, which is an insidious form of corrosion. Corrosion predictions are usually based on experimental results obtained from test coupons and/or field experience with similar machines or parts of a structure. Considerable scatter is observed in corrosion processes. The probabilistic nature and kinetics of the pitting process make it necessary to use statistical methods to forecast the residual life of machines or structures. The focus of this paper is to characterize pitting as a time-dependent random process; using this characterization, the life to reach a critical level of pitting damage can be predicted. Using several data sets from the literature on pitting corrosion, the extreme value modeling of the pitting corrosion process, the evolution of the extreme value distribution in time, and their relationship to the reliability of machines and structures are explained. (author)
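
    A minimal numerical sketch of the extreme value step described above: fitting a Gumbel distribution to maximum pit depths from coupon inspections and extrapolating to a larger exposed area with SciPy. The depths, coupon count and critical depth are synthetic and illustrative.

        import numpy as np
        from scipy.stats import gumbel_r

        rng = np.random.default_rng(0)

        # Synthetic maximum pit depths (mm) measured on 30 identical coupons
        max_depths = gumbel_r.rvs(loc=0.8, scale=0.15, size=30, random_state=rng)

        loc, scale = gumbel_r.fit(max_depths)
        print(f"fitted Gumbel: location = {loc:.3f} mm, scale = {scale:.3f} mm")

        # Return-level style extrapolation: characteristic largest depth if the
        # structure exposes an area T times larger than a single coupon
        T = 100
        depth_T = loc + scale * np.log(T)
        print(f"characteristic maximum pit depth over {T}x coupon area: {depth_T:.2f} mm")

        # Probability that the deepest pit on one coupon already exceeds a critical depth
        critical = 1.5
        print("P(max depth > 1.5 mm on a coupon):", round(1 - gumbel_r.cdf(critical, loc, scale), 4))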

  16. Wind speed prediction using statistical regression and neural network

    Indian Academy of Sciences (India)

    Prediction of wind speed in the atmospheric boundary layer is important for wind energy assessment, satellite launching and aviation, etc. There are a few techniques available for wind speed prediction, which require a minimum number of input parameters. Four different statistical techniques, viz., curve fitting, Auto Regressive ...

  17. Modelling bankruptcy prediction models in Slovak companies

    Directory of Open Access Journals (Sweden)

    Kovacova Maria

    2017-01-01

    Full Text Available Intensive research by academics and practitioners has been devoted to models for bankruptcy prediction and credit risk management. In spite of numerous studies on forecasting bankruptcy using traditional statistical techniques (e.g. discriminant analysis and logistic regression) and early artificial intelligence models (e.g. artificial neural networks), there is a trend towards machine learning models (support vector machines, bagging, boosting, and random forest) to predict bankruptcy one year prior to the event. Comparing the performance of this unconventional approach with results obtained by discriminant analysis, logistic regression, and neural networks, it has been found that bagging, boosting, and random forest models outperform the other techniques, and that prediction accuracy in the testing sample improves when additional variables are included. On the other hand, the prediction accuracy of older, well-known bankruptcy prediction models is quite high. Therefore, we aim to analyse these older models on a dataset of Slovak companies to validate their prediction ability under specific conditions. Furthermore, these models will be remodelled according to new trends by calculating the influence of the elimination of selected variables on their overall prediction ability.
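
    As a schematic of the kind of comparison described above, the snippet below cross-validates a classical statistical model against bagging, boosting and random forest learners on synthetic financial-ratio-like data using scikit-learn; the features, sample size and settings stand in for the Slovak company data and are not the authors' specification.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, BaggingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-in for financial ratios with ~10% bankruptcies
        X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                                   weights=[0.9], random_state=1)

        models = {
            "logistic regression": LogisticRegression(max_iter=2000),
            "bagging":             BaggingClassifier(n_estimators=200, random_state=1),
            "random forest":       RandomForestClassifier(n_estimators=200, random_state=1),
            "boosting":            GradientBoostingClassifier(random_state=1),
        }

        for name, clf in models.items():
            auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
            print(f"{name:20s} AUC = {auc.mean():.3f} +/- {auc.std():.3f}")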

  18. Surface drift prediction in the Adriatic Sea using hyper-ensemble statistics on atmospheric, ocean and wave models: Uncertainties and probability distribution areas

    Science.gov (United States)

    Rixen, M.; Ferreira-Coelho, E.; Signell, R.

    2008-01-01

    Despite numerous and regular improvements in underlying models, surface drift prediction in the ocean remains a challenging task because of our yet limited understanding of all processes involved. Hence, deterministic approaches to the problem are often limited by empirical assumptions on underlying physics. Multi-model hyper-ensemble forecasts, which exploit the power of an optimal local combination of available information including ocean, atmospheric and wave models, may show superior forecasting skills when compared to individual models because they allow for local correction and/or bias removal. In this work, we explore in greater detail the potential and limitations of the hyper-ensemble method in the Adriatic Sea, using a comprehensive surface drifter database. The performance of the hyper-ensembles and the individual models are discussed by analyzing associated uncertainties and probability distribution maps. Results suggest that the stochastic method may reduce position errors significantly for 12 to 72 h forecasts and hence compete with pure deterministic approaches. © 2007 NATO Undersea Research Centre (NURC).

  19. The accuracy of the SONOBREAST statistical model in comparison to BI-RADS for the prediction of malignancy in solid breast nodules detected at ultrasonography.

    Science.gov (United States)

    Paulinelli, Regis R; Oliveira, Luis-Fernando P; Freitas-Junior, Ruffo; Soares, Leonardo R

    2016-01-01

    The objective of the present study was to compare the accuracy of SONOBREAST for the prediction of malignancy in solid breast nodules detected at ultrasonography with that of the BI-RADS system and to assess the agreement between these two methods. This prospective study included 274 women and evaluated 500 breast nodules detected at ultrasonography. The probability of malignancy was calculated based on the SONOBREAST model, available at www.sonobreast.com.br, and on the BI-RADS system, with results being compared with the anatomopathology report. The lesions were considered suspect in 171 cases (34.20%), according to both SONOBREAST and BI-RADS. Agreement between the methods was perfect, as shown by a Kappa coefficient of 1. SONOBREAST and BI-RADS proved identical insofar as sensitivity (95.40%), specificity (78.69%), positive predictive value (48.54%), negative predictive value (98.78%) and accuracy (81.60%) are concerned. With respect to the categorical variables (BI-RADS categories 3, 4 and 5), the area under the receiver operating characteristic (ROC) curve was 94.41 for SONOBREAST (range 92.20-96.62) and 89.99 for BI-RADS (range 86.60-93.37). The accuracy of the SONOBREAST model is identical to that found with BI-RADS when the same parameters are used with respect to the cut-off point at which malignancy is suspected. Regarding the continuous probability of malignancy with BI-RADS categories 3, 4 and 5, SONOBREAST permits a more precise and individualized evaluation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  20. Statistical prediction of nanoparticle delivery: from culture media to cell.

    Science.gov (United States)

    Brown, M Rowan; Hondow, Nicole; Brydson, Rik; Rees, Paul; Brown, Andrew P; Summers, Huw D

    2015-04-17

    The application of nanoparticles (NPs) within medicine is of great interest; their innate physicochemical characteristics provide the potential to enhance current technology, diagnostics and therapeutics. Recently a number of NP-based diagnostic and therapeutic agents have been developed for treatment of various diseases, where judicious surface functionalization is exploited to increase efficacy of administered therapeutic dose. However, quantification of heterogeneity associated with absolute dose of a nanotherapeutic (NP number), how this is trafficked across biological barriers has proven difficult to achieve. The main issue being the quantitative assessment of NP number at the spatial scale of the individual NP, data which is essential for the continued growth and development of the next generation of nanotherapeutics. Recent advances in sample preparation and the imaging fidelity of transmission electron microscopy (TEM) platforms provide information at the required spatial scale, where individual NPs can be individually identified. High spatial resolution however reduces the sample frequency and as a result dynamic biological features or processes become opaque. However, the combination of TEM data with appropriate probabilistic models provide a means to extract biophysical information that imaging alone cannot. Previously, we demonstrated that limited cell sampling via TEM can be statistically coupled to large population flow cytometry measurements to quantify exact NP dose. Here we extended this concept to link TEM measurements of NP agglomerates in cell culture media to that encapsulated within vesicles in human osteosarcoma cells. By construction and validation of a data-driven transfer function, we are able to investigate the dynamic properties of NP agglomeration through endocytosis. In particular, we statistically predict how NP agglomerates may traverse a biological barrier, detailing inter-agglomerate merging events providing the basis for

  1. Statistical modeling to support power system planning

    Science.gov (United States)

    Staid, Andrea

    This dissertation focuses on data-analytic approaches that improve our understanding of power system applications to promote better decision-making. It tackles issues of risk analysis, uncertainty management, resource estimation, and the impacts of climate change. Tools of data mining and statistical modeling are used to bring new insight to a variety of complex problems facing today's power system. The overarching goal of this research is to improve the understanding of the power system risk environment for improved operation, investment, and planning decisions. The first chapter introduces some challenges faced in planning for a sustainable power system. Chapter 2 analyzes the driving factors behind the disparity in wind energy investments among states with a goal of determining the impact that state-level policies have on incentivizing wind energy. Findings show that policy differences do not explain the disparities; physical and geographical factors are more important. Chapter 3 extends conventional wind forecasting to a risk-based focus of predicting maximum wind speeds, which are dangerous for offshore operations. Statistical models are presented that issue probabilistic predictions for the highest wind speed expected in a three-hour interval. These models achieve a high degree of accuracy and their use can improve safety and reliability in practice. Chapter 4 examines the challenges of wind power estimation for onshore wind farms. Several methods for wind power resource assessment are compared, and the weaknesses of the Jensen model are demonstrated. For two onshore farms, statistical models outperform other methods, even when very little information is known about the wind farm. Lastly, chapter 5 focuses on the power system more broadly in the context of the risks expected from tropical cyclones in a changing climate. Risks to U.S. power system infrastructure are simulated under different scenarios of tropical cyclone behavior that may result from climate

  2. Archaeological predictive model set.

    Science.gov (United States)

    2015-03-01

    This report is the documentation for Task 7 of the Statewide Archaeological Predictive Model Set. The goal of this project is to develop a set of statewide predictive models to assist the planning of transportation projects. PennDOT is developing t...

  3. Atmospheric corrosion: statistical validation of models

    International Nuclear Information System (INIS)

    Diaz, V.; Martinez-Luaces, V.; Guineo-Cobs, G.

    2003-01-01

    In this paper we discuss two different methods for validation of regression models, applied to corrosion data. One of them is based on the correlation coefficient and the other one is the statistical test of lack of fit. Both methods are used here to analyse the fit of the bilogarithmic model in order to predict corrosion for very low carbon steel substrates in rural and urban-industrial atmospheres in Uruguay. Results for parameters A and n of the bilogarithmic model are reported here. For this purpose, all repeated values were used instead of using average values as usual. Modelling is carried out using experimental data corresponding to steel substrates under the same initial meteorological conditions (in fact, they are put in the rack at the same time). Results of the correlation coefficient are compared with the lack-of-fit test at two different significance levels (α=0.01 and α=0.05). Unexpected differences between them are explained and finally, it is possible to conclude, at least in the studied atmospheres, that the bilogarithmic model does not properly fit the experimental data. (Author) 18 refs
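
    A sketch of the two validation checks named above for a bilogarithmic corrosion law C = A·t^n (linear in log-log coordinates), assuming replicated exposure times so that a pure-error estimate is available; the data are synthetic and the layout is only one conventional way to organize the lack-of-fit F test.

        import numpy as np
        from scipy.stats import f as f_dist

        rng = np.random.default_rng(0)

        # Replicated exposures (years) and corrosion losses, generated from C = A * t^n
        times = np.repeat([0.5, 1.0, 2.0, 4.0], 4)
        A_true, n_true = 20.0, 0.6
        corrosion = A_true * times**n_true * np.exp(0.05 * rng.standard_normal(times.size))

        x, y = np.log(times), np.log(corrosion)

        # Fit the bilogarithmic model: log C = log A + n * log t
        n_hat, logA_hat = np.polyfit(x, y, 1)
        y_hat = logA_hat + n_hat * x
        r = np.corrcoef(x, y)[0, 1]
        print(f"A = {np.exp(logA_hat):.2f}, n = {n_hat:.3f}, correlation coefficient r = {r:.4f}")

        # Lack-of-fit F test using the replicates at each exposure time
        sse = np.sum((y - y_hat) ** 2)
        sspe = sum(np.sum((y[times == t] - y[times == t].mean()) ** 2) for t in np.unique(times))
        sslof = sse - sspe
        m, n_obs, p = np.unique(times).size, times.size, 2
        F = (sslof / (m - p)) / (sspe / (n_obs - m))
        p_value = 1 - f_dist.cdf(F, m - p, n_obs - m)
        print(f"lack-of-fit F = {F:.2f}, p = {p_value:.3f}  (small p suggests the model does not fit)")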

  4. Statistical modelling of fish stocks

    DEFF Research Database (Denmark)

    Kvist, Trine

    1999-01-01

    for modelling the dynamics of a fish population is suggested. A new approach is introduced to analyse the sources of variation in age composition data, which is one of the most important sources of information in the cohort based models for estimation of stock abundances and mortalities. The approach combines ... and it is argued that an approach utilising stochastic differential equations might be advantageous in fish stock assessments.

  5. Quantifying predictive accuracy in survival models.

    Science.gov (United States)

    Lirette, Seth T; Aban, Inmaculada

    2017-12-01

    For time-to-event outcomes in medical research, survival models are the most appropriate to use. Unlike logistic regression models, quantifying the predictive accuracy of these models is not a trivial task. We present the classes of concordance (C) statistics and R² statistics often used to assess the predictive ability of these models. The discussion focuses on Harrell's C, Kent and O'Quigley's R², and Royston and Sauerbrei's R². We present similarities and differences between the statistics, discuss the software options from the most widely used statistical analysis packages, and give a practical example using the Worcester Heart Attack Study dataset.
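
    For concreteness, here is a small, dependency-light sketch of Harrell's C computed directly from its definition (the proportion of usable, correctly ordered pairs under right censoring); the inputs are made up, ties in event times are simply skipped, and the O(n²) loop favours clarity over efficiency.

        import numpy as np

        def harrells_c(time, event, risk_score):
            """Harrell's concordance index for right-censored survival data.

            A pair (i, j) is usable if the subject with the shorter time had an event;
            it is concordant if that subject also has the higher predicted risk.
            """
            time, event, risk_score = map(np.asarray, (time, event, risk_score))
            concordant, usable = 0.0, 0
            n = len(time)
            for i in range(n):
                for j in range(i + 1, n):
                    # order the pair so that subject a has the shorter observed time
                    a, b = (i, j) if time[i] < time[j] else (j, i)
                    if time[a] == time[b] or not event[a]:
                        continue  # tied times or censored earlier subject: pair not usable
                    usable += 1
                    if risk_score[a] > risk_score[b]:
                        concordant += 1.0
                    elif risk_score[a] == risk_score[b]:
                        concordant += 0.5
            return concordant / usable

        # Toy data: times, event indicator (1 = event observed), model risk scores
        t = [5, 8, 12, 3, 9, 15]
        d = [1, 0, 1, 1, 0, 1]
        score = [2.1, 1.0, 0.4, 2.6, 0.9, 0.2]
        print("Harrell's C:", round(harrells_c(t, d, score), 3))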

  6. Zephyr - the prediction models

    DEFF Research Database (Denmark)

    Nielsen, Torben Skov; Madsen, Henrik; Nielsen, Henrik Aalborg

    2001-01-01

    This paper briefly describes new models and methods for predicting the wind power output from wind farms. The system is being developed in a project which has the research organization Risø and the department of Informatics and Mathematical Modelling (IMM) as the modelling team and all the Danish utilities as partners and users. The new models are evaluated for five wind farms in Denmark as well as one wind farm in Spain. It is shown that the predictions based on conditional parametric models are superior to the predictions obtained by state-of-the-art parametric models.

  7. Fatigue crack initiation and growth life prediction with statistical consideration

    International Nuclear Information System (INIS)

    Kwon, J.D.; Choi, S.H.; Kwak, S.G.; Chun, K.O.

    1991-01-01

    Life prediction or residual life prediction of structures and machines is one of the most widely needed capabilities, particularly in the stage of slowly developing economy that follows a period of rapid development. For the purpose of statistical life prediction, fatigue tests were conducted at 3 stress levels, with 20 specimens used at each stress level. From the experimental results, the statistical properties of the crack growth parameters m and C in the fatigue crack growth law da/dN = C(ΔK)^m, the relationship between m and C, and the statistical distribution patterns of the fatigue crack initiation, growth and fracture lives can be obtained.
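
    A brief sketch of how the growth-law parameters C and m are typically estimated specimen by specimen (linear regression in log-log coordinates) and then summarized statistically; the crack-growth data below are simulated and the parameter values are arbitrary, not the experimental results of this study.

        import numpy as np

        rng = np.random.default_rng(0)

        def fit_paris_law(delta_k, dadn):
            """Fit da/dN = C * (dK)^m by least squares in log-log space; return (C, m)."""
            m, logC = np.polyfit(np.log(delta_k), np.log(dadn), 1)
            return np.exp(logC), m

        # Simulate 20 specimens with slightly scattered "true" (C, m) values
        C_values, m_values = [], []
        for _ in range(20):
            m_true = rng.normal(3.0, 0.15)
            logC_true = rng.normal(-26.0, 0.5)          # units: (m/cycle) / (MPa*sqrt(m))^m
            delta_k = np.linspace(10, 40, 25)           # MPa*sqrt(m)
            dadn = np.exp(logC_true) * delta_k**m_true * np.exp(0.1 * rng.standard_normal(25))
            C_hat, m_hat = fit_paris_law(delta_k, dadn)
            C_values.append(C_hat)
            m_values.append(m_hat)

        logC = np.log(C_values)
        print(f"m: mean = {np.mean(m_values):.2f}, std = {np.std(m_values):.2f}")
        print(f"log C: mean = {logC.mean():.1f}, std = {logC.std():.2f}")
        print(f"correlation(m, log C) = {np.corrcoef(m_values, logC)[0, 1]:.2f}")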

  8. Spherical Process Models for Global Spatial Statistics

    KAUST Repository

    Jeong, Jaehong

    2017-11-28

    Statistical models used in geophysical, environmental, and climate science applications must reflect the curvature of the spatial domain in global data. Over the past few decades, statisticians have developed covariance models that capture the spatial and temporal behavior of these global data sets. Though the geodesic distance is the most natural metric for measuring distance on the surface of a sphere, mathematical limitations have compelled statisticians to use the chordal distance to compute the covariance matrix in many applications instead, which may cause physically unrealistic distortions. Therefore, covariance functions directly defined on a sphere using the geodesic distance are needed. We discuss the issues that arise when dealing with spherical data sets on a global scale and provide references to recent literature. We review the current approaches to building process models on spheres, including the differential operator, the stochastic partial differential equation, the kernel convolution, and the deformation approaches. We illustrate realizations obtained from Gaussian processes with different covariance structures and the use of isotropic and nonstationary covariance models through deformations and geographical indicators for global surface temperature data. To assess the suitability of each method, we compare their log-likelihood values and prediction scores, and we end with a discussion of related research problems.

  9. Melanoma risk prediction models

    Directory of Open Access Journals (Sweden)

    Nikolić Jelena

    2014-01-01

    Full Text Available Background/Aim. The lack of effective therapy for advanced stages of melanoma emphasizes the importance of preventive measures and screenings of the population at risk. Identifying individuals at high risk should allow targeted screenings and follow-up involving those who would benefit most. The aim of this study was to identify the most significant factors for melanoma prediction in our population and to create prognostic models for identification and differentiation of individuals at risk. Methods. This case-control study included 697 participants (341 patients and 356 controls) that underwent extensive interview and skin examination in order to check risk factors for melanoma. Pairwise univariate statistical comparison was used for the coarse selection of the most significant risk factors. These factors were fed into logistic regression (LR) and alternating decision trees (ADT) prognostic models that were assessed for their usefulness in identification of patients at risk to develop melanoma. Validation of the LR model was done by the Hosmer and Lemeshow test, whereas the ADT was validated by 10-fold cross-validation. The achieved sensitivity, specificity, accuracy and AUC for both models were calculated. The melanoma risk score (MRS) based on the outcome of the LR model was presented. Results. The LR model showed that the following risk factors were associated with melanoma: sunbeds (OR = 4.018; 95% CI 1.724-9.366 for those that sometimes used sunbeds), solar damage of the skin (OR = 8.274; 95% CI 2.661-25.730 for those with severe solar damage), hair color (OR = 3.222; 95% CI 1.984-5.231 for light brown/blond hair), the number of common naevi (over 100 naevi had OR = 3.57; 95% CI 1.427-8.931), the number of dysplastic naevi (from 1 to 10 dysplastic naevi OR was 2.672; 95% CI 1.572-4.540; for more than 10 naevi OR was 6.487; 95% CI 1.993-21.119), Fitzpatrick's phototype and the presence of congenital naevi. Red hair, phototype I and large congenital naevi were

  10. Actuarial statistics with generalized linear mixed models

    NARCIS (Netherlands)

    Antonio, K.; Beirlant, J.

    2007-01-01

    Over the last decade the use of generalized linear models (GLMs) in actuarial statistics has received a lot of attention, starting from the actuarial illustrations in the standard text by McCullagh and Nelder [McCullagh, P., Nelder, J.A., 1989. Generalized linear models. In: Monographs on Statistics

  11. Learning predictive statistics from temporal sequences: Dynamics and strategies.

    Science.gov (United States)

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-10-01

    Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics-that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.
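
    A small sketch of the kind of sequence families described above (simple frequency statistics versus first-order, context-based Markov statistics) might look as follows; the symbol set and transition probabilities are invented for illustration and are not those used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
symbols = ["A", "B", "C", "D"]

# Level-0 ("frequency") statistics: some symbols are simply more probable.
freq_probs = np.array([0.4, 0.3, 0.2, 0.1])

# Level-1 ("context") statistics: the next symbol depends on the previous one.
transition = np.array([
    [0.1, 0.7, 0.1, 0.1],   # after A
    [0.1, 0.1, 0.7, 0.1],   # after B
    [0.1, 0.1, 0.1, 0.7],   # after C
    [0.7, 0.1, 0.1, 0.1],   # after D
])

def frequency_sequence(n):
    """Sequence governed only by marginal symbol frequencies."""
    return [symbols[i] for i in rng.choice(4, size=n, p=freq_probs)]

def markov_sequence(n, start=0):
    """Sequence whose symbol probabilities are contingent on the preceding symbol."""
    state, out = start, []
    for _ in range(n):
        state = rng.choice(4, p=transition[state])
        out.append(symbols[state])
    return out

print("".join(frequency_sequence(30)))
print("".join(markov_sequence(30)))
```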

  12. Statistical Modeling of Bivariate Data.

    Science.gov (United States)

    1982-08-01

    to one. Following Crain (1974), one may consider order-m approximators log f_m(x) = Σ_{k=-m}^{m} θ_k φ_k(x) - c(θ), a ≤ x ≤ b (4.4.5), and attempt to find ... literature. Consider the approximate model log f_n(x) = Σ_{k=-m_n}^{m_n} θ_k φ_k(x) + σ G(x), a ≤ x ≤ b (4.4.8), where G(x) is a Gaussian process and n is a

  13. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    International Nuclear Information System (INIS)

    Yahya, Noorazrul; Ebert, Martin A.; Bulsara, Max; House, Michael J.; Kennedy, Angel; Joseph, David J.; Denham, James W.

    2016-01-01

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions
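
    The model-comparison workflow described above (repeated cross-validation scored by AUROC) can be sketched as follows; this uses synthetic data and only two of the six learners as stand-ins, so none of the settings reflect the actual RADAR analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the dose-surface / comorbidity / medication feature matrix.
X, y = make_classification(n_samples=754, n_features=30, weights=[0.8, 0.2], random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=2000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

# Stratified cross-validation scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name:20s} AUROC = {scores.mean():.3f} +/- {scores.std():.3f}")
```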

  14. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    Energy Technology Data Exchange (ETDEWEB)

    Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au [School of Physics, University of Western Australia, Western Australia 6009, Australia and School of Health Sciences, National University of Malaysia, Bangi 43600 (Malaysia); Ebert, Martin A. [School of Physics, University of Western Australia, Western Australia 6009, Australia and Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Bulsara, Max [Institute for Health Research, University of Notre Dame, Fremantle, Western Australia 6959 (Australia); House, Michael J. [School of Physics, University of Western Australia, Western Australia 6009 (Australia); Kennedy, Angel [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Joseph, David J. [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008, Australia and School of Surgery, University of Western Australia, Western Australia 6009 (Australia); Denham, James W. [School of Medicine and Public Health, University of Newcastle, New South Wales 2308 (Australia)

    2016-05-15

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions

  15. Statistical Mining of Predictability of Seasonal Precipitation over the United States

    Science.gov (United States)

    Lau, William K. M.; Kim, Kyu-Myong; Shen, S. P.

    2001-01-01

    Results from a new ensemble canonical correlation (ECC) prediction model yield remarkable (10-20%) increases in baseline prediction skill for seasonal precipitation over the US for all seasons, compared to traditional statistical predictions. While the tropical Pacific, i.e., El Nino, contributes the largest share of potential predictability in the southern tier states during boreal winter, the North Pacific and the North Atlantic are responsible for enhanced predictability in the northern Great Plains, the Midwest and the southwest US during boreal summer. Most importantly, ECC significantly reduces the spring predictability barrier over the conterminous US, thereby raising the skill bar for dynamical predictions.

  16. Statistical Models and Methods for Lifetime Data

    CERN Document Server

    Lawless, Jerald F

    2011-01-01

    Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." (Choice) "This is an important book, which will appeal to statisticians working on survival analysis problems." (Biometrics) "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." (Statistics in Medicine) The statistical analysis of lifetime or response time data is a key tool in engineering,

  17. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 3: A stochastic rain fade control algorithm for satellite link power via nonlinear Markov filtering theory

    Science.gov (United States)

    Manning, Robert M.

    1991-01-01

    The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above 30/20 GHz Ka band, i.e., rain attenuation, cloud and/or clear air scintillation, etc., combined with the need to counter such degradations after the small link margins have been exceeded, necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.

  18. Inverse and Predictive Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Syracuse, Ellen Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-09-27

    The LANL Seismo-Acoustic team has a strong capability in developing data-driven models that accurately predict a variety of observations. These models range from the simple – one-dimensional models that are constrained by a single dataset and can be used for quick and efficient predictions – to the complex – multidimensional models that are constrained by several types of data and result in more accurate predictions. While team members typically build models of geophysical characteristics of Earth and source distributions at scales of 1 to 1000s of km, the techniques used are applicable to other types of physical characteristics at an even greater range of scales. The following cases provide a snapshot of some of the modeling work done by the Seismo-Acoustic team at LANL.

  19. Accelerated life models: modeling and statistical analysis

    CERN Document Server

    Bagdonavicius, Vilijandas

    2001-01-01

    Failure Time Distributions: Introduction; Parametric Classes of Failure Time Distributions. Accelerated Life Models: Introduction; Generalized Sedyakin's Model; Accelerated Failure Time Model; Proportional Hazards Model; Generalized Proportional Hazards Models; Generalized Additive and Additive-Multiplicative Hazards Models; Changing Shape and Scale Models; Generalizations; Models Including Switch-Up and Cycling Effects; Heredity Hypothesis; Summary. Accelerated Degradation Models: Introduction; Degradation Models; Modeling the Influence of Explanatory Variables

  20. Individual Differences in Statistical Learning Predict Children's Comprehension of Syntax

    Science.gov (United States)

    Kidd, Evan; Arciuli, Joanne

    2016-01-01

    Variability in children's language acquisition is likely due to a number of cognitive and social variables. The current study investigated whether individual differences in statistical learning (SL), which has been implicated in language acquisition, independently predicted 6- to 8-year-olds' comprehension of syntax. Sixty-eight (N = 68)…

  1. Learning predictive statistics from temporal sequences: Dynamics and strategies

    Science.gov (United States)

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E.; Kourtzi, Zoe

    2017-01-01

    Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics—that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments. PMID:28973111

  2. Bayesian models: A statistical primer for ecologists

    Science.gov (United States)

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. The book presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.

  3. Uncertainty: the soul of modeling, probability & statistics

    CERN Document Server

    Briggs, William

    2016-01-01

    This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance". The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models. Its jargon-free approach asserts that standard methods, suc...

  4. Automated statistical modeling of analytical measurement systems

    International Nuclear Information System (INIS)

    Jacobson, J.J.

    1992-01-01

    The statistical modeling of analytical measurement systems at the Idaho Chemical Processing Plant (ICPP) has been completely automated through computer software. The statistical modeling of analytical measurement systems is one part of a complete quality control program used by the Remote Analytical Laboratory (RAL) at the ICPP. The quality control program is an integration of automated data input, measurement system calibration, database management, and statistical process control. The quality control program and statistical modeling program meet the guidelines set forth by the American Society for Testing and Materials and the American National Standards Institute. A statistical model is a set of mathematical equations describing any systematic bias inherent in a measurement system and the precision of a measurement system. A statistical model is developed from data generated from the analysis of control standards. Control standards are samples which are made up at precisely known levels by an independent laboratory and submitted to the RAL. The RAL analysts who process control standards do not know the values of those control standards. The object behind statistical modeling is to describe real process samples in terms of their bias and precision and to verify that a measurement system is operating satisfactorily. The processing of control standards gives us this ability
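
    A minimal sketch of the kind of statistical model described here (systematic bias plus precision estimated from control-standard results) could look like the following; the reference values and measurements are invented, and the straight-line bias model is an assumption, since the report does not specify the functional form.

```python
import numpy as np

# Hypothetical control-standard results: known reference levels paired with
# repeated measured values from the instrument being modelled.
known = np.array([1.0, 1.0, 5.0, 5.0, 10.0, 10.0, 20.0, 20.0])
measured = np.array([1.05, 0.98, 5.20, 5.10, 10.4, 10.1, 20.9, 20.5])

# Systematic bias modelled as a straight line: measured = a + b * known.
b, a = np.polyfit(known, measured, 1)          # slope, intercept
residuals = measured - (a + b * known)
precision = residuals.std(ddof=2)              # spread about the calibration line

print(f"bias model: measured = {a:.3f} + {b:.3f} * known")
print(f"precision (residual s.d.): {precision:.3f}")
```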

  5. A statistical study of the performance of the Hakamada-Akasofu-Fry version 2 numerical model in predicting solar shock arrival times at Earth during different phases of solar cycle 23

    Directory of Open Access Journals (Sweden)

    S. M. P. McKenna-Lawlor

    2012-02-01

    Full Text Available The performance of the Hakamada-Akasofu-Fry, version 2 (HAFv.2) numerical model, which provides predictions of solar shock arrival times at Earth, was subjected to a statistical study to investigate those solar/interplanetary circumstances under which the model performed well/poorly during key phases (rise/maximum/decay) of solar cycle 23. In addition to analyzing elements of the overall data set (584 selected events) associated with particular cycle phases, subsets were formed such that those events making up a particular subset showed common characteristics. The statistical significance of the results obtained using the various sets/subsets was generally very low and these results were not significant as compared with the hit-by-chance rate (50%). This implies a low level of confidence in the predictions of the model, with no compelling result encouraging its use. However, the data suggested that the success rates of HAFv.2 were higher when the background solar wind speed at the time of shock initiation was relatively fast. Thus, in scenarios where the background solar wind speed is elevated and the calculated success rate significantly exceeds the rate by chance, the forecasts could provide potential value to the customer. With the composite statistics available for solar cycle 23, the calculated success rate at high solar wind speed, although clearly above 50%, was indicative rather than conclusive. The RMS error estimated for shock arrival times for every cycle phase and for the composite sample was in each case significantly better than would be expected for a random data set. Also, the parameter "Probability of Detection, yes" (PODy), which presents the proportion of Yes observations that were correctly forecast (i.e., the ratio between the shocks correctly predicted and all the shocks observed), yielded values for the rise/maximum/decay phases of the cycle and using the composite sample of 0.85, 0.64, 0.79 and 0.77, respectively. The statistical
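
    The verification statistics referred to above (PODy and the RMS error of shock arrival times) are simple to compute; the sketch below uses invented forecast/observation values purely to show the arithmetic and has no connection to the HAFv.2 event list.

```python
import numpy as np

# Hypothetical verification table: 1 = shock observed/predicted, 0 = not.
observed  = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
predicted = np.array([1, 0, 1, 1, 1, 0, 1, 0, 0, 1])

hits   = np.sum((observed == 1) & (predicted == 1))
misses = np.sum((observed == 1) & (predicted == 0))
pod_yes = hits / (hits + misses)   # proportion of observed shocks correctly forecast

# RMS error of predicted vs. observed arrival times (hours), hypothetical values.
t_obs  = np.array([10.0, 22.0, 35.0, 48.0, 60.0])
t_pred = np.array([12.5, 20.0, 40.0, 44.0, 63.0])
rmse = np.sqrt(np.mean((t_pred - t_obs) ** 2))

print(f"PODy = {pod_yes:.2f}, arrival-time RMS error = {rmse:.1f} h")
```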

  6. Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling

    Science.gov (United States)

    2016-05-31

    Final report (distribution unlimited), dated 31 May 2016, covering the period 15 April 2014 to 14 January 2015: Technical Topic 3.2.2.d Bayesian and Non-parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling.

  7. Statistical modelling of citation exchange between statistics journals.

    Science.gov (United States)

    Varin, Cristiano; Cattelan, Manuela; Firth, David

    2016-01-01

    Rankings of scholarly journals based on citation data are often met with scepticism by the scientific community. Part of the scepticism is due to disparity between the common perception of journals' prestige and their ranking based on citation counts. A more serious concern is the inappropriate use of journal rankings to evaluate the scientific influence of researchers. The paper focuses on analysis of the table of cross-citations among a selection of statistics journals. Data are collected from the Web of Science database published by Thomson Reuters. Our results suggest that modelling the exchange of citations between journals is useful to highlight the most prestigious journals, but also that journal citation data are characterized by considerable heterogeneity, which needs to be properly summarized. Inferential conclusions require care to avoid potential overinterpretation of insignificant differences between journal ratings. Comparison with published ratings of institutions from the UK's research assessment exercise shows strong correlation at aggregate level between assessed research quality and journal citation 'export scores' within the discipline of statistics.

  8. Topology for statistical modeling of petascale data.

    Energy Technology Data Exchange (ETDEWEB)

    Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A& M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A& M University, College Station, TX)

    2011-07-01

    This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.

  9. Melanoma Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing melanoma over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling on behavioral changes to decrease risk.

  10. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.

    Science.gov (United States)

    Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita

    2016-10-11

    We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. The objective was to report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix

  11. Infinite Random Graphs as Statistical Mechanical Models

    DEFF Research Database (Denmark)

    Durhuus, Bergfinnur Jøgvan; Napolitano, George Maria

    2011-01-01

    We discuss two examples of infinite random graphs obtained as limits of finite statistical mechanical systems: a model of two-dimensional discretized quantum gravity defined in terms of causal triangulated surfaces, and the Ising model on generic random trees. For the former model we describe...

  12. Mixed deterministic statistical modelling of regional ozone air pollution

    KAUST Repository

    Kalenderski, Stoitchko

    2011-03-17

    We develop a physically motivated statistical model for regional ozone air pollution by separating the ground-level pollutant concentration field into three components, namely: transport, local production and large-scale mean trend mostly dominated by emission rates. The model is novel in the field of environmental spatial statistics in that it is a combined deterministic-statistical model, which gives a new perspective to the modelling of air pollution. The model is presented in a Bayesian hierarchical formalism, and explicitly accounts for advection of pollutants, using the advection equation. We apply the model to a specific case of regional ozone pollution: the Lower Fraser Valley of British Columbia, Canada. As a predictive tool, we demonstrate that the model vastly outperforms existing, simpler modelling approaches. Our study highlights the importance of simultaneously considering different aspects of an air pollution problem as well as taking into account the physical bases that govern the processes of interest. © 2011 John Wiley & Sons, Ltd.

  13. Review of statistical models for nuclear reactions

    International Nuclear Information System (INIS)

    Igarasi, Sin-iti

    1991-01-01

    Statistical model calculations have been widely performed for nuclear data evaluations. These were based on the models of Hauser-Feshbach, Weisskopf-Ewing and their modifications. Since the 1940s, non-compound nuclear phenomena have been observed, which stimulated many nuclear physicists to study compound and non-compound nuclear reaction mechanisms. Concerning compound nuclear reactions, they investigated problems on the basis of fundamental properties of the S-matrix, statistical distributions of resonance pole parameters, random matrix elements of the nuclear Hamiltonian, and so forth, and they have presented many sophisticated results. However, the old statistical models have remained useful, because they are simple and easy to use. In this report, these old and new models are briefly reviewed with a view to their application in nuclear data evaluation, and the applicability of the new models is examined. (author)

  14. On Extrapolating Past the Range of Observed Data When Making Statistical Predictions in Ecology.

    Directory of Open Access Journals (Sweden)

    Paul B Conn

    Full Text Available Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook's notion of an independent variable hull (IVH), developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH) can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models).
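
    For the linear-model special case, Cook's IVH reduces to comparing the prediction-variance term x'(X'X)^{-1}x at new locations against its maximum over the design points; the sketch below illustrates that idea with made-up covariates (the paper's gIVH generalizes this to GLMs and other regression models).

```python
import numpy as np

rng = np.random.default_rng(1)

# Design matrix at surveyed sites (intercept plus two covariates).
X_obs = np.column_stack([np.ones(50), rng.normal(size=50), rng.uniform(0, 1, size=50)])

# Candidate prediction sites, some with covariates outside the sampled range.
X_new = np.column_stack([np.ones(4),
                         np.array([0.1, 1.5, 4.0, -3.5]),
                         np.array([0.5, 0.9, 2.5, 0.1])])

XtX_inv = np.linalg.inv(X_obs.T @ X_obs)

def leverage(X):
    """Prediction-variance term x'(X'X)^{-1}x for each row of X."""
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

hull_threshold = leverage(X_obs).max()          # largest value among observed sites
extrapolating = leverage(X_new) > hull_threshold

for lev, flag in zip(leverage(X_new), extrapolating):
    print(f"leverage = {lev:.3f} -> {'extrapolation' if flag else 'interpolation'}")
```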

  15. Matrix Tricks for Linear Statistical Models

    CERN Document Server

    Puntanen, Simo; Styan, George PH

    2011-01-01

    In teaching linear statistical models to first-year graduate students or to final-year undergraduate students there is no way to proceed smoothly without matrices and related concepts of linear algebra; their use is really essential. Our experience is that making some particular matrix tricks very familiar to students can substantially increase their insight into linear statistical models (and also multivariate statistical analysis). In matrix algebra, there are handy, sometimes even very simple "tricks" which simplify and clarify the treatment of a problem - both for the student and

  16. Daily precipitation statistics in regional climate models

    DEFF Research Database (Denmark)

    Frei, Christoph; Christensen, Jens Hesselbjerg; Déqué, Michel

    2003-01-01

    . The 15-year integrations were forced from reanalyses and observed sea surface temperature and sea ice (global model from sea surface only). The observational reference is based on 6400 rain gauge records (10-50 stations per grid box). Evaluation statistics encompass mean precipitation, wet-day frequency...... for other statistics. In summer, all models underestimate precipitation intensity (by 16-42%) and there is a too low frequency of heavy events. This bias reflects too dry summer mean conditions in three of the models, while it is partly compensated by too many low-intensity events in the other two models...

  17. Distributions with given marginals and statistical modelling

    CERN Document Server

    Fortiana, Josep; Rodriguez-Lallena, José

    2002-01-01

    This book contains a selection of the papers presented at the meeting `Distributions with given marginals and statistical modelling', held in Barcelona (Spain), July 17-20, 2000. In 24 chapters, this book covers topics such as the theory of copulas and quasi-copulas, the theory and compatibility of distributions, models for survival distributions and other well-known distributions, time series, categorical models, definition and estimation of measures of dependence, monotonicity and stochastic ordering, shape and separability of distributions, hidden truncation models, diagonal families, orthogonal expansions, tests of independence, and goodness of fit assessment. These topics share the use and properties of distributions with given marginals, this being the fourth specialised text on this theme. The innovative aspect of the book is the inclusion of statistical aspects such as modelling, Bayesian statistics, estimation, and tests.

  18. Bilingual Cluster Based Models for Statistical Machine Translation

    Science.gov (United States)

    Yamamoto, Hirofumi; Sumita, Eiichiro

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.
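
    A toy sketch of the clustering-then-domain-prediction step described above might look like this; the corpus, the TF-IDF representation and the k-means clustering are stand-ins chosen for brevity rather than the actual method used in the paper.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny stand-in for the source side of a bilingual training corpus.
corpus = [
    "please book a double room for two nights",
    "is breakfast included in the room rate",
    "the flight to tokyo departs at nine",
    "i would like an aisle seat on the flight",
    "how much is the bus fare to the airport",
    "where is the bus stop for downtown",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Each cluster plays the role of a "domain"; its sentences would be used to
# train domain-specific language and translation models.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Domain prediction for a new source sentence: nearest cluster centroid.
new_sentence = ["can i reserve a single room"]
domain = kmeans.predict(vectorizer.transform(new_sentence))[0]
print(f"predicted domain (sub-corpus) id: {domain}")
```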

  19. A formal statistical approach to representing uncertainty in rainfall-runoff modelling with focus on residual analysis and probabilistic output evaluation - Distinguishing simulation and prediction

    DEFF Research Database (Denmark)

    Breinholt, Anders; Møller, Jan Kloppenborg; Madsen, Henrik

    2012-01-01

    evaluation of the modelled output, and we attach particular importance to inspecting the residuals of the model outputs and improving the model uncertainty description. We also introduce the probabilistic performance measures sharpness, reliability and interval skill score for model comparison...... and for checking the reliability of the confidence bounds. Using point rainfall and evaporation data as input and flow measurements from a sewer system for model conditioning, a state space model is formulated that accounts for three different flow contributions: wastewater from households, and fast rainfall......-runoff from paved areas and slow rainfall-dependent infiltration-inflow from unknown sources. We consider two different approaches to evaluate the model output uncertainty, the output error method that lumps all uncertainty into the observation noise term, and a method based on Stochastic Differential...

  20. Statistical Modeling for Radiation Hardness Assurance

    Science.gov (United States)

    Ladbury, Raymond L.

    2014-01-01

    We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: what models are used, what errors exist in real test data, and what the model allows us to say about the DUT will be discussed. In addition, how to use other sources of data such as historical, heritage, and similar-part data, and how to apply experience, physics, and expert opinion to the analysis will be covered. Also included will be concepts of Bayesian statistics, data fitting, and bounding rates.
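
    As one concrete example of "bounding rates" with sparse test data, the sketch below computes a classical chi-square upper confidence bound on a single-event cross-section; the fluence and event count are hypothetical, and this is a standard frequentist bound offered only as a stand-in for the Bayesian treatment the presentation covers.

```python
from scipy.stats import chi2

# Hypothetical heavy-ion test: `events` upsets observed in a total fluence F.
fluence = 1e11        # ions/cm^2 (assumed)
events = 0
confidence = 0.95

# Chi-square upper bound on the Poisson mean number of events, converted to a
# cross-section bound; with zero events this reduces to -ln(1 - confidence)/F.
upper_mean = chi2.ppf(confidence, 2 * (events + 1)) / 2.0
upper_cross_section = upper_mean / fluence

print(f"{confidence:.0%} upper bound on cross-section: {upper_cross_section:.2e} cm^2")
```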

  1. Performance modeling, loss networks, and statistical multiplexing

    CERN Document Server

    Mazumdar, Ravi

    2009-01-01

    This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I

  2. Statistical Model Checking for Stochastic Hybrid Systems

    DEFF Research Database (Denmark)

    David, Alexandre; Du, Dehui; Larsen, Kim Guldstrand

    2012-01-01

    This paper presents novel extensions and applications of the UPPAAL-SMC model checker. The extensions allow for statistical model checking of stochastic hybrid systems. We show how our race-based stochastic semantics extends to networks of hybrid systems, and indicate the integration technique ap...

  3. Statistical Post Processes for the Improvement of the Results of Numerical Wave Prediction Models. A Combination of Kolmogorov-Zurbenko and Kalman Filters (PREPRINT)

    Science.gov (United States)

    2010-01-01

    activities such as ship traffic, tourism, offshore exploration, etc. The most reliable tools today towards such forecasts are the numerical wave ... G. Kallos, and I. Pytharoulis, Applications of Kalman filters based on non-linear functions to numerical weather predictions, Annales Geophysicae, 24

  4. Advances in statistical models for data analysis

    CERN Document Server

    Minerva, Tommaso; Vichi, Maurizio

    2015-01-01

    This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biannual meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.

  5. Hierarchical modelling for the environmental sciences statistical methods and applications

    CERN Document Server

    Clark, James S

    2006-01-01

    New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.

  6. Statistical Modeling of Energy Production by Photovoltaic Farms

    Czech Academy of Sciences Publication Activity Database

    Brabec, Marek; Pelikán, Emil; Krč, Pavel; Eben, Kryštof; Musílek, P.

    2011-01-01

    Vol. 5, No. 9 (2011), pp. 785-793 ISSN 1934-8975 Grant - others: GA AV ČR(CZ) M100300904 Institutional research plan: CEZ:AV0Z10300504 Keywords: electrical energy * solar energy * numerical weather prediction model * nonparametric regression * beta regression Subject RIV: BB - Applied Statistics, Operational Research

  7. Statistical tests for equal predictive ability across multiple forecasting methods

    DEFF Research Database (Denmark)

    Borup, Daniel; Thyrsgaard, Martin

    We develop a multivariate generalization of the Giacomini-White tests for equal conditional predictive ability. The tests are applicable to a mixture of nested and non-nested models, incorporate estimation uncertainty explicitly, and allow for misspecification of the forecasting model as well as ...
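
    For intuition, the simplest two-forecast, unconditional version of such a test compares the mean loss differential to zero (a Diebold-Mariano-style t-test); the sketch below uses simulated losses and ignores the serial-correlation and nesting issues that the paper's multivariate generalization is designed to handle.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Simulated squared-error losses from two competing forecasting methods.
loss_a = rng.normal(1.0, 0.3, size=200) ** 2
loss_b = rng.normal(1.1, 0.3, size=200) ** 2

d = loss_a - loss_b                                    # loss differential
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
p_value = 2 * stats.t.sf(abs(t_stat), df=len(d) - 1)   # two-sided p-value

print(f"mean loss differential = {d.mean():.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```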

  8. Can spatial statistical river temperature models be transferred between catchments?

    Science.gov (United States)

    Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.

    2017-09-01

    There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across

  9. Statistical physics of pairwise probability models

    DEFF Research Database (Denmark)

    Roudi, Yasser; Aurell, Erik; Hertz, John

    2009-01-01

    (No Danish abstract available.) Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring
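
    One of the standard approximate fits for such pairwise models is the naive mean-field (linear-response) inversion, in which the couplings follow from the inverse covariance matrix and the fields from the measured means; the sketch below applies it to synthetic binary data (independent "spikes" used only so the code runs, not a simulated cortical network).

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic binary activity: T time bins for N units (a stand-in for recordings).
N, T = 8, 20000
spikes = (rng.random((T, N)) < 0.2).astype(float)

s = 2.0 * spikes - 1.0            # map to +/-1 "spins"
m = s.mean(axis=0)                # measured means
C = np.cov(s, rowvar=False)       # measured covariances

# Naive mean-field / linear-response inversion:
# J_ij = -(C^{-1})_ij for i != j, and h_i = arctanh(m_i) - sum_j J_ij m_j.
J = -np.linalg.inv(C)
np.fill_diagonal(J, 0.0)
h = np.arctanh(m) - J @ m

print("example couplings J[0, 1:4]:", np.round(J[0, 1:4], 3))
print("example fields    h[:4]:    ", np.round(h[:4], 3))
```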

  10. Assessing risk factors for dental caries: a statistical modeling approach.

    Science.gov (United States)

    Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella

    2015-01-01

    The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis is only marginally exploited; and inferences could be biased due to the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome by considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.
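
    A minimal sketch of "carefully exploring the model space" for a count-type caries index could enumerate all covariate subsets of a Poisson GLM and rank them by AIC, as below; the covariates and data are simulated, and the Poisson choice is an assumption, since the paper considers a much richer class of candidate models.

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200

# Hypothetical risk factors for a caries count index (e.g. dmft).
covariates = {
    "sugar_intake": rng.normal(size=n),
    "brushing_freq": rng.normal(size=n),
    "fluoride_exposure": rng.normal(size=n),
    "age": rng.normal(size=n),
}
eta = 0.5 + 0.8 * covariates["sugar_intake"] - 0.5 * covariates["brushing_freq"]
y = rng.poisson(np.exp(eta))

# Exhaustive search over non-empty covariate subsets, ranked by AIC.
best = None
for k in range(1, len(covariates) + 1):
    for subset in itertools.combinations(covariates, k):
        X = sm.add_constant(np.column_stack([covariates[c] for c in subset]))
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        if best is None or fit.aic < best[0]:
            best = (fit.aic, subset)

print(f"best model by AIC: {best[1]} (AIC = {best[0]:.1f})")
```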

  11. Growth curve models and statistical diagnostics

    CERN Document Server

    Pan, Jian-Xin

    2002-01-01

    Growth-curve models are generalized multivariate analysis-of-variance models. These models are especially useful for investigating growth problems over short time periods in economics, biology, medical research, and epidemiology. This book systematically introduces the theory of GCMs with particular emphasis on their multivariate statistical diagnostics, which are based mainly on recent developments made by the authors and their collaborators. The authors provide complete proofs of theorems as well as practical data sets and MATLAB code.

  12. Topology for Statistical Modeling of Petascale Data

    Energy Technology Data Exchange (ETDEWEB)

    Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)

    2013-10-31

    Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every 2 weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions that had greater direct involvement from the team at the University of Utah in Salt Lake City.

  13. An R companion to linear statistical models

    CERN Document Server

    Hay-Jahans, Christopher

    2011-01-01

    Focusing on user-developed programming, An R Companion to Linear Statistical Models serves two audiences: those who are familiar with the theory and applications of linear statistical models and wish to learn or enhance their skills in R; and those who are enrolled in an R-based course on regression and analysis of variance. For those who have never used R, the book begins with a self-contained introduction to R that lays the foundation for later chapters. This book includes extensive and carefully explained examples of how to write programs using the R programming language. These examples cove

  14. Bayesian models a statistical primer for ecologists

    CERN Document Server

    Hobbs, N Thompson

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods-in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probabili

  15. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    Science.gov (United States)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    A decision tree (DT) machine learning algorithm was used to map the flood susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome the weak points of LR. The combined method of FR and LR was used to map the susceptible areas in Kelantan, Malaysia. The results of both methods were compared and their efficiency was assessed. The most influential conditioning factors for flooding were identified.
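
    The frequency ratio (FR) part of the ensemble is simple to state: for each class of a conditioning factor, FR is the share of flood cells in that class divided by the share of all cells in that class, with FR > 1 indicating higher susceptibility. A toy sketch on a synthetic raster (the slope classes and flood inventory below are invented for illustration) follows.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical raster flattened to 1-D: a slope class (1-4) per cell and a
# binary flood inventory (1 = flooded cell).
slope_class = rng.integers(1, 5, size=10000)
flooded = (rng.random(10000) < np.where(slope_class == 1, 0.15, 0.03)).astype(int)

def frequency_ratio(factor_class, inventory):
    """FR = share of flood cells in a class / share of all cells in that class."""
    fr = {}
    for c in np.unique(factor_class):
        in_class = factor_class == c
        pct_floods = inventory[in_class].sum() / inventory.sum()
        pct_area = in_class.mean()
        fr[int(c)] = pct_floods / pct_area
    return fr

for c, value in frequency_ratio(slope_class, flooded).items():
    print(f"slope class {c}: FR = {value:.2f}")
```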

  16. Probably not: future prediction using probability and statistical inference

    CERN Document Server

    Dworsky, Lawrence N

    2008-01-01

    An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives. Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: how travel time is affected by congestion, driving speed, and traffic lights; why different gambling ...

  17. Applications of spatial statistical network models to stream data

    Science.gov (United States)

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.

  18. STATISTICAL MODELS OF REPRESENTING INTELLECTUAL CAPITAL

    Directory of Open Access Journals (Sweden)

    Andreea Feraru

    2016-06-01

    Full Text Available This article, entitled Statistical Models of Representing Intellectual Capital, approaches and analyses the concept of intellectual capital, as well as the main models which can support entrepreneurs/managers in evaluating and quantifying the advantages of intellectual capital. Most authors examine intellectual capital from a static perspective and focus on the development of its various evaluation models. In this chapter we surveyed the classical static models: Sveiby, Edvinsson, Balanced Scorecard, as well as the canonical model of intellectual capital. Among the group of static models for evaluating organisational intellectual capital the canonical model stands out. This model enables the structuring of organisational intellectual capital into: human capital, structural capital and relational capital. Although the model is widely spread, it is a static one and can thus create a series of errors in the process of evaluation, because all the three entities mentioned above are not independent from the viewpoint of their contents, as any logic of structuring complex entities requires.

  19. Effects of dietary phenolics and botanical extracts on hepatotoxicity-related endpoints in human and rat hepatoma cells and statistical models for prediction of hepatotoxicity.

    Science.gov (United States)

    Liu, Yitong; Flynn, Thomas J; Ferguson, Martine S; Hoagland, Erica M; Yu, Liangli Lucy

    2011-08-01

    Toxicity assessment of botanical materials is difficult because they are typically complex mixtures of phytochemicals. In the present study, 16 phenolics were tested in both human (HepG2/C3A) and rat (MH1C1) hepatoma cells using a battery of eight toxicity endpoints. Cluster analysis was used to group the phenolics into four clusters for each cell type. Comparison of overall and individual liver activity of phenolics on both human and rat hepatoma cell lines showed significant differences for some endpoints. However, the cluster membership was similar across both cell types with the majority of phenolics clustering with the solvent control group (cluster 1). Each cell type produced a cluster of compounds with reported in vivo liver toxicity (cluster 2). Five herbal extracts were prepared and then tested as above. Using the cluster model developed with the phenolics, in the HepG2/C3A cells green tea was assigned to cluster 2 and the remaining four extracts to cluster 1. In the MH1C1 cells, green tea and thyme were assigned to cluster 2, cinnamon to cluster 4, and juniper berry and peppermint to cluster 1. The data suggest that this in vitro model may be useful for identifying hepatotoxic phenolics and botanical preparations rich in phenolics. Published by Elsevier Ltd.

  20. Understanding and forecasting polar stratospheric variability with statistical models

    Directory of Open Access Journals (Sweden)

    C. Blume

    2012-07-01

    The variability of the north-polar stratospheric vortex is a prominent aspect of the middle atmosphere. This work investigates a wide class of statistical models with respect to their ability to model geopotential and temperature anomalies, representing variability in the polar stratosphere. Four partly nonstationary, nonlinear models are assessed: linear discriminant analysis (LDA); a cluster method based on finite elements (FEM-VARX); a neural network, namely the multi-layer perceptron (MLP); and support vector regression (SVR). These methods model time series by incorporating all significant external factors simultaneously, including ENSO, QBO, the solar cycle, and volcanoes, and then quantify their statistical importance. We show that variability in reanalysis data from 1980 to 2005 is successfully modeled. The period from 2005 to 2011 can be hindcast to a certain extent, where MLP performs significantly better than the remaining models. However, variability remains that cannot be statistically hindcast within the current framework, such as the unexpected major warming in January 2009. Finally, the statistical model with the best generalization performance is used to predict the winter 2011/12, with warm and weak vortex conditions. A vortex breakdown is predicted for late January or early February 2012.

  1. A Headway to QoS on Traffic Prediction over VANETs using RRSCM Statistical Classifier

    Directory of Open Access Journals (Sweden)

    ISHTIAQUE MAHMOOD

    2016-07-01

    In this paper, a novel throughput forecast model is proposed for VANETs. The model is based on a statistical technique adopted and deployed over high-speed IP network traffic. Network traffic always experiences QoS (Quality of Service) issues such as jitter, delay, packet loss and degradation due to very low bit-rate codification. Despite these issues, the traffic throughput is to be predicted with the utmost accuracy using a proposed multivariate analysis scheme, the RRSCM (Refined Regression Statistical Classifier Model), which optimizes its partitioning parameters. The focus is therefore on the measurement methodology that estimates the traffic parameters needed to predict traffic accurately and improve QoS for end users. Finally, the results of the proposed RRSCM classification model are compared with those of an ANN (Artificial Neural Network) classifier to demonstrate the better performance of the proposed model.

  2. Statistical Model Checking for Product Lines

    DEFF Research Database (Denmark)

    ter Beek, Maurice H.; Legay, Axel; Lluch Lafuente, Alberto

    2016-01-01

    We report on the suitability of statistical model checking for the analysis of quantitative properties of product line models by an extended treatment of earlier work by the authors. The type of analysis that can be performed includes the likelihood of specific product behaviour, the expected...... average cost of products (in terms of the attributes of the products’ features) and the probability of features to be (un)installed at runtime. The product lines must be modelled in QFLan, which extends the probabilistic feature-oriented language PFLan with novel quantitative constraints among features...... behaviour converge in a discrete-time Markov chain semantics, enabling the analysis of quantitative properties. Technically, a Maude implementation of QFLan, integrated with Microsoft’s SMT constraint solver Z3, is combined with the distributed statistical model checker MultiVeStA, developed by one...
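
    The QFLan/Maude/MultiVeStA toolchain is specific to the cited work; purely as a hedged sketch of what statistical model checking does, the Python fragment below estimates by simulation the probability that a hypothetical discrete-time Markov chain reaches a "feature installed" state within a bounded number of steps, together with a normal-approximation confidence interval. The states and transition probabilities are invented for illustration.

      import numpy as np

      # Hypothetical 3-state DTMC: 0 = feature absent, 1 = feature installed, 2 = product discarded
      P = np.array([[0.90, 0.08, 0.02],
                    [0.05, 0.90, 0.05],
                    [0.00, 0.00, 1.00]])

      def run_reaches_install(horizon, rng):
          """Simulate one run; return True if state 1 is reached within `horizon` steps."""
          state = 0
          for _ in range(horizon):
              state = rng.choice(3, p=P[state])
              if state == 1:
                  return True
          return False

      rng = np.random.default_rng(1)
      n = 5000
      hits = sum(run_reaches_install(20, rng) for _ in range(n))
      p_hat = hits / n
      half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
      print(f"P(install within 20 steps) ~ {p_hat:.3f} +/- {half_width:.3f}")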

  3. (ajst) statistical mechanics model for orientational

    African Journals Online (AJOL)

    African Journal of Science and Technology (AJST), Science and Engineering Series, Vol. 6, No. 2, pp. 94-101, December 2005. Statistical Mechanics Model for Orientational Motion of Two-Dimensional Rigid Rotator. Malo, J.O., Department of Physics, University of Nairobi, P.O. Box 30197 ...

  4. Probing NWP model deficiencies by statistical postprocessing

    DEFF Research Database (Denmark)

    Rosgaard, Martin Haubjerg; Nielsen, Henrik Aalborg; Nielsen, Torben S.

    2016-01-01

    The objective in this article is twofold. On one hand, a Model Output Statistics (MOS) framework for improved wind speed forecast accuracy is described and evaluated. On the other hand, the approach explored identifies unintuitive explanatory value from a diagnostic variable in an operational num...
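
    As a hedged, minimal sketch of the Model Output Statistics idea described above (not the authors' operational setup), the Python fragment below regresses observed wind speeds on the raw NWP forecast plus one extra diagnostic variable and compares raw and corrected errors. The variables and coefficients are synthetic placeholders.

      import numpy as np

      rng = np.random.default_rng(2)
      nwp_speed = rng.uniform(2, 15, 200)        # raw NWP wind-speed forecast (m/s), synthetic
      diagnostic = rng.uniform(100, 1500, 200)   # hypothetical diagnostic variable
      obs = 0.8 * nwp_speed + 0.001 * diagnostic + rng.normal(0, 1.0, 200)

      # MOS: linear regression of observations on model output and diagnostics
      X = np.column_stack([np.ones_like(nwp_speed), nwp_speed, diagnostic])
      coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
      corrected = X @ coef

      rmse_raw = np.sqrt(np.mean((obs - nwp_speed) ** 2))
      rmse_mos = np.sqrt(np.mean((obs - corrected) ** 2))
      print(f"raw RMSE {rmse_raw:.2f} -> MOS RMSE {rmse_mos:.2f}")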

  5. Topology for Statistical Modeling of Petascale Data

    Energy Technology Data Exchange (ETDEWEB)

    Bennett, Janine Camille [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pebay, Philippe Pierre [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Rojas, Maurice [Texas A & M Univ., College Station, TX (United States)

    2014-07-01

    This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.

  6. Statistical Validation of Engineering and Scientific Models: Background

    International Nuclear Information System (INIS)

    Hills, Richard G.; Trucano, Timothy G.

    1999-01-01

    A tutorial is presented discussing the basic issues associated with propagation of uncertainty analysis and statistical validation of engineering and scientific models. The propagation of uncertainty tutorial illustrates the use of the sensitivity method and the Monte Carlo method to evaluate the uncertainty in predictions for linear and nonlinear models. Four example applications are presented: a linear model, a model for the behavior of a damped spring-mass system, a transient thermal conduction model, and a nonlinear transient convective-diffusive model based on Burgers' equation. Correlated and uncorrelated model input parameters are considered. The model validation tutorial builds on the material presented in the propagation of uncertainty tutorial and uses the damped spring-mass system as the example application. The validation tutorial illustrates several concepts associated with the application of statistical inference to test model predictions against experimental observations. Several validation methods are presented, including error-band-based, multivariate, sum-of-squares-of-residuals, and optimization methods. After completion of the tutorial, a survey of the statistical model validation literature is presented and recommendations for future work are made.
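
    As a hedged illustration of the Monte Carlo propagation-of-uncertainty approach mentioned in the tutorial, applied to a damped spring-mass response (all parameter values below are invented), the following Python sketch samples uncertain mass, stiffness and damping and propagates them through the analytical displacement response.

      import numpy as np

      rng = np.random.default_rng(3)
      n = 10_000
      m = rng.normal(1.0, 0.05, n)     # mass (kg), uncertain
      k = rng.normal(40.0, 2.0, n)     # stiffness (N/m), uncertain
      c = rng.normal(0.5, 0.05, n)     # damping (N*s/m), uncertain

      # Underdamped free response x(t) = exp(-c t / 2m) * cos(omega_d t)
      t = 2.0
      omega_d = np.sqrt(k / m - (c / (2 * m)) ** 2)
      x_t = np.exp(-c * t / (2 * m)) * np.cos(omega_d * t)

      print(f"mean displacement at t = {t} s: {x_t.mean():.3f}")
      print(f"95% interval: [{np.percentile(x_t, 2.5):.3f}, {np.percentile(x_t, 97.5):.3f}]")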

  7. Statistical models for competing risk analysis

    International Nuclear Information System (INIS)

    Sather, H.N.

    1976-08-01

    Research results are reported on three new models with potential applications to competing risks problems. One section covers the basic statistical relationships underlying the subsequent competing risks model development. Another discusses the problem of comparing cause-specific risk structures via competing risks theory in two homogeneous populations, P1 and P2. Weibull models, which allow more generality than the Berkson and Elveback models, are studied for the effect of time on the hazard function. The use of concomitant information for modeling single-risk survival is extended to the multiple-failure-mode domain of competing risks. The model used to illustrate this methodology is a life table model which has constant hazards within pre-designated intervals of the time scale. Two parametric models for bivariate dependent competing risks, which provide interesting alternatives, are proposed and examined.
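
    As a hedged, minimal sketch of the competing-risks setting (not the report's models), the Python fragment below simulates two latent Weibull failure times per unit, records the earlier one and its cause, and estimates crude cumulative incidences; all shape and scale values are invented.

      import numpy as np

      rng = np.random.default_rng(4)
      n = 100_000
      t1 = rng.weibull(1.5, n) * 10.0   # cause 1: Weibull shape 1.5, scale 10 (hypothetical)
      t2 = rng.weibull(0.9, n) * 15.0   # cause 2: Weibull shape 0.9, scale 15 (hypothetical)

      t_obs = np.minimum(t1, t2)        # observed failure time
      cause = np.where(t1 <= t2, 1, 2)  # observed failure mode

      horizon = 5.0
      for c in (1, 2):
          cif = np.mean((t_obs <= horizon) & (cause == c))   # crude cumulative incidence
          print(f"P(fail from cause {c} by t = {horizon}) ~ {cif:.3f}")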

  8. Performance modeling, stochastic networks, and statistical multiplexing

    CERN Document Server

    Mazumdar, Ravi R

    2013-01-01

    This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of introducing an appropriate mathematical framework for modeling and analysis as well as understanding the phenomenon of statistical multiplexing. The models, techniques, and results presented form the core of traffic engineering methods used to design, control and allocate resources in communication networks. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the importan

  9. Statistical physics of pairwise probability models

    Directory of Open Access Journals (Sweden)

    Yasser Roudi

    2009-11-01

    Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring the parameters in a pairwise model depends on the time bin chosen for binning the data. We also study the effect of the size of the time bin on the model quality itself, again using simulated data. We show that using finer time bins increases the quality of the pairwise model. We offer new ways of deriving the expressions reported in our previous work for assessing the quality of pairwise models.
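
    As a hedged toy version of fitting such a pairwise model (not the authors' code or data), the Python sketch below fits the fields and couplings of a pairwise maximum-entropy model to simulated binary "spike" data by gradient ascent, using exact enumeration of all states, which is feasible only for a handful of units.

      import itertools
      import numpy as np

      rng = np.random.default_rng(5)
      n_units, n_samples = 5, 2000
      data = (rng.random((n_samples, n_units)) < 0.3).astype(float)  # hypothetical binarized spikes

      states = np.array(list(itertools.product([0, 1], repeat=n_units)), dtype=float)
      mean_data = data.mean(axis=0)
      corr_data = data.T @ data / n_samples

      h = np.zeros(n_units)              # fields
      J = np.zeros((n_units, n_units))   # pairwise couplings
      lr = 0.1
      for _ in range(2000):
          logits = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
          p = np.exp(logits - logits.max())
          p /= p.sum()
          mean_model = p @ states
          corr_model = states.T @ (states * p[:, None])
          h += lr * (mean_data - mean_model)
          grad_J = corr_data - corr_model
          np.fill_diagonal(grad_J, 0.0)  # self-terms are handled by the fields h
          J += lr * grad_J

      print("largest remaining moment mismatch:", np.abs(corr_data - corr_model).max())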

  10. Predictive Surface Complexation Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Sverjensky, Dimitri A. [Johns Hopkins Univ., Baltimore, MD (United States). Dept. of Earth and Planetary Sciences

    2016-11-29

    Surface complexation plays an important role in the equilibria and kinetics of processes controlling the compositions of soilwaters and groundwaters, the fate of contaminants in groundwaters, and the subsurface storage of CO2 and nuclear waste. Over the last several decades, many dozens of individual experimental studies have addressed aspects of surface complexation that have contributed to an increased understanding of its role in natural systems. However, there has been no previous attempt to develop a model of surface complexation that can be used to link all the experimental studies in order to place them on a predictive basis. Overall, my research has successfully integrated the results of the work of many experimentalists published over several decades. For the first time in studies of the geochemistry of the mineral-water interface, a practical predictive capability for modeling has become available. The predictive correlations developed in my research now enable extrapolations of experimental studies to provide estimates of surface chemistry for systems not yet studied experimentally and for natural and anthropogenically perturbed systems.

  11. Equilibrium statistical mechanics of lattice models

    CERN Document Server

    Lavis, David A

    2015-01-01

    Most interesting and difficult problems in equilibrium statistical mechanics concern models which exhibit phase transitions. For graduate students and more experienced researchers this book provides an invaluable reference source of approximate and exact solutions for a comprehensive range of such models. Part I contains background material on classical thermodynamics and statistical mechanics, together with a classification and survey of lattice models. The geometry of phase transitions is described and scaling theory is used to introduce critical exponents and scaling laws. An introduction is given to finite-size scaling, conformal invariance and Schramm-Loewner evolution. Part II contains accounts of classical mean-field methods. The parallels between Landau expansions and catastrophe theory are discussed and Ginzburg-Landau theory is introduced. The extension of mean-field theory to higher orders is explored using the Kikuchi-Hijmans-De Boer hierarchy of approximations. In Part III the use of alge...

  12. Statistical Models of Adaptive Immune populations

    Science.gov (United States)

    Sethna, Zachary; Callan, Curtis; Walczak, Aleksandra; Mora, Thierry

    The availability of large (10^4-10^6 sequences) datasets of B or T cell populations from a single individual allows reliable fitting of complex statistical models for naïve generation, somatic selection, and hypermutation. It is crucial to utilize a probabilistic/informational approach when modeling these populations. The inferred probability distributions allow for population characterization, calculation of probability distributions of various hidden variables (e.g. number of insertions), as well as statistical properties of the distribution itself (e.g. entropy). In particular, the differences between the T cell populations of embryonic and mature mice will be examined as a case study. Comparing these populations, as well as proposed mixed populations, provides a concrete exercise in model creation, comparison, choice, and validation.

  13. Cellular automata and statistical mechanical models

    International Nuclear Information System (INIS)

    Rujan, P.

    1987-01-01

    The authors elaborate on the analogy between the transfer matrix of usual lattice models and the master equation describing the time development of cellular automata. Transient and stationary properties of probabilistic automata are linked to surface and bulk properties, respectively, of restricted statistical mechanical systems. It is demonstrated that methods of statistical physics can be successfully used to describe the dynamic and the stationary behavior of such automata. Some exact results are derived, including duality transformations, exact mappings, disorder, and linear solutions. Many examples are worked out in detail to demonstrate how to use statistical physics in order to construct cellular automata with desired properties. This approach is considered to be a first step toward the design of fully parallel, probabilistic systems whose computational abilities rely on the cooperative behavior of their components

  14. Linking statistical bias description to multiobjective model calibration

    Science.gov (United States)

    Reichert, P.; Schuwirth, N.

    2012-09-01

    In the absence of model deficiencies, simulation results at the correct parameter values lead to an unbiased description of observed data with remaining deviations due to observation errors only. However, this ideal cannot be reached in the practice of environmental modeling, because the required simplified representation of the complex reality by the model and errors in model input lead to errors that are reflected in biased model output. This leads to two related problems: First, ignoring bias of output in the statistical model description leads to bias in parameter estimates, model predictions and, in particular, in the quantification of their uncertainty. Second, as there is no objective choice of how much bias to accept in which output variable, it is not possible to design an "objective" model calibration procedure. The first of these problems has been addressed by introducing a statistical (Bayesian) description of bias, the second by suggesting the use of multiobjective calibration techniques that cannot easily be used for uncertainty analysis. We merge the ideas of these two approaches by using the prior of the statistical bias description to quantify the importance of multiple calibration objectives. This leads to probabilistic inference and prediction while still taking multiple calibration objectives into account. The ideas and technical details of the suggested approach are outlined and a didactical example as well as an application to environmental data are provided to demonstrate its practical feasibility and computational efficiency.

  15. Application of statistical classification methods for predicting the acceptability of well-water quality

    Science.gov (United States)

    Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.

    2018-01-01

    The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.
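
    The study's specific algorithm and predictors are not reproduced here; purely as a hedged sketch of the general approach, the Python fragment below trains a random forest classifier (one of several algorithms that could be tried) to predict whether chloride exceeds a hypothetical 250 mg/l threshold from made-up well coordinates and depth, scored by cross-validation.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(6)
      n = 300
      X = np.column_stack([
          rng.uniform(0, 20, n),    # easting (km), hypothetical
          rng.uniform(0, 20, n),    # northing (km), hypothetical
          rng.uniform(10, 150, n),  # well depth (m), hypothetical
      ])
      chloride = 50 + 15 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0, 40, n)
      y = (chloride > 250).astype(int)   # 1 = water unfit for the intended use

      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      auc = cross_val_score(clf, X, y, cv=5, scoring='roc_auc').mean()
      print(f"cross-validated AUC: {auc:.3f}")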

  16. Candidate Prediction Models and Methods

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg; Nielsen, Torben Skov; Madsen, Henrik

    2005-01-01

    This document lists candidate prediction models for Work Package 3 (WP3) of the PSO-project called ``Intelligent wind power prediction systems'' (FU4101). The main focus is on the models transforming numerical weather predictions into predictions of power production. The document also outlines...

  17. A new statistical scission-point model fed with microscopic ingredients to predict fission fragments distributions; Developpement d'un nouveau modele de point de scission base sur des ingredients microscopiques

    Energy Technology Data Exchange (ETDEWEB)

    Heinrich, S

    2006-07-01

    The nuclear fission process is a very complex phenomenon and, even nowadays, no realistic models describing the overall process are available. The work presented here deals with a theoretical description of fission fragment distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilkins scission-point model. Our purpose was to test whether this statistical model, applied at the scission point and fed with the results of modern microscopic calculations, allows a quantitative description of the fission fragment distributions. We calculate the surface energy available at the scission point as a function of the fragment deformations. This surface is obtained from a Hartree-Fock-Bogoliubov microscopic calculation, which guarantees a realistic description of the potential's dependence on the deformation of each fragment. The statistical balance is described by the level densities of the fragments. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between the fragments at the scission point, is discussed by comparison with scission configurations obtained from fully dynamical microscopic calculations. The comparison between our results and experimental data is very satisfying and allows us to discuss the successes and limitations of our approach. We finally propose ideas to improve the model, in particular by applying dynamical corrections. (author)

  18. Logarithmic transformed statistical models in calibration

    International Nuclear Information System (INIS)

    Zeis, C.D.

    1975-01-01

    A general type of statistical model used for calibration of instruments having the property that the standard deviations of the observed values increase as a function of the mean value is described. The application to the Helix Counter at the Rocky Flats Plant is primarily from a theoretical point of view. The Helix Counter measures the amount of plutonium in certain types of chemicals. The method described can be used also for other calibrations. (U.S.)

  19. Statistical model for high energy inclusive processes

    International Nuclear Information System (INIS)

    Pomorisac, B.

    1980-01-01

    We propose a statistical model of inclusive processes. The model is an extension of the model proposed by Scalapino and Sugar for the inclusive distributions in rapidity. The model is defined in terms of a random variable on the full phase space of the produced particles and in terms of a Lorentz-invariant probability distribution. We suggest that the Lorentz invariance is broken spontaneously; this may describe the observed anisotropy of the inclusive distributions. Based on this model we calculate the distribution in transverse momentum. An explicit calculation is given of the one-particle inclusive cross sections and the two-particle correlation. The results give a fair representation of the shape of one-particle inclusive cross sections, and a positive correlation for the emitted particles. The relevance of our results to experiments is discussed.

  20. Spatial Statistical Network Models for Stream and River Temperature in the Chesapeake Bay Watershed, USA

    Science.gov (United States)

    Regional temperature models are needed for characterizing and mapping stream thermal regimes, establishing reference conditions, predicting future impacts and identifying critical thermal refugia. Spatial statistical models have been developed to improve regression modeling techn...

  1. Survival Predictions of Ceramic Crowns Using Statistical Fracture Mechanics.

    Science.gov (United States)

    Nasrin, S; Katsube, N; Seghi, R R; Rokhlin, S I

    2017-05-01

    This work establishes a survival probability methodology for interface-initiated fatigue failures of monolithic ceramic crowns under simulated masticatory loading. A complete 3-dimensional (3D) finite element analysis model of a minimally reduced molar crown was developed using commercially available hardware and software. Estimates of material surface flaw distributions and fatigue parameters for 3 reinforced glass-ceramics (fluormica [FM], leucite [LR], and lithium disilicate [LD]) and a dense sintered yttrium-stabilized zirconia (YZ) were obtained from the literature and incorporated into the model. Utilizing the proposed fracture mechanics-based model, crown survival probability as a function of loading cycles was obtained from simulations performed on the 4 ceramic materials utilizing identical crown geometries and loading conditions. The weaker ceramic materials (FM and LR) resulted in lower survival rates than the more recently developed higher-strength ceramic materials (LD and YZ). The simulated 10-y survival rate of crowns fabricated from YZ was only slightly better than those fabricated from LD. In addition, 2 of the model crown systems (FM and LD) were expanded to determine regional-dependent failure probabilities. This analysis predicted that the LD-based crowns were more likely to fail from fractures initiating from margin areas, whereas the FM-based crowns showed a slightly higher probability of failure from fractures initiating from the occlusal table below the contact areas. These 2 predicted fracture initiation locations have some agreement with reported fractographic analyses of failed crowns. In this model, we considered the maximum tensile stress tangential to the interfacial surface, as opposed to the more universally reported maximum principal stress, because it more directly impacts crack propagation. While the accuracy of these predictions needs to be experimentally verified, the model can provide a fundamental understanding of the

  2. Applied systems ecology: models, data, and statistical methods

    Energy Technology Data Exchange (ETDEWEB)

    Eberhardt, L L

    1976-01-01

    In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.

  3. Introduction to statistical modelling: linear regression.

    Science.gov (United States)

    Lunt, Mark

    2015-07-01

    In many studies we wish to assess how a range of variables are associated with a particular outcome and also determine the strength of such relationships so that we can begin to understand how these factors relate to each other at a population level. Ultimately, we may also be interested in predicting the outcome from a series of predictive factors available at, say, a routine clinic visit. In a recent article in Rheumatology, Desai et al. did precisely that when they studied the prediction of hip and spine BMD from hand BMD and various demographic, lifestyle, disease and therapy variables in patients with RA. This article aims to introduce the statistical methodology that can be used in such a situation and explain the meaning of some of the terms employed. It will also outline some common pitfalls encountered when performing such analyses. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
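
    As a hedged illustration of the kind of multivariable linear regression the article introduces (the data below are simulated, not from the cited study), the Python sketch fits an ordinary least-squares model predicting an outcome from several candidate predictors and reports the coefficients and explained variance.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(7)
      n = 200
      hand_bmd = rng.normal(0.45, 0.08, n)   # hypothetical predictor
      age = rng.uniform(30, 80, n)
      bmi = rng.normal(26, 4, n)
      hip_bmd = 0.4 + 1.2 * hand_bmd - 0.002 * age + 0.004 * bmi + rng.normal(0, 0.05, n)

      X = sm.add_constant(np.column_stack([hand_bmd, age, bmi]))
      model = sm.OLS(hip_bmd, X).fit()
      print(model.params)     # intercept and slopes
      print(model.rsquared)   # proportion of outcome variance explained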

  4. Statistics and predictions of population, energy and environment problems

    International Nuclear Information System (INIS)

    Sobajima, Makoto

    1999-03-01

    With the world's population, especially in developing countries, growing rapidly, humankind faces the global problem of securing places to live, food, and energy peacefully for centuries to come. To do so within a finite environment, humankind must consider, discuss, agree upon, and carry out an appropriate course of action. Although energy has long been regarded as a symbol of improved living and has been demanded and used accordingly, its use now has to be limited because it places a growing burden on the global environment. The open questions are whether there is a sufficient energy source that does not load the environment with costs; whether nuclear energy, regarded as such a source, can sustain its resource base for a long time and remain competitive in the market; what the prospects are for compensating new energy sources if the use of nuclear energy is restricted by a society fearing radioactivity; and whether promising options exist for the future. Anyone studying energy cannot proceed without knowing these issues. The statistical materials compiled here are considered useful for that purpose and are collected mainly from sources that make future predictions based on past records. Since prediction studies are important for planning future measures, these databases are expected to be improved for better accuracy. (author)

  5. Statistical Modelling of the Soil Dielectric Constant

    Science.gov (United States)

    Usowicz, Boguslaw; Marczewski, Wojciech; Bogdan Usowicz, Jerzy; Lipiec, Jerzy

    2010-05-01

    The dielectric constant of soil is a physical property that is very sensitive to water content. It underlies several electrical measurement techniques for determining water content by means of direct methods (TDR, FDR, and others related to effects of electrical conductance and/or capacitance) and indirect RS (Remote Sensing) methods. The work is devoted to a particular statistical manner of modelling the dielectric constant as a property accounting for a wide range of specific soil compositions, porosities, and mass densities, within the unsaturated water content range. Usually, similar models are determined for a few particular soil types, and when changing the soil type one needs to switch the model to another type or to adjust it by parametrization of the soil compounds. Therefore, it is difficult to compare and relate results between models. The presented model was developed for a generic representation of soil as a hypothetical mixture of spheres, each representing a soil fraction in its proper phase state. The model generates a serial-parallel mesh of conductive and capacitive paths, which is analysed for its total conductive or capacitive property. The model was first developed to determine the thermal conductivity, and is now extended to the dielectric constant by analysing the capacitive mesh. The analysis is carried out by statistical means obeying physical laws related to the serial-parallel branching of the representative electrical mesh. The physical relevance of the analysis is established electrically, but the definition of the electrical mesh is controlled statistically by parametrization of the compound fractions, by determining the number of representative spheres per unitary volume per fraction, and by determining the number of fractions. In that way the model is capable of covering the properties of nearly all possible soil types and all phase states, within recognition of the Lorenz and Knudsen conditions. In effect, the model allows generating a hypothetical representative of

  6. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    International Nuclear Information System (INIS)

    Nedic, Vladimir; Despotovic, Danijela; Cvetanovic, Slobodan; Despotovic, Milan; Babic, Sasa

    2014-01-01

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. It is therefore very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. The structure of the traffic flow and the average speed of the traffic flow are chosen as input variables of the neural network. The output variable of the network is the equivalent noise level in the given time period, L_eq. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise, using an originally developed user-friendly software package. It is shown that artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate the equivalent noise level by means of classical methods, and a comparative analysis is given. The results clearly show that the ANN approach is superior to any other statistical method in traffic noise level prediction. - Highlights: • We propose an ANN model for prediction of traffic noise. • We developed an originally designed user-friendly software package. • The results are compared with classical statistical methods. • The results show the much better predictive capability of the ANN model.
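
    A hedged, minimal analogue of the comparison described above (synthetic data, not the study's measurements): a plain linear regression and a small multi-layer perceptron are trained to predict an equivalent noise level from traffic flow, speed, and share of heavy vehicles, and their hold-out errors are compared.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.neural_network import MLPRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_absolute_error

      rng = np.random.default_rng(8)
      n = 500
      flow = rng.uniform(100, 3000, n)    # vehicles/hour, synthetic
      speed = rng.uniform(20, 90, n)      # km/h, synthetic
      heavy = rng.uniform(0, 0.3, n)      # share of heavy vehicles, synthetic
      leq = 40 + 10 * np.log10(flow) + 0.05 * speed + 8 * heavy + rng.normal(0, 1.5, n)

      X = np.column_stack([flow, speed, heavy])
      X_tr, X_te, y_tr, y_te = train_test_split(X, leq, random_state=0)

      models = {
          "linear regression": LinearRegression(),
          "neural network": make_pipeline(StandardScaler(),
                                          MLPRegressor(hidden_layer_sizes=(20, 20),
                                                       max_iter=5000, random_state=0)),
      }
      for name, model in models.items():
          model.fit(X_tr, y_tr)
          print(name, "MAE:", round(mean_absolute_error(y_te, model.predict(X_te)), 2))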

  7. Encoding Dissimilarity Data for Statistical Model Building.

    Science.gov (United States)

    Wahba, Grace

    2010-12-01

    We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233. G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133. F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.

  8. Physical-Statistical Model of Thermal Conductivity of Nanofluids

    Directory of Open Access Journals (Sweden)

    B. Usowicz

    2014-01-01

    A physical-statistical model for predicting the effective thermal conductivity of nanofluids is proposed. The volumetric unit of nanofluids in the model consists of solid, liquid, and gas particles and is treated as a system made up of regular geometric figures, spheres, filling the volumetric unit by layers. The model assumes that connections between layers of the spheres and between neighbouring spheres in the layer are represented by serial and parallel connections of thermal resistors, respectively. This model is expressed in terms of thermal resistance of nanoparticles and fluids and the multinomial distribution of particles in the nanofluids. The results for predicted and measured effective thermal conductivity of several nanofluids (Al2O3/ethylene glycol-based and Al2O3/water-based; CuO/ethylene glycol-based and CuO/water-based; and TiO2/ethylene glycol-based) are presented. The physical-statistical model shows a reasonably good agreement with the experimental results and gives more accurate predictions for the effective thermal conductivity of nanofluids compared to existing classical models.

  9. Comparison and validation of statistical methods for predicting power outage durations in the event of hurricanes.

    Science.gov (United States)

    Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M

    2011-12-01

    This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
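
    The article's models (AFT, Cox PH, regression trees, BART, and MARS) are not reproduced here; as a hedged stand-in that captures the train-on-one-storm, validate-on-another workflow, the Python sketch below fits two generic regression methods to synthetic outage records and reports out-of-sample error.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_absolute_error

      rng = np.random.default_rng(9)

      def make_storm(n, wind_scale):
          """Hypothetical outage records: wind speed, tree density, customers served."""
          wind = rng.uniform(20, 60, n) * wind_scale
          trees = rng.uniform(0, 1, n)
          customers = rng.integers(10, 5000, n)
          duration = 2 + 0.8 * wind + 30 * trees + rng.exponential(5, n)   # outage duration (hours)
          return np.column_stack([wind, trees, customers]), duration

      X_train, y_train = make_storm(800, 1.0)   # "training" hurricane
      X_test, y_test = make_storm(400, 1.2)     # different storm used for validation

      for name, model in [("linear regression", LinearRegression()),
                          ("random forest", RandomForestRegressor(n_estimators=300, random_state=0))]:
          model.fit(X_train, y_train)
          mae = mean_absolute_error(y_test, model.predict(X_test))
          print(f"{name}: out-of-sample MAE {mae:.1f} h")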

  10. Statistical Model Checking for Biological Systems

    DEFF Research Database (Denmark)

    David, Alexandre; Larsen, Kim Guldstrand; Legay, Axel

    2014-01-01

    Statistical Model Checking (SMC) is a highly scalable simulation-based verification approach for testing and estimating the probability that a stochastic system satisfies a given linear temporal property. The technique has been applied to (discrete and continuous time) Markov chains, stochastic...... timed automata and most recently hybrid systems using the tool Uppaal SMC. In this paper we enable the application of SMC to complex biological systems, by combining Uppaal SMC with ANIMO, a plugin of the tool Cytoscape used by biologists, as well as with SimBiology®, a plugin of Matlab to simulate...

  11. Average Nuclear properties based on statistical model

    International Nuclear Information System (INIS)

    El-Jaick, L.J.

    1974-01-01

    The gross properties of nuclei were investigated with a statistical model, for systems with equal and with different numbers of protons and neutrons, treated separately, the Coulomb energy being included in the latter case. Some average nuclear properties were calculated based on the energy density of nuclear matter, from the Weizsäcker-Bethe semiempirical mass formula, generalized for compressible nuclei. In the study of the surface energy coefficient, the large influence exerted by the Coulomb energy and the nuclear compressibility was verified. For a good fit of the beta-stability lines and mass excesses, the surface symmetry energy was established. (M.C.K.)

  12. Statistical modelling of fine red wine production

    Directory of Open Access Journals (Sweden)

    María Rosa Castro

    2010-01-01

    Producing wine is a very important economic activity in the province of San Juan in Argentina; it is therefore most important to predict production from the quantity of raw material needed. This work was aimed at obtaining a model relating kilograms of crushed grape to the litres of wine so produced. Such a model will be used for predicting precise future values and confidence intervals for given quantities of crushed grapes. Data from a vineyard in the province of San Juan were thus used in this work. The sample correlation coefficient was calculated and a dispersion diagram was then constructed; this indicated a linear relationship between the litres of wine obtained and the kilograms of crushed grape. Two linear models were then adopted and analysis of variance was carried out, because the data came from normal populations having the same variance. The most appropriate model was obtained from this analysis; it was validated with experimental values, a good agreement being obtained.

  13. Comparison of statistical and clinical predictions of functional outcome after ischemic stroke.

    Directory of Open Access Journals (Sweden)

    Douglas D Thompson

    To determine whether the predictions of functional outcome after ischemic stroke made at the bedside using a doctor's clinical experience were more or less accurate than the predictions made by clinical prediction models (CPMs). A prospective cohort study of nine hundred and thirty one ischemic stroke patients recruited consecutively at the outpatient, inpatient and emergency departments of the Western General Hospital, Edinburgh between 2002 and 2005. Doctors made informal predictions of six month functional outcome on the Oxford Handicap Scale (OHS). Patients were followed up at six months with a validated postal questionnaire. For each patient we calculated the absolute predicted risk of death or dependence (OHS≥3) using five previously described CPMs. The specificity of a doctor's informal predictions of OHS≥3 at six months was good, 0.96 (95% CI: 0.94 to 0.97), and similar to CPMs (range 0.94 to 0.96); however the sensitivity of both informal clinical predictions, 0.44 (95% CI: 0.39 to 0.49), and clinical prediction models (range 0.38 to 0.45) was poor. The prediction of the level of disability after stroke was similar for informal clinical predictions (ordinal c-statistic 0.74 with 95% CI 0.72 to 0.76) and CPMs (range 0.69 to 0.75). No patient or clinician characteristic affected the accuracy of informal predictions, though predictions were more accurate in outpatients. CPMs are at least as good as informal clinical predictions in discriminating between good and bad functional outcome after ischemic stroke. The place of these models in clinical practice has yet to be determined.

  14. Multi-Scale Statistical Evaluation of CMIP5 Predictions of Extreme Precipitation Events

    Science.gov (United States)

    Allen, M. R.; Fu, J. S.; Gao, Y.; Drake, J.; Lamarque, J.

    2012-12-01

    In view of risk to critical infrastructure, many decision makers are concerned about extreme precipitation potential at specific locations; yet modeling of precipitation at local, regional and global scale remains uncertain and in many regions, problematic. We examine both physical and statistical strategies for improvement in predictions at global and regional scale. Various probability distribution functions are fit to extreme precipitation data such as National Climatic Data Center (NCDC) observational data, National Aeronautics and Space Administration (NASA) satellite observations and location-specific point gauge measurements in order to determine which statistical method gives the best predictive values for past observed extremes for a given location at various resolutions. While the Log Pearson III distribution shows better return-time extreme precipitation predictive capability than the Gumbel, or Type I Extreme Value Distribution for specific locations at coarse resolution (2x2.5 degree), physically-based regionally- and seasonally-informed methods incorporating both time and space parameters may better suit local and short term intensity predictions. Methods applied to Coupled Model Intercomparison Project Phase 5 (CMIP5) climate model historical data determine model capability to capture not only precipitation averages or maxima but also multi-decadal probability distributions. Uncertainty quantification of global and dynamically-downscaled Representative Concentration Pathway (RCP) 4.5 and 8.5 scenario outputs derived from these methods serve accordingly to determine regional and local risk predictions associated with climate change.
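
    As a hedged sketch of the distribution-fitting step discussed above (synthetic annual maxima, not NCDC, NASA, or CMIP5 data), the Python fragment below fits a Gumbel (Extreme Value Type I) and a Log-Pearson Type III distribution to annual maximum precipitation and compares their 100-year return levels.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(10)
      annual_max = rng.gumbel(loc=60.0, scale=15.0, size=50)   # synthetic annual max daily precip (mm)

      T = 100                     # return period in years
      q = 1.0 - 1.0 / T           # corresponding non-exceedance probability

      # Gumbel (EV Type I) fit
      loc, scale = stats.gumbel_r.fit(annual_max)
      gumbel_level = stats.gumbel_r.ppf(q, loc, scale)

      # Log-Pearson Type III: Pearson III fitted to log-transformed maxima
      skew, lloc, lscale = stats.pearson3.fit(np.log(annual_max))
      lp3_level = np.exp(stats.pearson3.ppf(q, skew, loc=lloc, scale=lscale))

      print(f"{T}-year return level: Gumbel {gumbel_level:.1f} mm, LP3 {lp3_level:.1f} mm")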

  15. Comparison of four statistical and machine learning methods for crash severity prediction.

    Science.gov (United States)

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance in overall and in more severe crashes. RF and SVM had the next two sufficient performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
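
    As a hedged, simplified analogue of the comparison above (synthetic data; k-nearest neighbours stands in for NNC, and the cost weighting is only a rough proxy for the paper's crash-costs-based measure), the Python sketch below trains four classifiers on a three-level severity outcome and scores them by plain accuracy and by a cost-weighted accuracy using invented unit crash costs.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC
      from sklearn.model_selection import train_test_split

      # Hypothetical 3-level severity: 0 = property damage only, 1 = injury, 2 = fatal
      X, y = make_classification(n_samples=3000, n_features=10, n_informative=6,
                                 n_classes=3, weights=[0.7, 0.25, 0.05], random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

      costs = np.array([4e3, 1e5, 1.5e6])   # invented unit crash costs per severity level

      models = {
          "multinomial logit": LogisticRegression(max_iter=2000),
          "nearest neighbours": KNeighborsClassifier(),
          "support vector machine": SVC(),
          "random forest": RandomForestClassifier(random_state=0),
      }
      for name, model in models.items():
          pred = model.fit(X_tr, y_tr).predict(X_te)
          acc = np.mean(pred == y_te)
          # share of total crash cost carried by correctly classified crashes
          cost_acc = costs[y_te][pred == y_te].sum() / costs[y_te].sum()
          print(f"{name}: accuracy {acc:.2f}, cost-weighted accuracy {cost_acc:.2f}")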

  16. Predictive models for arteriovenous fistula maturation.

    Science.gov (United States)

    Al Shakarchi, Julien; McGrogan, Damian; Van der Veer, Sabine; Sperrin, Matthew; Inston, Nicholas

    2016-05-07

    Haemodialysis (HD) is a lifeline therapy for patients with end-stage renal disease (ESRD). A critical factor in the survival of renal dialysis patients is the surgical creation of vascular access, and international guidelines recommend arteriovenous fistulas (AVF) as the gold standard of vascular access for haemodialysis. Despite this, AVFs have been associated with high failure rates. Although risk factors for AVF failure have been identified, their utility for predicting AVF failure through predictive models remains unclear. The objectives of this review are to systematically and critically assess the methodology and reporting of studies developing prognostic predictive models for AVF outcomes and to assess their suitability for clinical practice. Electronic databases were searched for studies reporting prognostic predictive models for AVF outcomes. Dual review was conducted to identify studies that reported on the development or validation of a model constructed to predict AVF outcome following creation. Data were extracted on study characteristics, risk predictors, statistical methodology, model type, as well as validation process. We included four different studies reporting five different predictive models. Parameters identified that were common to all scoring systems were age and cardiovascular disease. This review has found a small number of predictive models in vascular access. The disparity between the studies limits the development of a unified predictive model.

  17. Statistical approach to predict compressive strength of high workability slag-cement mortars

    International Nuclear Information System (INIS)

    Memon, N.A.; Memon, N.A.; Sumadi, S.R.

    2009-01-01

    This paper reports an attempt to develop empirical expressions to estimate/predict the compressive strength of high-workability slag-cement mortars. Experimental data from 54 mortar mixes were used. The mortars were prepared with slag as cement replacement of the order of 0, 50 and 60%. The flow (workability) was maintained at 136±3%. The numerical and statistical analysis was performed using the computer software Microsoft Office Excel 2003. Three empirical mathematical models were developed to estimate/predict the 28-day compressive strength of high-workability slag-cement mortars with 0, 50 and 60% slag, which predict the values with an accuracy between 97 and 98%. Finally, a generalized empirical mathematical model was proposed which can predict the 28-day compressive strength of high-workability mortars with a degree of accuracy of up to 95%. (author)

  18. Predicting energy performance of a net-zero energy building: A statistical approach

    International Nuclear Information System (INIS)

    Kneifel, Joshua; Webb, David

    2016-01-01

    Highlights: • A regression model is applied to actual energy data from a net-zero energy building. • The model is validated through a rigorous statistical analysis. • Comparisons are made between model predictions and those of a physics-based model. • The model is a viable baseline for evaluating future models from the energy data. - Abstract: Performance-based building requirements have become more prevalent because it gives freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid Climate Zone, and compares these

  19. Massive Predictive Modeling using Oracle R Enterprise

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    R is fast becoming the lingua franca for analyzing data via statistics, visualization, and predictive analytics. For enterprise-scale data, R users have three main concerns: scalability, performance, and production deployment. Oracle's R-based technologies - Oracle R Distribution, Oracle R Enterprise, Oracle R Connector for Hadoop, and the R package ROracle - address these concerns. In this talk, we introduce Oracle's R technologies, highlighting how each enables R users to achieve scalability and performance while making production deployment of R results a natural outcome of the data analyst/scientist efforts. The focus then turns to Oracle R Enterprise with code examples using the transparency layer and embedded R execution, targeting massive predictive modeling. One goal behind massive predictive modeling is to build models per entity, such as customers, zip codes, simulations, in an effort to understand behavior and tailor predictions at the entity level. Predictions...

  20. MSMBuilder: Statistical Models for Biomolecular Dynamics.

    Science.gov (United States)

    Harrigan, Matthew P; Sultan, Mohammad M; Hernández, Carlos X; Husic, Brooke E; Eastman, Peter; Schwantes, Christian R; Beauchamp, Kyle A; McGibbon, Robert T; Pande, Vijay S

    2017-01-10

    MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov state models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models and time-structure based independent component analysis. MSMBuilder boasts an easy to use command-line interface, as well as clear and consistent abstractions through its Python application programming interface. MSMBuilder was developed with careful consideration for compatibility with the broader machine learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics, but is just as applicable to other computational or experimental time-series measurements. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
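
    MSMBuilder's own API is not shown here; as a hedged illustration of the core object it builds, the Python fragment below estimates a Markov state model transition matrix from a synthetic, already-discretized trajectory by counting transitions at a chosen lag time and extracts the stationary state populations.

      import numpy as np

      rng = np.random.default_rng(11)
      # Synthetic discretized trajectory over 3 conformational states
      true_T = np.array([[0.95, 0.04, 0.01],
                         [0.03, 0.90, 0.07],
                         [0.02, 0.08, 0.90]])
      traj = [0]
      for _ in range(20_000):
          traj.append(rng.choice(3, p=true_T[traj[-1]]))
      traj = np.array(traj)

      lag = 5                                  # lag time in frames
      counts = np.zeros((3, 3))
      for i, j in zip(traj[:-lag], traj[lag:]):
          counts[i, j] += 1
      T_hat = counts / counts.sum(axis=1, keepdims=True)   # row-stochastic MSM transition matrix

      # Stationary distribution: leading left eigenvector of the estimated matrix
      eigvals, eigvecs = np.linalg.eig(T_hat.T)
      pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
      pi /= pi.sum()
      print("estimated transition matrix:\n", T_hat.round(2))
      print("stationary populations:", pi.round(3))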

  1. Spatial Economics Model Predicting Transport Volume

    Directory of Open Access Journals (Sweden)

    Lu Bo

    2016-10-01

    It is extremely important to predict logistics requirements in a scientific and rational way. However, in recent years the improvement in prediction methods has not been very significant, and traditional statistical prediction methods suffer from low precision and poor interpretability: they can neither guarantee the generalization ability of the prediction model theoretically nor explain the models effectively. Therefore, in combination with the theories of spatial economics, industrial economics, and neo-classical economics, and taking the city of Zhuanghe as the research object, the study identifies the leading industries that generate large volumes of cargo and further predicts the static logistics generation of Zhuanghe and its hinterland. By integrating various factors that can affect regional logistics requirements, this study establishes a logistics requirements potential model based on spatial economic principles and expands logistics requirements prediction from purely statistical principles into the new area of spatial and regional economics.

  2. Statistical Shape Modeling of Cam Femoroacetabular Impingement

    Energy Technology Data Exchange (ETDEWEB)

    Harris, Michael D.; Dater, Manasi; Whitaker, Ross; Jurrus, Elizabeth R.; Peters, Christopher L.; Anderson, Andrew E.

    2013-10-01

    In this study, statistical shape modeling (SSM) was used to quantify three-dimensional (3D) variation and morphologic differences between femurs with and without cam femoroacetabular impingement (FAI). 3D surfaces were generated from CT scans of femurs from 41 controls and 30 cam FAI patients. SSM correspondence particles were optimally positioned on each surface using a gradient descent energy function. Mean shapes for control and patient groups were defined from the resulting particle configurations. Morphological differences between group mean shapes and between the control mean and individual patients were calculated. Principal component analysis was used to describe anatomical variation present in both groups. The first 6 modes (or principal components) captured statistically significant shape variations, which comprised 84% of cumulative variation among the femurs. Shape variation was greatest in femoral offset, greater trochanter height, and the head-neck junction. The mean cam femur shape protruded above the control mean by a maximum of 3.3 mm with sustained protrusions of 2.5-3.0 mm along the anterolateral head-neck junction and distally along the anterior neck, corresponding well with reported cam lesion locations and soft-tissue damage. This study provides initial evidence that SSM can describe variations in femoral morphology in both controls and cam FAI patients and may be useful for developing new measurements of pathological anatomy. SSM may also be applied to characterize cam FAI severity and provide templates to guide patient-specific surgical resection of bone.
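
    As a hedged sketch of the PCA step that underlies such shape models (random placeholder coordinates, not the femur correspondence data), the Python fragment below stacks each shape's particle coordinates into a vector, computes the mean shape, extracts principal modes of variation by SVD, and generates a new shape instance along the first mode.

      import numpy as np

      rng = np.random.default_rng(12)
      n_shapes, n_particles = 40, 256
      # Placeholder correspondence particles: each shape flattened to (x, y, z) coordinates
      shapes = rng.normal(size=(n_shapes, n_particles * 3))

      mean_shape = shapes.mean(axis=0)
      centered = shapes - mean_shape

      # Principal component analysis via SVD of the centered shape matrix
      U, s, Vt = np.linalg.svd(centered, full_matrices=False)
      variance = s ** 2 / (n_shapes - 1)
      explained = np.cumsum(variance) / variance.sum()
      print("modes needed for 90% of shape variation:", int(np.searchsorted(explained, 0.90)) + 1)

      # A synthetic shape instance two standard deviations along the first mode
      shape_plus_2sd = mean_shape + 2 * np.sqrt(variance[0]) * Vt[0]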

  3. Statistical mechanics far from equilibrium: prediction and test for a sheared system.

    Science.gov (United States)

    Evans, R M L; Simha, R A; Baule, A; Olmsted, P D

    2010-05-01

    We report the application of a far-from-equilibrium statistical-mechanical theory to a nontrivial system with Newtonian interactions in continuous boundary-driven flow. By numerically time stepping the force-balance equations of a one-dimensional model fluid we measure occupancies and transition rates in simulation. The high-shear-rate simulation data reproduce the predicted invariant quantities, thus supporting the theory that a class of nonequilibrium steady states of matter, namely, sheared complex fluids, is amenable to statistical treatment from first principles.

  4. PREDICTIVE CAPACITY OF ARCH FAMILY MODELS

    Directory of Open Access Journals (Sweden)

    Raphael Silveira Amaro

    2016-03-01

    Full Text Available In the last decades, a remarkable number of models, variants of the Autoregressive Conditional Heteroscedastic family, have been developed and empirically tested, making the process of choosing a particular model extremely complex. This research aims to compare the predictive capacity of five conditional heteroskedasticity models, using the Model Confidence Set procedure and considering eight different statistical probability distributions. The financial series used are the log-return series of the Bovespa index and the Dow Jones Industrial Index in the period between 27 October 2008 and 30 December 2014. The empirical evidence showed that, in general, the competing models have a great homogeneity in making predictions, whether for the stock market of a developed country or for the stock market of a developing country. An equivalent result can be inferred for the statistical probability distributions that were used.
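
    As a hedged sketch of one member of the ARCH family discussed above (not the authors' implementation; the parameter values and returns are illustrative), the GARCH(1,1) conditional variance recursion sigma2[t] = omega + alpha*r[t-1]^2 + beta*sigma2[t-1] can be written as:

    ```python
    import numpy as np

    def garch11_variance(returns, omega, alpha, beta):
        """Conditional variance path of a GARCH(1,1) model:
        sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
        sigma2 = np.empty_like(returns)
        sigma2[0] = returns.var()  # initialize at the sample variance
        for t in range(1, len(returns)):
            sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
        return sigma2

    # Illustrative log-return series and parameter values
    r = np.random.default_rng(1).normal(scale=0.01, size=500)
    sig2 = garch11_variance(r, omega=1e-6, alpha=0.08, beta=0.90)
    forecast = 1e-6 + 0.08 * r[-1] ** 2 + 0.90 * sig2[-1]  # one-step-ahead variance forecast
    ```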

  5. Statistical model for OCT image denoising

    KAUST Repository

    Li, Muxingzi

    2017-08-01

    Optical coherence tomography (OCT) is a non-invasive technique with a large array of applications in clinical imaging and biological tissue visualization. However, the presence of speckle noise affects the analysis of OCT images and their diagnostic utility. In this article, we introduce a new OCT denoising algorithm. The proposed method is founded on a numerical optimization framework based on maximum-a-posteriori estimate of the noise-free OCT image. It combines a novel speckle noise model, derived from local statistics of empirical spectral domain OCT (SD-OCT) data, with a Huber variant of total variation regularization for edge preservation. The proposed approach exhibits satisfying results in terms of speckle noise reduction as well as edge preservation, at reduced computational cost.

  6. Current algebra, statistical mechanics and quantum models

    Science.gov (United States)

    Vilela Mendes, R.

    2017-11-01

    Results obtained in the past for free boson systems at zero and nonzero temperatures are revisited to clarify the physical meaning of current algebra reducible functionals which are associated with systems with density fluctuations, leading to observable effects on phase transitions. To use current algebra as a tool for the formulation of quantum statistical mechanics amounts to the construction of unitary representations of diffeomorphism groups. Two mathematically equivalent procedures exist for this purpose. One searches for quasi-invariant measures on configuration spaces, the other for a cyclic vector in Hilbert space. Here, one argues that the second approach is closer to the physical intuition when modelling complex systems. An example of application of the current algebra methodology to the pairing phenomenon in two-dimensional fermion systems is discussed.

  7. Predicting Automotive Interior Noise Including Wind Noise by Statistical Energy Analysis

    OpenAIRE

    Yoshio Kurosawa

    2017-01-01

    The application of soundproofing materials for the reduction of high-frequency automobile interior noise has been researched. This paper presents a sound pressure prediction technique, including wind noise, based on Hybrid Statistical Energy Analysis (HSEA), with the aim of reducing the weight of acoustic insulation. HSEA uses both analytical SEA and experimental SEA. Chassis dynamometer tests and road tests showed the validity of the SEA modeling and confirmed the utility of the method.

  8. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    Science.gov (United States)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization of statistical models, i.e. automated model scoring and selection using techniques such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and the statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multiobjective optimization algorithm to optimize the statistical preprocessing of forcing data and improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate the selection of soil moisture forecast statistical models for North America.
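
    A hedged, minimal sketch of the kind of automated model scoring and selection described above, using plain scikit-learn rather than Elm itself (the forcing-data features, target, and parameter grid are synthetic stand-ins):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import RandomizedSearchCV

    # Synthetic stand-in for meteorological forcing features and a soil moisture target
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=500)

    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=0),
        param_distributions={"n_estimators": [50, 100, 200],
                             "max_depth": [3, 5, 10, None]},
        n_iter=8, cv=3, scoring="neg_mean_squared_error", random_state=0)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)  # selected model structure and its score
    ```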

  9. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....
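
    A hedged sketch of the general idea, not the authors' method (the training segments are invented): train a first-order Markov model over residue letters for each structural class and assign an unknown segment to the class with the highest log-likelihood.

    ```python
    import numpy as np
    from collections import defaultdict

    ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

    def train_markov(segments):
        """First-order Markov model P(next residue | current residue),
        estimated with add-one smoothing from segments of one structural class."""
        counts = defaultdict(lambda: defaultdict(int))
        for seg in segments:
            for a, b in zip(seg[:-1], seg[1:]):
                counts[a][b] += 1
        return {a: {b: (counts[a][b] + 1) / (sum(counts[a].values()) + len(ALPHABET))
                    for b in ALPHABET} for a in ALPHABET}

    def log_likelihood(seq, probs):
        return sum(np.log(probs[a][b]) for a, b in zip(seq[:-1], seq[1:]))

    # Toy training segments per class (helix, sheet, coil) -- purely illustrative
    models = {c: train_markov(segs) for c, segs in
              {"H": ["AALLKKA", "LLEEAAL"], "E": ["VVIVTV", "IVIVFV"], "C": ["GGPSGG", "PGSGNP"]}.items()}
    window = "VIVIV"
    print(max(models, key=lambda c: log_likelihood(window, models[c])))  # predicted class
    ```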

  10. Combining Statistical and Ensemble Streamflow Predictions to Cope with Consensus Forecast

    Science.gov (United States)

    Mirfenderesgi, G.; Najafi, M.; Moradkhani, H.

    2012-12-01

    Monthly and seasonal water supply outlooks are used for water resource planning and management, including industrial and agricultural water allocation as well as reservoir operations. Currently, consensus forecasts are jointly issued by the operational agencies in the Western US based on statistical regression equations and ensemble streamflow predictions. However, an objective method is needed to combine the forecasts from these methods. In this study, monthly and seasonal streamflow predictions are generated from various hydrologic and statistical simulations including: Variable Infiltration Capacity (VIC), Sacramento Soil Moisture Accounting Model (SAC-SMA), Precipitation Runoff Modeling System (PRMS), Conceptual Hydrologic MODel (HYMOD), and Principal and Independent Component Regression (PCR and ICR), etc. The results are optimally combined by several objective multi-modeling methods. The increase in forecast accuracy is assessed in comparison with the best and worst available predictions. The precision of each multi-model method is also estimated. The study is performed over Lake Granby, located in the headwaters of the Colorado River Basin. Overall, the results show improvements in both monthly and seasonal forecasts as compared with single-model simulations.
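
    One simple objective combination consistent with the multi-modeling idea above (a hedged sketch, not the authors' exact method; the forecast and observation values are invented) is to fit non-negative weights for the candidate model forecasts by least squares against observed flows in a training period:

    ```python
    import numpy as np
    from scipy.optimize import nnls

    # Columns: monthly forecasts from individual models (e.g., VIC, SAC-SMA, PRMS)
    F = np.array([[10.2, 11.0,  9.5],
                  [ 8.1,  8.6,  7.9],
                  [12.4, 13.1, 11.8],
                  [ 9.0,  9.4,  8.7]])       # illustrative values
    obs = np.array([10.0, 8.3, 12.0, 9.1])   # observed streamflow in the same months

    w, _ = nnls(F, obs)                      # non-negative least-squares combination weights
    combined = F @ w                         # combined multi-model forecast
    print(w, combined)
    ```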

  11. New advances in statistical modeling and applications

    CERN Document Server

    Santos, Rui; Oliveira, Maria; Paulino, Carlos

    2014-01-01

    This volume presents selected papers from the XIXth Congress of the Portuguese Statistical Society, held in the town of Nazaré, Portugal, from September 28 to October 1, 2011. All contributions were selected after a thorough peer-review process. It covers a broad range of papers in the areas of statistical science, probability and stochastic processes, extremes and statistical applications.

  12. A statistical predicting scheme of the intensity of the STH over the western North Pacific

    Science.gov (United States)

    Jin, Yiming; Cai, Jinxiang; Liu, Ningsheng

    1987-03-01

    By performing error analysis of the information from the 48-hr forecasting charts of the 500-hPa fields by the B model over eastern Asia in the period of July to September 1982 and expansions of the height fields of westerlies and the subtropical zone by use of the Chebyshev polynomial and EOF, respectively, a scheme is developed for predicting the synchronous STH coefficient (i. e. time coefficient) in terms of the Chebyshev one, thus making possible statistical forecasting of the 500-hPa subtropical field within 48 hr. Tests with independent samples indicate that, to a certain extent, this scheme can be used in operational prediction as a reference.

  13. Are distance-dependent statistical potentials considering three interacting bodies superior to two-body statistical potentials for protein structure prediction?

    Science.gov (United States)

    Ghomi, Hamed Tabatabaei; Thompson, Jared J; Lill, Markus A

    2014-10-01

    Distance-based statistical potentials have long been used to model condensed matter systems, e.g. as scoring functions in differentiating native-like protein structures from decoys. These scoring functions are based on the assumption that the total free energy of the protein can be calculated as the sum of pairwise free energy contributions derived from a statistical analysis of pair-distribution functions. However, this fundamental assumption has been challenged theoretically. In fact the free energy of a system with N particles is only exactly related to the N-body distribution function. Based on this argument coarse-grained multi-body statistical potentials have been developed to capture higher-order interactions. Having a coarse representation of the protein and using geometric contacts instead of pairwise interaction distances renders these models insufficient in modeling details of multi-body effects. In this study, we investigated if extending distance-dependent pairwise atomistic statistical potentials to corresponding interaction functions that are conditional on a third interacting body, defined as quasi-three-body statistical potentials, could model details of three-body interactions. We also tested if this approach could improve the predictive capabilities of statistical scoring functions for protein structure prediction. We analyzed the statistical dependency between two simultaneous pairwise interactions and showed that there is surprisingly little if any dependency of a third interacting site on pairwise atomistic statistical potentials. Also the protein structure prediction performance of these quasi-three-body potentials is comparable with their corresponding two-body counterparts. The scoring functions developed in this study showed better or comparable performances compared to some widely used scoring functions for protein structure prediction.
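
    A hedged sketch of how a distance-dependent pairwise statistical potential of the kind discussed above is typically derived from pair-distribution statistics (the standard inverse-Boltzmann form, not the authors' quasi-three-body construction; the distance samples are synthetic):

    ```python
    import numpy as np

    def pairwise_potential(obs_distances, ref_distances, bins):
        """Distance-dependent statistical potential via the inverse Boltzmann relation
        E(r) = -kT * ln(P_obs(r) / P_ref(r)), in units of kT, with a small pseudocount."""
        p_obs, _ = np.histogram(obs_distances, bins=bins, density=True)
        p_ref, _ = np.histogram(ref_distances, bins=bins, density=True)
        eps = 1e-6
        return -np.log((p_obs + eps) / (p_ref + eps))

    bins = np.linspace(2.0, 10.0, 17)                        # 0.5 Angstrom distance bins
    obs = np.random.default_rng(0).normal(5.0, 1.0, 5000)    # stand-in for native pair distances
    ref = np.random.default_rng(1).uniform(2.0, 10.0, 5000)  # stand-in reference state
    energy = pairwise_potential(obs, ref, bins)              # one value per distance bin
    ```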

  14. A statistical analysis based recommender model for heart disease patients.

    Science.gov (United States)

    Mustaqeem, Anam; Anwar, Syed Muhammad; Khan, Abdul Rashid; Majid, Muhammad

    2017-12-01

    An intelligent information technology based system could have a positive impact on the life-style of patients suffering from chronic diseases by providing useful health recommendations. In this paper, we have proposed a hybrid model that provides disease prediction and medical recommendations to cardiac patients. The first part aims at implementing a prediction model that can identify the disease of a patient and classify it into one of four output classes, i.e., non-cardiac chest pain, silent ischemia, angina, and myocardial infarction. Following the disease prediction, the second part of the model provides general medical recommendations to patients. The recommendations are generated by assessing the severity of clinical features of patients, estimating the risk associated with clinical features and disease, and calculating the probability of occurrence of disease. The purpose of this model is to build an intelligent and adaptive recommender system for heart disease patients. The experiments for the proposed recommender system are conducted on a clinical data set collected and labelled in consultation with medical experts from a known hospital. The performance of the proposed prediction model is evaluated using accuracy and kappa statistics as evaluation measures. The medical recommendations are generated based on information collected from a knowledge base created with the help of physicians. The results of the recommendation model are evaluated using a confusion matrix and give an accuracy of 97.8%. The proposed system exhibits good prediction and recommendation accuracies and promises to be a useful contribution in the field of e-health and medical informatics. Copyright © 2017 Elsevier B.V. All rights reserved.
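
    The kappa statistic used above as an evaluation measure can be computed from a confusion matrix as follows (a hedged sketch; the four-class matrix below is invented, not the authors' data):

    ```python
    import numpy as np

    def cohens_kappa(confusion):
        """Cohen's kappa from a confusion matrix: agreement beyond chance."""
        confusion = np.asarray(confusion, dtype=float)
        n = confusion.sum()
        p_observed = np.trace(confusion) / n
        p_expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n ** 2
        return (p_observed - p_expected) / (1.0 - p_expected)

    # Illustrative 4-class confusion matrix (non-cardiac chest pain, silent ischemia,
    # angina, myocardial infarction) -- values invented for the example
    cm = [[50,  2,  1,  0],
          [ 3, 45,  2,  1],
          [ 1,  2, 48,  2],
          [ 0,  1,  2, 40]]
    print(cohens_kappa(cm))
    ```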

  15. Spatio-temporal statistical models with applications to atmospheric processes

    International Nuclear Information System (INIS)

    Wikle, C.K.

    1996-01-01

    This doctoral dissertation is presented as three self-contained papers. An introductory chapter considers traditional spatio-temporal statistical methods used in the atmospheric sciences from a statistical perspective. Although this section is primarily a review, many of the statistical issues considered have not been considered in the context of these methods and several open questions are posed. The first paper attempts to determine a means of characterizing the semiannual oscillation (SAO) spatial variation in the northern hemisphere extratropical height field. It was discovered that the midlatitude SAO in 500hPa geopotential height could be explained almost entirely as a result of spatial and temporal asymmetries in the annual variation of stationary eddies. It was concluded that the mechanism for the SAO in the northern hemisphere is a result of land-sea contrasts. The second paper examines the seasonal variability of mixed Rossby-gravity waves (MRGW) in the lower stratosphere over the equatorial Pacific. Advanced cyclostationary time series techniques were used for analysis. It was found that there are significant twice-yearly peaks in MRGW activity. Analyses also suggested a convergence of horizontal momentum flux associated with these waves. In the third paper, a new spatio-temporal statistical model is proposed that attempts to consider the influence of both temporal and spatial variability. This method is mainly concerned with prediction in space and time, and provides a spatially descriptive and temporally dynamic model

  16. Spatio-temporal statistical models with applications to atmospheric processes

    Energy Technology Data Exchange (ETDEWEB)

    Wikle, Christopher K. [Iowa State Univ., Ames, IA (United States)

    1996-01-01

    This doctoral dissertation is presented as three self-contained papers. An introductory chapter considers traditional spatio-temporal statistical methods used in the atmospheric sciences from a statistical perspective. Although this section is primarily a review, many of the statistical issues considered have not been considered in the context of these methods and several open questions are posed. The first paper attempts to determine a means of characterizing the semiannual oscillation (SAO) spatial variation in the northern hemisphere extratropical height field. It was discovered that the midlatitude SAO in 500hPa geopotential height could be explained almost entirely as a result of spatial and temporal asymmetries in the annual variation of stationary eddies. It was concluded that the mechanism for the SAO in the northern hemisphere is a result of land-sea contrasts. The second paper examines the seasonal variability of mixed Rossby-gravity waves (MRGW) in the lower stratosphere over the equatorial Pacific. Advanced cyclostationary time series techniques were used for analysis. It was found that there are significant twice-yearly peaks in MRGW activity. Analyses also suggested a convergence of horizontal momentum flux associated with these waves. In the third paper, a new spatio-temporal statistical model is proposed that attempts to consider the influence of both temporal and spatial variability. This method is mainly concerned with prediction in space and time, and provides a spatially descriptive and temporally dynamic model.

  17. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances

    Directory of Open Access Journals (Sweden)

    Abut F

    2015-08-01

    Full Text Available Fatih Abut, Mehmet Fatih Akay. Department of Computer Engineering, Çukurova University, Adana, Turkey. Abstract: Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance

  18. Statistical Models for Inferring Vegetation Composition from Fossil Pollen

    Science.gov (United States)

    Paciorek, C.; McLachlan, J. S.; Shang, Z.

    2011-12-01

    Fossil pollen provide information about vegetation composition that can be used to help understand how vegetation has changed over the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millennia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millennia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.

  19. Statistical modeling of global geogenic fluoride contamination in groundwaters.

    Science.gov (United States)

    Amini, Manouchehr; Mueller, Kim; Abbaspour, Karim C; Rosenberg, Thomas; Afyuni, Majid; Møller, Klaus N; Sarr, Mamadou; Johnson, C Annette

    2008-05-15

    The use of groundwater with high fluoride concentrations poses a health threat to millions of people around the world. This study aims at providing a global overview of potentially fluoride-rich groundwaters by modeling fluoride concentration. A large database of worldwide fluoride concentrations as well as available information on related environmental factors such as soil properties, geological settings, and climatic and topographical information on a global scale have all been used in the model. The modeling approach combines geochemical knowledge with statistical methods to devise a rule-based statistical procedure, which divides the world into 8 different "process regions". For each region a separate predictive model was constructed. The end result is a global probability map of fluoride concentration in the groundwater. Comparisons of the modeled and measured data indicate that 60-70% of the fluoride variation could be explained by the models in six process regions, while in two process regions only 30% of the variation in the measured data was explained. Furthermore, the global probability map corresponded well with fluorotic areas described in the international literature. Although the probability map should not replace fluoride testing, it can give a first indication of possible contamination and thus may support the planning process of new drinking water projects.

  20. Energy based prediction models for building acoustics

    DEFF Research Database (Denmark)

    Brunskog, Jonas

    2012-01-01

    In order to reach robust and simplified yet accurate prediction models, energy-based principles are commonly used in many fields of acoustics, especially in building acoustics. This includes simple energy flow models, the framework of statistical energy analysis (SEA) as well as more elaborated...... principles as, e.g., wave intensity analysis (WIA). The European standards for building acoustic predictions, the EN 12354 series, are based on energy flow and SEA principles. In the present paper, different energy-based prediction models are discussed and critically reviewed. Special attention is placed...... on underlying basic assumptions, such as diffuse fields, high modal overlap, resonant field being dominant, etc., and the consequences of these in terms of limitations in the theory and in the practical use of the models....

  1. Clinical Prediction Models for Cardiovascular Disease: Tufts Predictive Analytics and Comparative Effectiveness Clinical Prediction Model Database.

    Science.gov (United States)

    Wessler, Benjamin S; Lai Yh, Lana; Kramer, Whitney; Cangelosi, Michael; Raman, Gowri; Lutz, Jennifer S; Kent, David M

    2015-07-01

    Clinical prediction models (CPMs) estimate the probability of clinical outcomes and hold the potential to improve decision making and individualize care. For patients with cardiovascular disease, there are numerous CPMs available although the extent of this literature is not well described. We conducted a systematic review for articles containing CPMs for cardiovascular disease published between January 1990 and May 2012. Cardiovascular disease includes coronary heart disease, heart failure, arrhythmias, stroke, venous thromboembolism, and peripheral vascular disease. We created a novel database and characterized CPMs based on the stage of development, population under study, performance, covariates, and predicted outcomes. There are 796 models included in this database. The number of CPMs published each year is increasing steadily over time. Seven hundred seventeen (90%) are de novo CPMs, 21 (3%) are CPM recalibrations, and 58 (7%) are CPM adaptations. This database contains CPMs for 31 index conditions, including 215 CPMs for patients with coronary artery disease, 168 CPMs for population samples, and 79 models for patients with heart failure. There are 77 distinct index/outcome pairings. Of the de novo models in this database, 450 (63%) report a c-statistic and 259 (36%) report some information on calibration. There is an abundance of CPMs available for a wide assortment of cardiovascular disease conditions, with substantial redundancy in the literature. The comparative performance of these models, the consistency of effects and risk estimates across models and the actual and potential clinical impact of this body of literature is poorly understood. © 2015 American Heart Association, Inc.

  2. An exercise in model validation: Comparing univariate statistics and Monte Carlo-based multivariate statistics

    International Nuclear Information System (INIS)

    Weathers, J.B.; Luck, R.; Weathers, J.W.

    2009-01-01

    The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
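
    A hedged sketch of the Monte Carlo step described above (the uncertainty magnitudes are synthetic, not the paper's case study): repeated samples of the comparison error, combining random and systematic contributions, yield a covariance matrix from which a 95% constant-probability contour follows.

    ```python
    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(0)
    n_samples, n_quantities = 10000, 2

    # Synthetic uncertainty model: independent random errors per quantity plus a
    # shared systematic error that correlates the two comparison errors
    random_err = rng.normal(scale=[0.5, 0.8], size=(n_samples, n_quantities))
    systematic = rng.normal(scale=0.3, size=(n_samples, 1))
    comparison_error = random_err + systematic

    cov = np.cov(comparison_error, rowvar=False)       # Monte Carlo covariance estimate
    r95 = np.sqrt(chi2.ppf(0.95, df=n_quantities))     # Mahalanobis radius of the 95% contour
    print(cov, r95)
    ```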

  3. MODEL PREDICTIVE CONTROL FUNDAMENTALS

    African Journals Online (AJOL)

    2012-07-02

    Linear MPC: (1) uses a linear model ẋ = Ax + Bu; (2) quadratic cost function F = xᵀQx + uᵀRu; (3) linear constraints Hx + Gu < 0; (4) solved as a quadratic program. Nonlinear MPC: (1) nonlinear model ẋ = f(x, u); (2) cost function can be nonquadratic F = F(x, u); (3) nonlinear constraints h(x, u) < 0; (4) solved as a nonlinear program.
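
    A hedged, minimal sketch of the linear MPC formulation listed above, posed as a generic finite-horizon problem and solved numerically with SciPy (the double-integrator model, weights, and input bounds are illustrative; a dedicated QP solver would normally be used):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Simple double-integrator model x(k+1) = A x(k) + B u(k)
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.005], [0.1]])
    Q, R = np.diag([1.0, 0.1]), np.array([[0.01]])
    N = 20  # prediction horizon

    def cost(u_flat, x0):
        """Quadratic cost sum_k x'Qx + u'Ru along the predicted trajectory."""
        u = u_flat.reshape(N, 1)
        x, total = x0.copy(), 0.0
        for k in range(N):
            total += x @ Q @ x + u[k] @ R @ u[k]
            x = A @ x + B @ u[k]
        return total

    x0 = np.array([1.0, 0.0])
    bounds = [(-1.0, 1.0)] * N                           # input constraints |u| <= 1
    res = minimize(cost, np.zeros(N), args=(x0,), bounds=bounds)
    u_first = res.x[0]                                   # apply only the first move (receding horizon)
    ```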

  4. Statistical Challenges in Modeling Big Brain Signals

    KAUST Repository

    Yu, Zhaoxia

    2017-11-01

    Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible solutions, and highlight future research directions.

  5. Statistical Learning Theory: Models, Concepts, and Results

    OpenAIRE

    von Luxburg, Ulrike; Schoelkopf, Bernhard

    2008-01-01

    Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We target a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want to get an overview of the field before diving into technical details.

  6. Online Statistical Modeling (Regression Analysis) for Independent Responses

    Science.gov (United States)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. Rich varieties of advanced and recent statistical modelling are mostly available in open source software (one of them being R). However, these advanced statistical models are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and Shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We have previously made an interface in the form of an e-tutorial for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using computer-intensive statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modeling easier to apply and easier to compare in order to find the most appropriate model for the data.

  7. Predictive models of moth development

    Science.gov (United States)

    Degree-day models link ambient temperature to insect life-stages, making such models valuable tools in integrated pest management. These models increase management efficacy by predicting pest phenology. In Wisconsin, the top insect pest of cranberry production is the cranberry fruitworm, Acrobasis v...
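
    A hedged sketch of a simple degree-day accumulation of the kind such models use (the averaging method; the base temperature and temperature values are illustrative, not the cranberry fruitworm's actual thresholds):

    ```python
    def accumulated_degree_days(daily_min, daily_max, base_temp):
        """Simple average-method degree-day accumulation above a base temperature."""
        total = 0.0
        for tmin, tmax in zip(daily_min, daily_max):
            mean = (tmin + tmax) / 2.0
            total += max(0.0, mean - base_temp)
        return total

    # Illustrative week of temperatures (deg C) and an assumed base of 10 deg C
    tmin = [8, 9, 11, 12, 10, 9, 13]
    tmax = [18, 20, 23, 25, 22, 19, 26]
    dd = accumulated_degree_days(tmin, tmax, base_temp=10.0)
    print(dd)  # compare against a phenology threshold to predict a life-stage event
    ```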

  8. Predictive Models and Computational Embryology

    Science.gov (United States)

    EPA’s ‘virtual embryo’ project is building an integrative systems biology framework for predictive models of developmental toxicity. One schema involves a knowledge-driven adverse outcome pathway (AOP) framework utilizing information from public databases, standardized ontologies...

  9. Models for predicting compressive strength and water absorption of ...

    African Journals Online (AJOL)

    This work presents a mathematical model for predicting the compressive strength and water absorption of laterite-quarry dust cement block using augmented Scheffe's simplex lattice design. The statistical models developed can predict the mix proportion that will yield the desired property. The models were tested for lack of ...

  10. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances.

    Science.gov (United States)

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance.

  11. PGT: A Statistical Approach to Prediction and Mechanism Design

    Science.gov (United States)

    Wolpert, David H.; Bono, James W.

    One of the biggest challenges facing behavioral economics is the lack of a single theoretical framework that is capable of directly utilizing all types of behavioral data. One of the biggest challenges of game theory is the lack of a framework for making predictions and designing markets in a manner that is consistent with the axioms of decision theory. An approach in which solution concepts are distribution-valued rather than set-valued (as in equilibrium theory) has both capabilities. We call this approach Predictive Game Theory (or PGT). This paper outlines a general Bayesian approach to PGT. It also presents one simple example to illustrate the way in which this approach differs from equilibrium approaches in both prediction and mechanism design settings.

  12. statistical prediction of gully erosion development on the coastal ...

    African Journals Online (AJOL)

    Dr Obe

    form of a Linear Discriminant Function (LDF). Three functions were obtained by combining the variables in three different ways. An application of the three functions to the field situation identified function 1, γ1, as a very comfortable prediction. When γ1 was used to classify the various sites using the variables obtained from the ...

  13. Predicting Mobility using Statistics (PreMoStat)

    Science.gov (United States)

    2011-03-10

    Improved Dynamics Model – Phase 1: the PackBot pseudo-track model was a set of cascaded wheels, with cleats engaging/disengaging with the terrain introducing bouncing constraints. Phase 2: a full-track model with a segmented track of 36 segments. The pseudo-track is parameterized (cleat height, taper angle, etc.), a spring/damper system is used to constrain the motion of track segments relative to one another, and the cleats were attached to a set of cascaded wheels.

  14. Linear Mixed Models in Statistical Genetics

    NARCIS (Netherlands)

    R. de Vlaming (Ronald)

    2017-01-01

    One of the goals of statistical genetics is to elucidate the genetic architecture of phenotypes (i.e., observable individual characteristics) that are affected by many genetic variants (e.g., single-nucleotide polymorphisms; SNPs). A particular aim is to identify specific SNPs that

  15. Stochastic Spatial Models in Ecology: A Statistical Physics Approach

    Science.gov (United States)

    Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.

    2017-11-01

    Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.

  16. Predictions models with neural nets

    Directory of Open Access Journals (Sweden)

    Vladimír Konečný

    2008-01-01

    Full Text Available The contribution addresses the prediction of basic trends in economic indicators using neural networks. The problems include the choice of a suitable model and, consequently, the configuration of the neural nets, the choice of the computational function of the neurons, and the way the prediction learning is carried out. The contribution contains two basic models that use the structure of multilayer neural nets and a way of determining their configuration. A simple rule is postulated for the teaching period of the neural net in order to obtain the most credible prediction. Experiments are carried out with real data on the evolution of the Kč/Euro exchange rate. The main reason for choosing this time series is its availability over a sufficiently long period. In carrying out the experiments, both basic kinds of prediction models with the most frequently used neuron functions are verified. The achieved prediction results are presented in both numerical and graphical form.

  17. Foundations of Complex Systems Nonlinear Dynamics, Statistical Physics, and Prediction

    CERN Document Server

    Nicolis, Gregoire

    2007-01-01

    Complexity is emerging as a post-Newtonian paradigm for approaching a large body of phenomena of concern at the crossroads of physical, engineering, environmental, life and human sciences from a unifying point of view. This book outlines the foundations of modern complexity research as it arose from the cross-fertilization of ideas and tools from nonlinear science, statistical physics and numerical simulation. It is shown how these developments lead to an understanding, both qualitative and quantitative, of the complex systems encountered in nature and in everyday experience and, conversely, h

  18. Statistical models and methods for reliability and survival analysis

    CERN Document Server

    Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Huber -Carol, Catherine; Limnios, Nikolaos; Gerville-Reache, Leo

    2013-01-01

    Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical

  19. Towards a Statistical Model of Tropical Cyclone Genesis

    Science.gov (United States)

    Fernandez, A.; Kashinath, K.; McAuliffe, J.; Prabhat, M.; Stark, P. B.; Wehner, M. F.

    2017-12-01

    Tropical Cyclones (TCs) are important extreme weather phenomena that have a strong impact on humans. TC forecasts are largely based on global numerical models that produce TC-like features. Aspects of Tropical Cyclones such as their formation/genesis, evolution, intensification and dissipation over land are important and challenging problems in climate science. This study investigates the environmental conditions associated with Tropical Cyclone Genesis (TCG) by testing how accurately a statistical model can predict TCG in the CAM5.1 climate model. TCG events are defined using the TECA software (Prabhat et al., "TECA: Petascale Pattern Recognition for Climate Science", Computer Analysis of Images and Patterns, Springer, 2015, pp. 426-436) to extract TC trajectories from CAM5.1. L1-regularized logistic regression (L1LR) is applied to the CAM5.1 output. The predictions have nearly perfect accuracy for data not associated with TC tracks and high accuracy differentiating between high vorticity and low vorticity systems. The model's active variables largely correspond to current hypotheses about important factors for TCG, such as wind field patterns and local pressure minima, and suggest new routes for investigation. Furthermore, our model's predictions of TC activity are competitive with the output of an instantaneous version of Emanuel and Nolan's Genesis Potential Index (GPI) (Emanuel and Nolan, "Tropical cyclone activity and the global climate system", 26th Conference on Hurricanes and Tropical Meteorology, 2004, pp. 240-241).
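
    A hedged sketch of the L1-regularized logistic regression step described above, using scikit-learn on synthetic stand-ins for the environmental predictors (not the study's CAM5.1 output or TECA pipeline):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in: rows are candidate grid points/times, columns are
    # environmental predictors (e.g., vorticity, pressure, wind-field features)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 12))
    y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 1.0).astype(int)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X, y)
    active = np.flatnonzero(clf.coef_)      # predictors retained by the L1 penalty
    print(active, clf.score(X, y))
    ```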

  20. MASKED AREAS IN SHEAR PEAK STATISTICS: A FORWARD MODELING APPROACH

    Energy Technology Data Exchange (ETDEWEB)

    Bard, D. [KIPAC, SLAC National Accelerator Laboratory, 2575 Sand Hill Rd, Menlo Park, CA 94025 (United States); Kratochvil, J. M. [Astrophysics and Cosmology Research Unit, University of KwaZulu-Natal, Westville, Durban 4000 (South Africa); Dawson, W., E-mail: djbard@slac.stanford.edu [Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA 94550 (United States)

    2016-03-10

    The statistics of shear peaks have been shown to provide valuable cosmological information beyond the power spectrum, and will be an important constraint of models of cosmology in forthcoming astronomical surveys. Surveys include masked areas due to bright stars, bad pixels etc., which must be accounted for in producing constraints on cosmology from shear maps. We advocate a forward-modeling approach, where the impacts of masking and other survey artifacts are accounted for in the theoretical prediction of cosmological parameters, rather than correcting survey data to remove them. We use masks based on the Deep Lens Survey, and explore the impact of up to 37% of the survey area being masked on LSST and DES-scale surveys. By reconstructing maps of aperture mass the masking effect is smoothed out, resulting in up to 14% smaller statistical uncertainties compared to simply reducing the survey area by the masked area. We show that, even in the presence of large survey masks, the bias in cosmological parameter estimation produced in the forward-modeling process is ≈1%, dominated by bias caused by limited simulation volume. We also explore how this potential bias scales with survey area and evaluate how much small survey areas are impacted by the differences in cosmological structure in the data and simulated volumes, due to cosmic variance.

  1. REMAINING LIFE TIME PREDICTION OF BEARINGS USING K-STAR ALGORITHM – A STATISTICAL APPROACH

    Directory of Open Access Journals (Sweden)

    R. SATISHKUMAR

    2017-01-01

    Full Text Available The role of bearings is significant in reducing the downtime of all rotating machinery. The increasing trend of bearing failures in recent times has underlined the need for, and importance of, condition monitoring. There are multiple factors associated with a bearing failure while it is in operation. Hence, a predictive strategy is required to evaluate the current state of bearings in operation. In the past, predictive models with regression techniques were widely used for bearing lifetime estimation. The objective of this paper is to estimate the remaining useful life of bearings through a machine learning approach, with the ultimate aim of strengthening predictive maintenance. The present study used a classification approach following the concepts of machine learning, and a predictive model was built to calculate the residual lifetime of bearings in operation. Vibration signals were acquired on a continuous basis from an experiment in which the bearings were run until they failed naturally; the experiment was carried out with new bearings at pre-defined load and speed conditions until the bearing failed on its own. In the present work, statistical features were deployed, the feature selection process was carried out using a J48 decision tree, and the selected features were used to develop the prognostic model. The K-Star classification algorithm, a supervised machine learning technique, was used to build a predictive model to estimate the lifetime of bearings. The performance of the classifier was cross-validated with distinct data. The results show that the K-Star classification model gives 98.56% classification accuracy with the selected features.
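
    A hedged sketch of the statistical feature extraction step described above (windowed time-domain vibration statistics; the signal is synthetic and the feature set is illustrative rather than the authors' selected features):

    ```python
    import numpy as np
    from scipy.stats import kurtosis, skew

    def vibration_features(window):
        """Common time-domain statistics used as condition-monitoring features."""
        rms = np.sqrt(np.mean(window ** 2))
        return {
            "rms": rms,
            "peak": np.max(np.abs(window)),
            "crest_factor": np.max(np.abs(window)) / rms,
            "kurtosis": kurtosis(window),
            "skewness": skew(window),
        }

    # Synthetic vibration signal split into one-second windows
    fs = 10_000
    signal = np.random.default_rng(0).normal(size=fs * 5)
    features = [vibration_features(w) for w in signal.reshape(-1, fs)]
    print(features[0])  # feature vector for the first window
    ```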

  2. Geometric modeling in probability and statistics

    CERN Document Server

    Calin, Ovidiu

    2014-01-01

    This book covers topics of Informational Geometry, a field which deals with the differential geometric study of manifolds of probability density functions. This is a field that is increasingly attracting the interest of researchers from many different areas of science, including mathematics, statistics, geometry, computer science, signal processing, physics and neuroscience. It is the authors’ hope that the present book will be a valuable reference for researchers and graduate students in one of the aforementioned fields. This textbook is a unified presentation of differential geometry and probability theory, and constitutes a text for a course directed at graduate or advanced undergraduate students interested in applications of differential geometry in probability and statistics. The book contains over 100 proposed exercises meant to help students deepen their understanding, and it is accompanied by software that is able to provide numerical computations of several information geometric objects. The reader...

  3. Statistical Model Checking of Rich Models and Properties

    DEFF Research Database (Denmark)

    Poulsen, Danny Bøgsted

    Software is increasingly embedded within safety- and business-critical processes of society. Errors in these embedded systems can lead to human casualties or severe monetary loss. Model checking technology has proven formal methods capable of finding and correcting errors in software...... motivates why existing model checking technology should be supplemented by new techniques. It also contains a brief introduction to probability theory and concepts covered by the six papers making up the second part. The first two papers are concerned with developing online monitoring techniques...... systems. The fifth paper shows how stochastic hybrid automata are useful for modelling biological systems and the final paper is concerned with showing how statistical model checking is efficiently distributed. In parallel with developing the theory contained in the papers, a substantial part of this work...

  4. How accurate and statistically robust are catalytic site predictions based on closeness centrality?

    Directory of Open Access Journals (Sweden)

    Livesay Dennis R

    2007-05-01

    Full Text Available Abstract Background We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices. Results We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio is increased to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate residues with physicochemical properties unlikely to be compatible with catalytic requirements from consideration. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters together has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined. Conclusion Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically
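
    A hedged sketch of the network construction and closeness-centrality calculation described above (synthetic α-carbon coordinates and an assumed 8 Å contact cutoff, not the authors' dataset or exact parameters):

    ```python
    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 30, size=(120, 3))  # stand-in for residue alpha-carbon positions
    cutoff = 8.0                                # contact distance cutoff (Angstroms), assumed

    # Build the contact graph: vertices are residues, edges connect proximal residues
    G = nx.Graph()
    G.add_nodes_from(range(len(coords)))
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if np.linalg.norm(coords[i] - coords[j]) <= cutoff:
                G.add_edge(i, j)

    # Closeness centrality: reciprocal of the average shortest-path distance to all other residues
    cc = nx.closeness_centrality(G)
    top5 = sorted(cc, key=cc.get, reverse=True)[:5]  # top-scoring candidate catalytic residues
    print(top5)
    ```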

  5. A statistical model of future human actions

    International Nuclear Information System (INIS)

    Woo, G.

    1992-02-01

    A critical review has been carried out of models of future human actions during the long term post-closure period of a radioactive waste repository. Various Markov models have been considered as alternatives to the standard Poisson model, and the problems of parameterisation have been addressed. Where the simplistic Poisson model unduly exaggerates the intrusion risk, some form of Markov model may have to be introduced. This situation may well arise for shallow repositories, but it is less likely for deep repositories. Recommendations are made for a practical implementation of a computer based model and its associated database. (Author)

  6. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two samples case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  7. Using historical vital statistics to predict the distribution of under-five mortality by cause.

    Science.gov (United States)

    Rao, Chalapati; Adair, Timothy; Kinfu, Yohannes

    2011-06-01

    Cause-specific mortality data is essential for planning intervention programs to reduce mortality in the under age five years population (under-five). However, there is a critical paucity of such information for most of the developing world, particularly where progress towards the United Nations Millennium Development Goal 4 (MDG 4) has been slow. This paper presents a predictive cause of death model for under-five mortality based on historical vital statistics and discusses the utility of the model in generating information that could accelerate progress towards MDG 4. Over 1400 country years of vital statistics from 34 countries collected over a period of nearly a century were analyzed to develop relationships between levels of under-five mortality, related mortality ratios, and proportionate mortality from four cause groups: perinatal conditions; diarrhea and lower respiratory infections; congenital anomalies; and all other causes of death. A system of multiple equations with cross-equation parameter restrictions and correlated error terms was developed to predict proportionate mortality by cause based on given measures of under-five mortality. The strength of the predictive model was tested through internal and external cross-validation techniques. Modeled cause-specific mortality estimates for major regions in Africa, Asia, Central America, and South America are presented to illustrate its application across a range of under-five mortality rates. Consistent and plausible trends and relationships are observed from historical data. High mortality rates are associated with increased proportions of deaths from diarrhea and lower respiratory infections. Perinatal conditions assume importance as a proportionate cause at under-five mortality rates below 60 per 1000 live births. Internal and external validation confirms strength and consistency of the predictive model. Model application at regional level demonstrates heterogeneity and non-linearity in cause

  8. Statistical models of shape optimisation and evaluation

    CERN Document Server

    Davies, Rhodri; Taylor, Chris

    2014-01-01

    Deformable shape models have wide application in computer vision and biomedical image analysis. This book addresses a key issue in shape modelling: establishment of a meaningful correspondence between a set of shapes. Full implementation details are provided.

  9. Enhanced surrogate models for statistical design exploiting space mapping technology

    DEFF Research Database (Denmark)

    Koziel, Slawek; Bandler, John W.; Mohamed, Achmed S.

    2005-01-01

    We present advances in microwave and RF device modeling exploiting Space Mapping (SM) technology. We propose new SM modeling formulations utilizing input mappings, output mappings, frequency scaling and quadratic approximations. Our aim is to enhance circuit models for statistical analysis...

  10. Prediction of monthly average global solar radiation based on statistical distribution of clearness index

    International Nuclear Information System (INIS)

    Ayodele, T.R.; Ogunjuyigbe, A.S.O.

    2015-01-01

    In this paper, a probability distribution of the clearness index is proposed for the prediction of global solar radiation. First, the clearness index is obtained from past data of global solar radiation; then, the parameters of the appropriate distribution that best fits the clearness index are determined. The global solar radiation is thereafter predicted from the clearness index using the inverse transformation of the cumulative distribution function. To validate the proposed method, eight years of global solar radiation data (2000–2007) for Ibadan, Nigeria are used to determine the parameters of the appropriate probability distribution for the clearness index. The calculated parameters are then used to predict the future monthly average global solar radiation for the following year (2008). The predicted values are compared with the measured values using four statistical tests: the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R²). The proposed method is also compared to existing regression models. The results show that the logistic distribution provides the best fit for the clearness index of Ibadan and the proposed method is effective in predicting the monthly average global solar radiation, with an overall RMSE of 0.383 MJ/m²/day, MAE of 0.295 MJ/m²/day, MAPE of 2% and R² of 0.967. - Highlights: • A distribution of the clearness index is proposed for prediction of global solar radiation. • The clearness index is obtained from past data of global solar radiation. • The parameters of the distribution that best fit the clearness index are determined. • Solar radiation is predicted from the clearness index using inverse transformation. • The method is effective in predicting the monthly average global solar radiation.
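
    A hedged sketch of the proposed procedure (fit a distribution to historical clearness-index values, then predict by inverse transformation of the cumulative distribution function; the data are synthetic, and the logistic choice and extraterrestrial radiation value are assumptions for this example):

    ```python
    import numpy as np
    from scipy.stats import logistic

    # Synthetic stand-in for historical monthly clearness index values
    rng = np.random.default_rng(0)
    k_t = np.clip(rng.normal(0.55, 0.08, size=96), 0.05, 0.95)

    loc, scale = logistic.fit(k_t)                  # fit the logistic distribution to the clearness index
    u = rng.uniform(size=12)                        # one uniform draw per month of the target year
    k_pred = np.clip(logistic.ppf(u, loc=loc, scale=scale), 0.0, 1.0)  # inverse-transform sampling

    H0 = 35.0                                       # assumed extraterrestrial radiation, MJ/m^2/day
    H_pred = k_pred * H0                            # predicted monthly global solar radiation
    print(H_pred)
    ```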

  11. Prediction of Chemical Function: Model Development and Application

    Science.gov (United States)

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (...

  12. Statistical Tests for Mixed Linear Models

    CERN Document Server

    Khuri, André I; Sinha, Bimal K

    2011-01-01

    An advanced discussion of linear models with mixed or random effects. In recent years a breakthrough has occurred in our ability to draw inferences from exact and optimum tests of variance component models, generating much research activity that relies on linear models with mixed and random effects. This volume covers the most important research of the past decade as well as the latest developments in hypothesis testing. It compiles all currently available results in the area of exact and optimum tests for variance component models and offers the only comprehensive treatment for these models a

  13. Testicular Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing testicular cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  14. Pancreatic Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing pancreatic cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  15. Colorectal Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing colorectal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  16. Prostate Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing prostate cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  17. Bladder Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing bladder cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  18. Esophageal Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing esophageal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  19. Cervical Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  20. Breast Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing breast cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  1. Lung Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing lung cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  2. Liver Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing liver cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  3. Ovarian Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing ovarian cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  4. Statistical image processing and multidimensional modeling

    CERN Document Server

    Fieguth, Paul

    2010-01-01

    Images are all around us! The proliferation of low-cost, high-quality imaging devices has led to an explosion in acquired images. When these images are acquired from a microscope, telescope, satellite, or medical imaging device, there is a statistical image processing task: the inference of something - an artery, a road, a DNA marker, an oil spill - from imagery, possibly noisy, blurry, or incomplete. A great many textbooks have been written on image processing. However, this book focuses not so much on images per se, but rather on spatial data sets, with one or more measurements taken over

  5. STATISTICAL MECHANICS MODELING OF MESOSCALE DEFORMATION IN METALS

    Energy Technology Data Exchange (ETDEWEB)

    Anter El-Azab

    2013-04-08

    The research under this project focused on a theoretical and computational modeling of dislocation dynamics of mesoscale deformation of metal single crystals. Specifically, the work aimed to implement a continuum statistical theory of dislocations to understand strain hardening and cell structure formation under monotonic loading. These aspects of crystal deformation are manifestations of the evolution of the underlying dislocation system under mechanical loading. The project had three research tasks: 1) Investigating the statistical characteristics of dislocation systems in deformed crystals. 2) Formulating kinetic equations of dislocations and coupling these kinetics equations and crystal mechanics. 3) Computational solution of coupled crystal mechanics and dislocation kinetics. Comparison of dislocation dynamics predictions with experimental results in the area of statistical properties of dislocations and their field was also a part of the proposed effort. In the first research task, the dislocation dynamics simulation method was used to investigate the spatial, orientation, velocity, and temporal statistics of dynamical dislocation systems, and on the use of the results from this investigation to complete the kinetic description of dislocations. The second task focused on completing the formulation of a kinetic theory of dislocations that respects the discrete nature of crystallographic slip and the physics of dislocation motion and dislocation interaction in the crystal. Part of this effort also targeted the theoretical basis for establishing the connection between discrete and continuum representation of dislocations and the analysis of discrete dislocation simulation results within the continuum framework. This part of the research enables the enrichment of the kinetic description with information representing the discrete dislocation systems behavior. The third task focused on the development of physics-inspired numerical methods of solution of the coupled

  6. Prediction and reconstruction of future and missing unobservable modified Weibull lifetime based on generalized order statistics

    Directory of Open Access Journals (Sweden)

    Amany E. Aly

    2016-04-01

    Full Text Available When a system consists of independent components of the same type, appropriate actions may be taken as soon as a portion of them has failed. It is, therefore, important to be able to predict later failure times from earlier ones. One of the well-known failure distributions commonly used to model component life is the modified Weibull distribution (MWD). In this paper, two pivotal quantities are proposed to construct prediction intervals for future unobservable lifetimes based on generalized order statistics (gos) from the MWD. Moreover, a pivotal quantity is developed to reconstruct missing observations at the beginning of an experiment. Furthermore, Monte Carlo simulation studies are conducted and numerical computations are carried out to investigate the efficiency of the presented results. Finally, two illustrative examples for real data sets are analyzed.

  7. Statistical modeling and extrapolation of carcinogenesis data

    International Nuclear Information System (INIS)

    Krewski, D.; Murdoch, D.; Dewanji, A.

    1986-01-01

    Mathematical models of carcinogenesis are reviewed, including pharmacokinetic models for metabolic activation of carcinogenic substances. Maximum likelihood procedures for fitting these models to epidemiological data are discussed, including situations where the time to tumor occurrence is unobservable. The plausibility of different possible shapes of the dose response curve at low doses is examined, and a robust method for linear extrapolation to low doses is proposed and applied to epidemiological data on radiation carcinogenesis

  8. Statistical Model Selection for TID Hardness Assurance

    Science.gov (United States)

    Ladbury, R.; Gorelick, J. L.; McClure, S.

    2010-01-01

    Radiation Hardness Assurance (RHA) methodologies against Total Ionizing Dose (TID) degradation impose rigorous statistical treatments for data from a part's Radiation Lot Acceptance Test (RLAT) and/or its historical performance. However, no similar methods exist for using "similarity" data - that is, data for similar parts fabricated in the same process as the part under qualification. This is despite the greater difficulty and potential risk in interpreting similarity data. In this work, we develop methods to disentangle part-to-part, lot-to-lot and part-type-to-part-type variation. The methods we develop apply not just for qualification decisions, but also for quality control and detection of process changes and other "out-of-family" behavior. We begin by discussing the data used in the study and the challenges of developing a statistic providing a meaningful measure of degradation across multiple part types, each with its own performance specifications. We then develop analysis techniques and apply them to the different data sets.

  9. Effective phonocardiogram segmentation using time statistics and nonlinear prediction

    Science.gov (United States)

    Sridharan, Rajeswari; Janet, J.

    2010-02-01

    In the fields of image processing, signal processing, and recognition, an efficient method for segmenting phonocardiogram (PCG) signals is offered. First, inter-beat segmentation is carried out by means of the DII lead of the ECG recording to identify occurrences of the first heart sound (S1). Then, intra-beat segmentation is attained by the use of recurrence time statistics (RTS), which are very sensitive to variations of the reconstructed attractor in a state space derived from nonlinear dynamic analysis. If the segmentation with RTS is unsuccessful, an alternative segmentation is proposed using a threshold extracted from the high-frequency decomposition, and the extracted features of the disorder are classified based on the murmur sounds. In the inter-beat segmentation process the accuracy was 100% over the whole PCG recording, even though the PCG beats were strongly affected by different types of cardiac murmurs and intra-beat segmentation was adjusted to give an accurate result.

  10. comparative analysis of two mathematical models for prediction

    African Journals Online (AJOL)

    Abstract. A mathematical model for prediction of the compressive strength of sandcrete blocks was developed using statistical analysis of the sandcrete block data obtained from experimental work done in this study. The models used are Scheffe's and Osadebe's optimization theories to predict the compressive strength of ...

  11. Comparative Analysis of Two Mathematical Models for Prediction of ...

    African Journals Online (AJOL)

    A mathematical model for prediction of the compressive strength of sandcrete blocks was developed using statistical analysis of the sandcrete block data obtained from experimental work done in this study. The models used are Scheffe's and Osadebe's optimization theories to predict the compressive strength of sandcrete ...

  12. Multivariate statistical modelling based on generalized linear models

    CERN Document Server

    Fahrmeir, Ludwig

    1994-01-01

    This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...

  13. Neural Systems with Numerically Matched Input-Output Statistic: Isotonic Bivariate Statistical Modeling

    Directory of Open Access Journals (Sweden)

    Simone Fiori

    2007-07-01

    Full Text Available Bivariate statistical modeling from incomplete data is a useful statistical tool that allows one to discover the model underlying two data sets when the data in the two sets do not correspond in size or in ordering. Such a situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure.

  14. A statistical model of Rift Valley fever activity in Egypt.

    Science.gov (United States)

    Drake, John M; Hassan, Ali N; Beier, John C

    2013-12-01

    Rift Valley fever (RVF) is a viral disease of animals and humans and a global public health concern due to its ecological plasticity, adaptivity, and potential for spread to countries with a temperate climate. In many places, outbreaks are episodic and linked to climatic, hydrologic, and socioeconomic factors. Although outbreaks of RVF have occurred in Egypt since 1977, attempts to identify risk factors have been limited. Using a statistical learning approach (lasso-regularized generalized linear model), we tested the hypotheses that outbreaks in Egypt are linked to (1) River Nile conditions that create a mosquito vector habitat, (2) entomologic conditions favorable to transmission, (3) socio-economic factors (Islamic festival of Greater Bairam), and (4) recent history of transmission activity. Evidence was found for effects of rainfall and river discharge and recent history of transmission activity. There was no evidence for an effect of Greater Bairam. The model predicted RVF activity correctly in 351 of 358 months (98.0%). This is the first study to statistically identify risk factors for RVF outbreaks in a region of unstable transmission. © 2013 The Society for Vector Ecology.
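
    As a rough illustration of the kind of lasso-regularised model named above, the sketch below fits an L1-penalised logistic regression to synthetic monthly predictors standing in for rainfall, river discharge, a festival indicator, and recent transmission activity. The predictor set, data, and penalty strength are assumptions made for illustration, not the study's actual inputs.

```python
# Sketch of a lasso-regularised logistic model of monthly outbreak activity,
# in the spirit of the study above; predictors and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 358                                      # number of months, as in the study period
X = rng.normal(size=(n, 4))                  # rainfall, discharge, festival flag, lagged activity (illustrative)
y = (X[:, 0] + 0.8 * X[:, 3] + rng.normal(scale=0.5, size=n) > 1.5).astype(int)

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),  # L1 penalty = lasso
)
model.fit(X, y)
print("non-zero coefficients:", model[-1].coef_)
print("training accuracy:", model.score(X, y))
```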

  15. A revised prediction model for natural conception.

    Science.gov (United States)

    Bensdorp, Alexandra J; van der Steeg, Jan Willem; Steures, Pieternel; Habbema, J Dik F; Hompes, Peter G A; Bossuyt, Patrick M M; van der Veen, Fulco; Mol, Ben W J; Eijkemans, Marinus J C

    2017-06-01

    One of the aims in reproductive medicine is to differentiate between couples that have favourable chances of conceiving naturally and those that do not. Since the development of the prediction model of Hunault, characteristics of the subfertile population have changed. The objective of this analysis was to assess whether additional predictors can refine the Hunault model and extend its applicability. Consecutive subfertile couples with unexplained and mild male subfertility presenting in fertility clinics were asked to participate in a prospective cohort study. We constructed a multivariable prediction model with the predictors from the Hunault model and new potential predictors. The primary outcome, natural conception leading to an ongoing pregnancy, was observed in 1053 women of the 5184 included couples (20%). All predictors of the Hunault model were selected into the revised model plus an additional seven (woman's body mass index, cycle length, basal FSH levels, tubal status, history of previous pregnancies in the current relationship (ongoing pregnancies after natural conception, fertility treatment or miscarriages), semen volume, and semen morphology). Predictions from the revised model seem to concur better with observed pregnancy rates compared with the Hunault model; c-statistic of 0.71 (95% CI 0.69 to 0.73) compared with 0.59 (95% CI 0.57 to 0.61). Copyright © 2017. Published by Elsevier Ltd.

  16. Statistical Modelling of Extreme Rainfall in Taiwan

    NARCIS (Netherlands)

    L-F. Chu (Lan-Fen); M.J. McAleer (Michael); C-C. Chang (Ching-Chung)

    2012-01-01

    textabstractIn this paper, the annual maximum daily rainfall data from 1961 to 2010 are modelled for 18 stations in Taiwan. We fit the rainfall data with stationary and non-stationary generalized extreme value distributions (GEV), and estimate their future behaviour based on the best fitting model.
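
    A stationary GEV fit of the kind described can be reproduced in a few lines with SciPy. The sketch below uses synthetic annual maxima rather than the Taiwan station records, and the 100-year return level is shown only to indicate how a fitted GEV is typically used.

```python
# Minimal sketch of fitting a stationary GEV to annual maximum daily rainfall
# and computing a return level; the data here are synthetic, not the Taiwan series.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(2)
annual_max = rng.gumbel(loc=200.0, scale=60.0, size=50)   # mm/day, illustrative

c, loc, scale = genextreme.fit(annual_max)                # shape, location, scale
rl_100 = genextreme.ppf(1 - 1 / 100, c, loc=loc, scale=scale)
print(f"estimated 100-year return level: {rl_100:.1f} mm/day")
```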

  17. Statistical Modelling of Extreme Rainfall in Taiwan

    NARCIS (Netherlands)

    L. Chu (LanFen); M.J. McAleer (Michael); C-H. Chang (Chu-Hsiang)

    2013-01-01

    textabstractIn this paper, the annual maximum daily rainfall data from 1961 to 2010 are modelled for 18 stations in Taiwan. We fit the rainfall data with stationary and non-stationary generalized extreme value distributions (GEV), and estimate their future behaviour based on the best fitting model.

  18. Statistical prediction of immunity to placental malaria based on multi-assay antibody data for malarial antigens

    DEFF Research Database (Denmark)

    Siriwardhana, Chathura; Fang, Rui; Salanti, Ali

    2017-01-01

    to 28 malarial antigens and used the data to develop statistical models for predicting if a woman has sufficient immunity to prevent PM. Methods Archival plasma samples from 1377 women were screened in a bead-based multiplex assay for Ab to 17 VAR2CSA-associated antigens (full length VAR2CSA (FV2), DBL...... in the following seven statistical approaches: logistic regression full model, logistic regression reduced model, recursive partitioning, random forests, linear discriminant analysis, quadratic discriminant analysis, and support vector machine. Results The best and simplest model proved to be the logistic...

  19. Statistical Models to Assess the Health Effects and to Forecast Ground Level Ozone

    Czech Academy of Sciences Publication Activity Database

    Schlink, U.; Herbath, O.; Richter, M.; Dorling, S.; Nunnari, G.; Cawley, G.; Pelikán, Emil

    2006-01-01

    Roč. 21, č. 4 (2006), s. 547-558 ISSN 1364-8152 R&D Projects: GA AV ČR 1ET400300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : statistical models * ground level ozone * health effects * logistic model * forecasting * prediction performance * neural network * generalised additive model * integrated assessment Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.992, year: 2006

  20. Statistical modelling of traffic safety development

    DEFF Research Database (Denmark)

    Christens, Peter

    2004-01-01

    Road safety is a major concern for society and individuals. Although road safety has improved in recent years, the number of road fatalities is still unacceptably high. In 2000, road accidents killed over 40,000 people in the European Union and injured more than 1.7 million. In 2001 in Denmark there were 6861 injury traffic accidents reported by the police, resulting in 4519 minor injuries, 3946 serious injuries, and 431 fatalities. The general purpose of the research was to improve the insight into aggregated road safety methodology in Denmark. The aim was to analyse advanced statistical methods designed to study developments over time, including effects of interventions. This aim has been achieved by investigating variations in aggregated Danish traffic accident series and by applying state of the art methodologies to specific case studies. The thesis comprises an introduction...

  1. A measure of statistical complexity based on predictive information with application to finite spin systems

    Energy Technology Data Exchange (ETDEWEB)

    Abdallah, Samer A., E-mail: samer.abdallah@eecs.qmul.ac.uk [School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS (United Kingdom); Plumbley, Mark D., E-mail: mark.plumbley@eecs.qmul.ac.uk [School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS (United Kingdom)

    2012-01-09

    We propose the binding information as an information theoretic measure of complexity between multiple random variables, such as those found in the Ising or Potts models of interacting spins, and compare it with several previously proposed measures of statistical complexity, including excess entropy, Bialek et al.'s predictive information, and the multi-information. We discuss and prove some of the properties of binding information, particularly in relation to multi-information and entropy, and show that, in the case of binary random variables, the processes which maximise binding information are the ‘parity’ processes. The computation of binding information is demonstrated on Ising models of finite spin systems, showing that various upper and lower bounds are respected and also that there is a strong relationship between the introduction of high-order interactions and an increase in binding information. Finally, we discuss some of the implications this has for the use of the binding information as a measure of complexity. -- Highlights: ► We introduce ‘binding information’ as an entropic/statistical measure of complexity. ► Binding information (BI) is related to earlier notions of predictive information. ► We derive upper and lower bounds on BI in relation to entropy and multi-information. ► Parity processes are found to maximise BI in finite sets of binary random variables. ► Application to spin glasses shows highest BI obtained with high-order interactions.

  2. Statistical Downscaling of Seasonal Prediction using Weather Generator with Application to a Korea Basin

    Science.gov (United States)

    Kim, M.

    2016-12-01

    GCM (General Circulation Model) output is a basic and fundamental tool for predicting future climate conditions. Unfortunately, it is too coarse in space and time to be directly applied to application fields. Currently, a large amount of research is focused on closing the gap between the resolutions of GCM output and application data. This process is called downscaling, for which many methods have been proposed in dynamical and statistical contexts. Statistical downscaling methods are frequently employed in hydrological and agricultural studies since they can downscale GCM output rapidly and without expensive computational costs. Among the many statistical downscaling methods, the weather generator is an attractive one, producing climate data at a daily time scale for a local region. However, most weather generators are originally designed to simulate local weather based on the climatology of the observation period; in particular, inter-annual variability of climate is not taken into account. In this study, we develop a new weather generator linked with large-scale climate in order to reflect seasonal prediction in weather simulation. The basic idea is to parametrize local climate characteristics in the underlying weather generator model and then to link them with large-scale climatic variables, so that the parameter values, which represent the state of the local climate system, vary according to the large-scale climate. We illustrate this with an application to a Korean basin. The local climate characteristics under consideration are the monthly means of daily maximum/minimum temperatures adjusted for precipitation effects, the mean dry-spell length, and the precipitation intensity. The link between local and large-scale climate is quantified by a regression model.

  3. A Noise Robust Statistical Texture Model

    DEFF Research Database (Denmark)

    Hilger, Klaus Baggesen; Stegmann, Mikkel Bille; Larsen, Rasmus

    2002-01-01

    This paper presents a novel approach to the problem of obtaining a low dimensional representation of texture (pixel intensity) variation present in a training set after alignment using a Generalised Procrustes analysis.We extend the conventional analysis of training textures in the Active...... Appearance Models segmentation framework. This is accomplished by augmenting the model with an estimate of the covariance of the noise present in the training data. This results in a more compact model maximising the signal-to-noise ratio, thus favouring subspaces rich on signal, but low on noise....... Differences in the methods are illustrated on a set of left cardiac ventricles obtained using magnetic resonance imaging....

  4. What do saliency models predict?

    Science.gov (United States)

    Koehler, Kathryn; Guo, Fei; Zhang, Sheng; Eckstein, Miguel P.

    2014-01-01

    Saliency models have been frequently used to predict eye movements made during image viewing without a specified task (free viewing). Use of a single image set to systematically compare free viewing to other tasks has never been performed. We investigated the effect of task differences on the ability of three models of saliency to predict the performance of humans viewing a novel database of 800 natural images. We introduced a novel task where 100 observers made explicit perceptual judgments about the most salient image region. Other groups of observers performed a free viewing task, saliency search task, or cued object search task. Behavior on the popular free viewing task was not best predicted by standard saliency models. Instead, the models most accurately predicted the explicit saliency selections and eye movements made while performing saliency judgments. Observers' fixations varied similarly across images for the saliency and free viewing tasks, suggesting that these two tasks are related. The variability of observers' eye movements was modulated by the task (lowest for the object search task and greatest for the free viewing and saliency search tasks) as well as the clutter content of the images. Eye movement variability in saliency search and free viewing might also be limited by inherent variation of what observers consider salient. Our results contribute to understanding the tasks and behavioral measures for which saliency models are best suited as predictors of human behavior, the relationship across various perceptual tasks, and the factors contributing to observer variability in fixational eye movements. PMID:24618107

  5. Statistical models for nuclear decay from evaporation to vaporization

    CERN Document Server

    Cole, A J

    2000-01-01

    Elements of equilibrium statistical mechanics: Introduction. Microstates and macrostates. Sub-systems and convolution. The Boltzmann distribution. Statistical mechanics and thermodynamics. The grand canonical ensemble. Equations of state for ideal and real gases. Pseudo-equilibrium. Statistical models of nuclear decay. Nuclear physics background: Introduction. Elements of the theory of nuclear reactions. Quantum mechanical description of scattering from a potential. Decay rates and widths. Level and state densities in atomic nuclei. Angular momentum in quantum mechanics. History of statistical

  6. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  7. Canonical Statistical Model for Maximum Expected Immission of Wire Conductor in an Aperture Enclosure

    Science.gov (United States)

    Bremner, Paul G.; Vazquez, Gabriel; Christiano, Daniel J.; Trout, Dawn H.

    2016-01-01

    Prediction of the maximum expected electromagnetic pick-up of conductors inside a realistic shielding enclosure is an important canonical problem for system-level EMC design of spacecraft, launch vehicles, aircraft and automobiles. This paper introduces a simple statistical power balance model for prediction of the maximum expected current in a wire conductor inside an aperture enclosure. It calculates both the statistical mean and variance of the immission from the physical design parameters of the problem. Familiar probability density functions can then be used to predict the maximum expected immission for design purposes. The statistical power balance model requires minimal EMC design information and solves orders of magnitude faster than existing numerical models, making it ultimately viable for scaled-up, full system-level modeling. Both experimental test results and full wave simulation results are used to validate the foundational model.

  8. Physics-based statistical model and simulation method of RF propagation in urban environments

    Science.gov (United States)

    Pao, Hsueh-Yuan; Dvorak, Steven L.

    2010-09-14

    A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient close-formed parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.

  9. Shape-correlated deformation statistics for respiratory motion prediction in 4D lung

    Science.gov (United States)

    Liu, Xiaoxiao; Oguz, Ipek; Pizer, Stephen M.; Mageras, Gig S.

    2010-02-01

    4D image-guided radiation therapy (IGRT) for free-breathing lungs is challenging due to the complicated respiratory dynamics. Effective modeling of respiratory motion is crucial to account for the effects of motion on the dose to tumors. We propose a shape-correlated statistical model on dense image deformations for patient-specific respiratory motion estimation in 4D lung IGRT. Using the shape deformations of the high-contrast lungs as the surrogate, the statistical model trained from the planning CTs can be used to predict the image deformation during delivery verification time, with the assumption that the respiratory motion at both times is similar for the same patient. Dense image deformation fields obtained by diffeomorphic image registrations characterize the respiratory motion within one breathing cycle. A point-based particle optimization algorithm is used to obtain the shape models of lungs with group-wise surface correspondences. Canonical correlation analysis (CCA) is adopted in training to maximize the linear correlation between the shape variations of the lungs and the corresponding dense image deformations. Both intra- and inter-session CT studies are carried out on a small group of lung cancer patients and evaluated in terms of the tumor location accuracies. The results suggest potential applications using the proposed method.
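
    The training step pairs low-dimensional lung-shape coefficients with deformation-field coefficients through CCA. The sketch below mimics that pairing with scikit-learn on synthetic scores; the numbers of breathing phases, shape modes, and deformation modes are arbitrary assumptions, and the real pipeline derives its scores from diffeomorphic registrations rather than random vectors.

```python
# Rough sketch of the shape-to-deformation idea: canonical correlation analysis
# between shape scores (surrogate) and deformation scores (synthetic stand-ins).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(5)
n_phases = 40                                           # breathing phases in training CTs (illustrative)
shape_scores = rng.normal(size=(n_phases, 5))           # lung-surface shape coefficients
deform_scores = shape_scores @ rng.normal(size=(5, 8))  # correlated deformation coefficients
deform_scores += 0.1 * rng.normal(size=deform_scores.shape)

cca = CCA(n_components=3).fit(shape_scores, deform_scores)

# Predict deformation scores for a new shape observed at treatment time.
new_shape = rng.normal(size=(1, 5))
print(cca.predict(new_shape).round(2))
```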

  10. A Statistical Model for Energy Intensity

    Directory of Open Access Journals (Sweden)

    Marjaneh Issapour

    2012-12-01

    Full Text Available A promising approach to improve scientific literacy with regard to global warming and climate change is using a simulation as part of a science education course. The simulation needs to employ scientific analysis of actual data from internationally accepted and reputable databases to demonstrate the reality of the current climate change situation. One of the most important criteria for using a simulation in a science education course is the fidelity of the model. The realism of the events and consequences modeled in the simulation is significant as well. Therefore, all underlying equations and algorithms used in the simulation must have a real-world scientific basis. The "Energy Choices" simulation is one such simulation. The focus of this paper is the development of a mathematical model for "Energy Intensity" as a part of the overall system dynamics in the "Energy Choices" simulation. This model will define the "Energy Intensity" as a function of other independent variables that can be manipulated by users of the simulation. The relationship discovered by this research will be applied to an algorithm in the "Energy Choices" simulation.

  11. Latent domain models for statistical machine translation

    NARCIS (Netherlands)

    Hoàng, C.

    2017-01-01

    A data-driven approach to model translation suffers from the data mismatch problem and demands domain adaptation techniques. Given parallel training data originating from a specific domain, training an MT system on the data would result in a rather suboptimal translation for other domains. But does

  12. Behavioral and statistical models of educational inequality

    DEFF Research Database (Denmark)

    Holm, Anders; Breen, Richard

    2016-01-01

    This paper addresses the question of how students and their families make educational decisions. We describe three types of behavioral model that might underlie decision-making and we show that they have consequences for what decisions are made. Our study thus has policy implications if we wish...

  13. Statistical model semiquantitatively approximates arabinoxylooligosaccharides' structural diversity

    DEFF Research Database (Denmark)

    Dotsenko, Gleb; Nielsen, Michael Krogsgaard; Lange, Lene

    2016-01-01

    (wheat flour arabinoxylan (arabinose/xylose, A/X = 0.47); grass arabinoxylan (A/X = 0.24); wheat straw arabinoxylan (A/X = 0.15); and hydrothermally pretreated wheat straw arabinoxylan (A/X = 0.05)), is semiquantitatively approximated using the proposed model. The suggested approach can be applied...

  14. A STATISTICAL MODEL FOR STOCK ASSESSMENT OF ...

    African Journals Online (AJOL)

    Assessment of the status of southern bluefin tuna (SBT) by Australia and Japan has used a method (ADAPT) that imposes a number of structural restrictions, and is ... over time within the bounds of specific structure, and (3) autocorrelation in recruitment processes is considered within the likelihood framework of the model.

  15. Validation of the measure automobile emissions model : a statistical analysis

    Science.gov (United States)

    2000-09-01

    The Mobile Emissions Assessment System for Urban and Regional Evaluation (MEASURE) model provides an external validation capability for the hot stabilized option; the model is one of several new modal emissions models designed to predict hot stabilized e...

  16. A generic statistical methodology to predict the maximum pit depth of a localized corrosion process

    International Nuclear Information System (INIS)

    Jarrah, A.; Bigerelle, M.; Guillemot, G.; Najjar, D.; Iost, A.; Nianga, J.-M.

    2011-01-01

    Highlights: → We propose a methodology to predict the maximum pit depth in a corrosion process. → Generalized Lambda Distribution and the Computer Based Bootstrap Method are combined. → GLD fits a large variety of distributions both in their central and tail regions. → Minimum thickness preventing perforation can be estimated with a safety margin. → Considering its applications, this new approach can help to size industrial pieces. - Abstract: This paper outlines a new methodology to predict accurately the maximum pit depth related to a localized corrosion process. It combines two statistical methods: the Generalized Lambda Distribution (GLD), to determine a model of distribution fitting the experimental frequency distribution of depths, and the Computer Based Bootstrap Method (CBBM), to generate simulated distributions equivalent to the experimental one. In comparison with conventionally established statistical methods that are restricted to the use of inferred distributions constrained by specific mathematical assumptions, the major advantage of the methodology presented in this paper is that both the GLD and the CBBM enable a statistical treatment of the experimental data without making any preconceived choice either on the unknown theoretical parent distribution of pit depth, which characterizes the global corrosion phenomenon, or on the unknown associated theoretical extreme value distribution, which characterizes the deepest pits. Considering an experimental distribution of depths of pits produced on an aluminium sample, estimations of maximum pit depth using a GLD model are compared to similar estimations based on the usual Gumbel and Generalized Extreme Value (GEV) methods proposed in the corrosion engineering literature. The GLD approach is shown to have smaller bias and dispersion in the estimation of the maximum pit depth than the Gumbel approach, both for its realization and its mean. This leads to comparing the GLD approach to the GEV one
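
    A much simplified, hypothetical version of the bootstrap idea (resample observed pit depths to approximate the distribution of the deepest pit on a larger area) is sketched below next to a classical Gumbel extrapolation; the pit-depth data, area scale factor, and block structure are all illustrative assumptions, and the GLD fitting step itself is not reproduced here.

```python
# Sketch of a bootstrap estimate of the deepest pit expected on a larger area,
# alongside a classical Gumbel extrapolation; pit-depth data are synthetic.
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(3)
depths = rng.lognormal(mean=3.0, sigma=0.4, size=200)   # observed pit depths (um), illustrative
scale_factor = 10                                       # target area / inspected area (assumed)

# Bootstrap: resample pit depths for the larger area and record the maximum.
boot_max = np.array([
    rng.choice(depths, size=scale_factor * depths.size, replace=True).max()
    for _ in range(2000)
])
print("bootstrap mean maximum depth:", round(boot_max.mean(), 1))

# Gumbel approach: fit block maxima (here, maxima of arbitrary sub-samples).
block_max = depths.reshape(20, 10).max(axis=1)
loc, beta = gumbel_r.fit(block_max)
print("Gumbel 99th-percentile depth:", round(gumbel_r.ppf(0.99, loc, beta), 1))
```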

  17. Relative effects of statistical preprocessing and postprocessing on a regional hydrological ensemble prediction system

    Science.gov (United States)

    Sharma, Sanjib; Siddique, Ridwan; Reed, Seann; Ahnert, Peter; Mendoza, Pablo; Mejia, Alfonso

    2018-03-01

    The relative roles of statistical weather preprocessing and streamflow postprocessing in hydrological ensemble forecasting at short- to medium-range forecast lead times (day 1-7) are investigated. For this purpose, a regional hydrologic ensemble prediction system (RHEPS) is developed and implemented. The RHEPS is comprised of the following components: (i) hydrometeorological observations (multisensor precipitation estimates, gridded surface temperature, and gauged streamflow); (ii) weather ensemble forecasts (precipitation and near-surface temperature) from the National Centers for Environmental Prediction 11-member Global Ensemble Forecast System Reforecast version 2 (GEFSRv2); (iii) NOAA's Hydrology Laboratory-Research Distributed Hydrologic Model (HL-RDHM); (iv) heteroscedastic censored logistic regression (HCLR) as the statistical preprocessor; (v) two statistical postprocessors, an autoregressive model with a single exogenous variable (ARX(1,1)) and quantile regression (QR); and (vi) a comprehensive verification strategy. To implement the RHEPS, 1 to 7 days weather forecasts from the GEFSRv2 are used to force HL-RDHM and generate raw ensemble streamflow forecasts. Forecasting experiments are conducted in four nested basins in the US Middle Atlantic region, ranging in size from 381 to 12 362 km2. Results show that the HCLR preprocessed ensemble precipitation forecasts have greater skill than the raw forecasts. These improvements are more noticeable in the warm season at the longer lead times (> 3 days). Both postprocessors, ARX(1,1) and QR, show gains in skill relative to the raw ensemble streamflow forecasts, particularly in the cool season, but QR outperforms ARX(1,1). The scenarios that implement preprocessing and postprocessing separately tend to perform similarly, although the postprocessing-alone scenario is often more effective. The scenario involving both preprocessing and postprocessing consistently outperforms the other scenarios. In some cases
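
    Of the two postprocessors compared above, quantile regression is the simpler one to illustrate. The toy sketch below calibrates a raw ensemble-mean streamflow forecast against observations at three quantile levels with statsmodels; the synthetic data, quantile levels, and single-predictor form are assumptions for illustration, not the RHEPS configuration.

```python
# Toy quantile-regression postprocessor: map the raw ensemble-mean streamflow
# forecast to calibrated forecast quantiles (synthetic data, illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
raw_mean = rng.gamma(shape=2.0, scale=50.0, size=400)          # raw forecast (m^3/s)
observed = 0.8 * raw_mean + rng.normal(scale=15.0, size=400)   # verifying observations

X = sm.add_constant(raw_mean)                                  # intercept + raw forecast
params_by_quantile = {}
for q in (0.1, 0.5, 0.9):
    res = sm.QuantReg(observed, X).fit(q=q)
    params_by_quantile[q] = res.params                         # intercept and slope per quantile

new_raw = np.array([1.0, 120.0])                               # constant term + new raw forecast
for q, params in params_by_quantile.items():
    print(f"calibrated {int(q * 100)}th percentile:", round(float(params @ new_raw), 1))
```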

  18. Modeling statistical properties of written text.

    Directory of Open Access Journals (Sweden)

    M Angeles Serrano

    Full Text Available Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
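
    The two regularities named above, Zipf's law for word frequencies and Heaps' law for vocabulary growth, are straightforward to measure empirically. The short sketch below does so for an arbitrary plain-text corpus; the file name is a placeholder and the tokenisation is deliberately crude.

```python
# Quick empirical check of Zipf's law (rank-frequency) and Heaps' law
# (vocabulary growth) on any plain-text corpus; the file path is a placeholder.
from collections import Counter
import re

with open("corpus.txt", encoding="utf-8") as fh:       # placeholder corpus file
    words = re.findall(r"[a-z']+", fh.read().lower())

# Zipf: the frequency of the r-th most common word falls roughly as 1/r.
counts = Counter(words)
for rank, (word, freq) in enumerate(counts.most_common(5), start=1):
    print(f"rank {rank}: {word!r} appears {freq} times")

# Heaps: vocabulary size V(n) grows sublinearly with text length n.
seen, growth = set(), []
for i, w in enumerate(words, start=1):
    seen.add(w)
    if i % 10000 == 0:
        growth.append((i, len(seen)))
print(growth[:5])
```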

  19. Statistical validation of event predictors: A comparative study based on the field of seizure prediction

    International Nuclear Information System (INIS)

    Feldwisch-Drentrup, Hinnerk; Schulze-Bonhage, Andreas; Timmer, Jens; Schelter, Bjoern

    2011-01-01

    The prediction of events is of substantial interest in many research areas. To evaluate the performance of prediction methods, the statistical validation of these methods is of utmost importance. Here, we compare an analytical validation method to numerical approaches that are based on Monte Carlo simulations. The comparison is performed in the field of the prediction of epileptic seizures. In contrast to the analytical validation method, we found that for numerical validation methods insufficient but realistic sample sizes can lead to invalid high rates of false positive conclusions. Hence we outline necessary preconditions for sound statistical tests on above chance predictions.

  20. Advanced data analysis in neuroscience integrating statistical and computational models

    CERN Document Server

    Durstewitz, Daniel

    2017-01-01

    This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering.  Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, based on which it aims to convey an understanding also of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. This way computational models in neuroscience are not only explanatory frameworks, but become powerfu...

  1. Statistics Based Models for the Dynamics of Chernivtsi Children Disease

    Directory of Open Access Journals (Sweden)

    Igor G. Nesteruk

    2017-10-01

    Full Text Available Background. Simple mathematical models of contamination and the SIR model of infection spread were used to simulate the time dynamics of the previously unknown children's disease which occurred in Chernivtsi (Ukraine). The cause of the many cases of alopecia, which began in this city in August 1988, is still not fully clarified. According to the official report of the governmental commission, the last new cases occurred in the middle of November 1988, and the cause of the illness was reported as chemical exogenous intoxication. Later this illness became known as the “Chernivtsi chemical disease”. Nevertheless, a significantly increased number of new cases of local alopecia was registered for almost three years and is still not explained. Objective. To compare two different versions of the disease, chemical exogenous intoxication and infection, to identify the parameters of the mathematical models, and to predict the development of the disease. Methods. Analytical solutions of the contamination models and of the SIR model for an epidemic are obtained. The optimal values of the parameters were found using linear regression. Results. The optimal values of the model parameters were identified using a statistical approach. The calculations showed that the infectious version of the disease is more plausible than the popular contamination one. The possible date of the beginning of the epidemic was estimated. Conclusions. The optimal parameters of the SIR model allow calculating the realistic number of victims and other characteristics of a possible epidemic. They also show that the increased number of cases of local alopecia could be part of the same epidemic as the “Chernivtsi chemical disease”.
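
    For readers unfamiliar with the SIR framework mentioned in the Methods, the sketch below integrates the standard susceptible-infected-recovered equations with SciPy. The population size and rate constants are illustrative assumptions, not the parameter values fitted to the Chernivtsi data.

```python
# Minimal SIR sketch of the infection hypothesis discussed above; parameter
# values are illustrative, not the fitted values from the paper.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma, N):
    S, I, R = y
    dS = -beta * S * I / N          # new infections leave the susceptible pool
    dI = beta * S * I / N - gamma * I
    dR = gamma * I                  # recoveries
    return [dS, dI, dR]

N = 50_000                          # susceptible children (assumed)
beta, gamma = 0.35, 0.1             # transmission and recovery rates (assumed)
sol = solve_ivp(sir, (0, 180), [N - 10, 10, 0], args=(beta, gamma, N),
                t_eval=np.linspace(0, 180, 181))
print("peak number infected:", int(sol.y[1].max()))
```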

  2. Domain analysis and modeling to improve comparability of health statistics.

    Science.gov (United States)

    Okada, M; Hashimoto, H; Ohida, T

    2001-01-01

    Health statistics is an essential element to improve the ability of managers of health institutions, healthcare researchers, policy makers, and health professionals to formulate appropriate course of reactions and to make decisions based on evidence. To ensure adequate health statistics, standards are of critical importance. A study on healthcare statistics domain analysis is underway in an effort to improve usability and comparability of health statistics. The ongoing study focuses on structuring the domain knowledge and making the knowledge explicit with a data element dictionary being the core. Supplemental to the dictionary are a domain term list, a terminology dictionary, and a data model to help organize the concepts constituting the health statistics domain.

  3. Predicting losing and gaining river reaches in lowland New Zealand based on a statistical methodology

    Science.gov (United States)

    Yang, Jing; Zammit, Christian; Dudley, Bruce

    2017-04-01

    The phenomenon of losing and gaining in rivers normally takes place in lowlands, where there are often various, sometimes conflicting, uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) information surveys with surface water and groundwater experts from regional government, 2) a collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build the statistical relationship between river reach status (gain or loss) and river/watershed characteristics, and then to predict the status of river reaches at Strahler order one without prior losing and gaining information. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research and water allocation decisions in lowland New Zealand.
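
    The classification step can be mimicked with scikit-learn's random forest, as in the sketch below. The four catchment attributes, the synthetic gain/loss labels, and the forest size are placeholders chosen only to show the workflow the study describes.

```python
# Sketch of the random-forest step: classify river reaches as gaining or losing
# from catchment characteristics; features and labels are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 500
# Hypothetical reach attributes: rainfall, soil permeability, slope, aquifer depth.
X = rng.normal(size=(n, 4))
y = np.where(X[:, 1] - 0.5 * X[:, 3] + rng.normal(scale=0.7, size=n) > 0,
             "gain", "loss")

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
print("cross-validated accuracy:", scores.mean().round(2))
```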

  4. Statistics

    Science.gov (United States)

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  5. The statistical multifragmentation model: Origins and recent advances

    Energy Technology Data Exchange (ETDEWEB)

    Donangelo, R., E-mail: donangel@fing.edu.uy [Instituto de Física, Facultad de Ingeniería, Universidad de la República, Julio Herrera y Reissig 565, 11300, Montevideo (Uruguay); Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Souza, S. R., E-mail: srsouza@if.ufrj.br [Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Instituto de Física, Universidade Federal do Rio Grande do Sul, C.P. 15051, 91501-970 Porto Alegre - RS (Brazil)

    2016-07-07

    We review the Statistical Multifragmentation Model (SMM) which considers a generalization of the liquid-drop model for hot nuclei and allows one to calculate thermodynamic quantities characterizing the nuclear ensemble at the disassembly stage. We show how to determine probabilities of definite partitions of finite nuclei and how to determine, through Monte Carlo calculations, observables such as the caloric curve, multiplicity distributions, heat capacity, among others. Some experimental measurements of the caloric curve confirmed the SMM predictions of over 10 years before, leading to a surge in the interest in the model. However, the experimental determination of the fragmentation temperatures relies on the yields of different isotopic species, which were not correctly calculated in the schematic, liquid-drop picture, employed in the SMM. This led to a series of improvements in the SMM, in particular to the more careful choice of nuclear masses and energy densities, specially for the lighter nuclei. With these improvements the SMM is able to make quantitative determinations of isotope production. We show the application of SMM to the production of exotic nuclei through multifragmentation. These preliminary calculations demonstrate the need for a careful choice of the system size and excitation energy to attain maximum yields.

  6. Spectral statistics in particles-rotor model and cranking model

    CERN Document Server

    Zhou Xian Rong; Zhao En Guang; Guo Lu

    2002-01-01

    Spectral statistics for six particles in single-j and two-j models coupled with a deformed core are studied in the frameworks of the particles-rotor model and the cranking shell model. The nearest-neighbor distribution of energy levels and the spectral rigidity are studied as a function of the spin or cranking frequency, respectively. The results for the single-j shell are compared with those for the two-j case. The system becomes more regular when the single-j space (i_{13/2}) is replaced by the two-j shell (g_{7/2} + d_{5/2}), although the basis size of the configuration space is unchanged. However, the degree of chaoticity of the system changes only slightly when the configuration space is enlarged by extending the single-j shell (i_{13/2}) to the two-j shell (i_{13/2} + g_{9/2}). Nuclear chaotic behavior is studied when the two-body interaction is taken as a delta force and as a pairing interaction, respectively.

  7. Saccadic gain adaptation is predicted by the statistics of natural fluctuations in oculomotor function

    Directory of Open Access Journals (Sweden)

    Mark V Albert

    2012-12-01

    Full Text Available Due to multiple factors such as fatigue, muscle strengthening, and neural plasticity, the responsiveness of the motor apparatus to neural commands changes over time. To enable precise movements, the nervous system must adapt to compensate for these changes. Recent models of motor adaptation derive from assumptions about the way the motor apparatus changes. Characterizing these changes is difficult because motor adaptation happens at the same time, masking most of the effects of ongoing changes. Here, we analyze eye movements of monkeys with lesions to the posterior cerebellar vermis that impair adaptation. Their fluctuations better reveal the underlying changes of the motor system over time. When these measured, unadapted changes are used to derive optimal motor adaptation rules, the prediction precision significantly improves. Among three models that similarly fit single-day adaptation results, the model that also matches the temporal correlations of the nonadapting saccades most accurately predicts multiple-day adaptation. Saccadic gain adaptation is well matched to the natural statistics of fluctuations of the oculomotor plant.

  8. High resolution statistical downscaling of the EUROSIP seasonal prediction. Application for southeastern Romania

    Science.gov (United States)

    Busuioc, Aristita; Dumitrescu, Alexandru; Dumitrache, Rodica; Iriza, Amalia

    2017-04-01

    Seasonal climate forecasts in Europe are currently issued at the European Centre for Medium-Range Weather Forecasts (ECMWF) in the form of multi-model ensemble predictions available within the "EUROSIP" system. Different statistical techniques to calibrate, downscale and combine the EUROSIP direct model output are used to optimize the quality of the final probabilistic forecasts. In this study, a statistical downscaling model (SDM) based on canonical correlation analysis (CCA) is used to downscale the EUROSIP seasonal forecast at a spatial resolution of 1km x 1km over the Movila farm placed in southeastern Romania. This application is achieved in the framework of the H2020 MOSES project (http://www.moses-project.eu). The combination between monthly standardized values of three climate variables (maximum/minimum temperatures-Tmax/Tmin, total precipitation-Prec) is used as predictand while combinations of various large-scale predictors are tested in terms of their availability as outputs in the seasonal EUROSIP probabilistic forecasting (sea level pressure, temperature at 850 hPa and geopotential height at 500 hPa). The predictors are taken from the ECMWF system considering 15 members of the ensemble, for which the hindcasts since 1991 until present are available. The model was calibrated over the period 1991-2014 and predictions for summers 2015 and 2016 were achieved. The calibration was made for the ensemble average as well as for each ensemble member. The model was developed for each lead time: one month anticipation for June, two months anticipation for July and three months anticipation for August. The main conclusions from these preliminary results are: best predictions (in terms of the anomaly sign) for Tmax (July-2 months anticipation, August-3 months anticipation) for both years (2015, 2016); for Tmin - good predictions only for August (3 months anticipation ) for both years; for precipitation, good predictions for July (2 months anticipation) in 2015 and

  9. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    Science.gov (United States)

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  10. Statistical modelling in biostatistics and bioinformatics selected papers

    CERN Document Server

    Peng, Defen

    2014-01-01

    This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...

  11. Functional summary statistics for the Johnson-Mehl model

    DEFF Research Database (Denmark)

    Møller, Jesper; Ghorbani, Mohammad

    The Johnson-Mehl germination-growth model is a spatio-temporal point process model which, among other things, has been used for the description of neurotransmitter datasets. However, for such datasets parametric Johnson-Mehl models fitted by maximum likelihood have not yet been evaluated by means of functional summary statistics. This paper therefore introduces four functional summary statistics adapted to the Johnson-Mehl model, two of them based on second-order properties and the other two on the nuclei-boundary distances for the associated Johnson-Mehl tessellation. The theoretical properties of the functional summary statistics are investigated, non-parametric estimators are suggested, and their usefulness for model checking is examined in a simulation study. The functional summary statistics are also used for checking fitted parametric Johnson-Mehl models for a neurotransmitter dataset.

  12. Seasonal predictability of Kiremt rainfall in coupled general circulation models

    Science.gov (United States)

    Gleixner, Stephanie; Keenlyside, Noel S.; Demissie, Teferi D.; Counillon, François; Wang, Yiguo; Viste, Ellen

    2017-11-01

    The Ethiopian economy and population are strongly dependent on rainfall. Operational seasonal predictions for the main rainy season (Kiremt, June-September) are based on statistical approaches with Pacific sea surface temperatures (SST) as the main predictor. Here we analyse dynamical predictions from 11 coupled general circulation models for the Kiremt seasons of 1985-2005, with the forecasts starting from the beginning of May. We find skillful predictions from three of the 11 models, but no model beats a simple linear prediction model based on the predicted Niño3.4 indices. The skill of the individual models for dynamically predicting Kiremt rainfall depends on the strength of the teleconnection between Kiremt rainfall and concurrent Pacific SST in the models. Models that do not simulate this teleconnection fail to capture the observed relationship between Kiremt rainfall and the large-scale Walker circulation.
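
    The simple linear benchmark mentioned above can be sketched as a regression of seasonal rainfall anomalies on a predicted Niño3.4 index, evaluated with leave-one-out cross-validation. The data below are synthetic placeholders and the skill value is only illustrative.

```python
# Sketch of a simple linear prediction benchmark: regress seasonal (Kiremt) rainfall
# anomalies on a predicted Nino3.4 index (synthetic placeholder data).
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1985, 2006)
nino34 = rng.standard_normal(years.size)                        # predicted Nino3.4 index
rain = -0.6 * nino34 + 0.4 * rng.standard_normal(years.size)    # rainfall anomaly

# Leave-one-out cross-validated linear prediction
pred = np.empty_like(rain)
for i in range(years.size):
    mask = np.arange(years.size) != i
    slope, intercept = np.polyfit(nino34[mask], rain[mask], 1)
    pred[i] = slope * nino34[i] + intercept

print(f"cross-validated anomaly correlation: {np.corrcoef(pred, rain)[0, 1]:.2f}")
```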

  13. Fitting statistical models in bivariate allometry.

    Science.gov (United States)

    Packard, Gary C; Birchard, Geoffrey F; Boardman, Thomas J

    2011-08-01

    Several attempts have been made in recent years to formulate a general explanation for what appear to be recurring patterns of allometric variation in morphology, physiology, and ecology of both plants and animals (e.g. the Metabolic Theory of Ecology, the Allometric Cascade, the Metabolic-Level Boundaries hypothesis). However, published estimates for parameters in allometric equations often are inaccurate, owing to undetected bias introduced by the traditional method for fitting lines to empirical data. The traditional method entails fitting a straight line to logarithmic transformations of the original data and then back-transforming the resulting equation to the arithmetic scale. Because of fundamental changes in distributions attending transformation of predictor and response variables, the traditional practice may cause influential outliers to go undetected, and it may result in an underparameterized model being fitted to the data. Also, substantial bias may be introduced by the insidious rotational distortion that accompanies regression analyses performed on logarithms. Consequently, the aforementioned patterns of allometric variation may be illusions, and the theoretical explanations may be wide of the mark. Problems attending the traditional procedure can be largely avoided in future research simply by performing preliminary analyses on arithmetic values and by validating fitted equations in the arithmetic domain. The goal of most allometric research is to characterize relationships between biological variables and body size, and this is done most effectively with data expressed in the units of measurement. Back-transforming from a straight line fitted to logarithms is not a generally reliable way to estimate an allometric equation in the original scale. © 2010 The Authors. Biological Reviews © 2010 Cambridge Philosophical Society.
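
    To make the contrast concrete, the sketch below compares the traditional log-log back-transform with a direct nonlinear fit of y = a*x^b in the arithmetic domain, along the lines recommended above. The data, parameter values and noise model are synthetic assumptions, not taken from the review.

```python
# Sketch contrasting the traditional log-log fit with a direct arithmetic-domain fit
# of the allometric relation y = a * x**b (synthetic data).
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = rng.uniform(1, 100, 200)                        # e.g. body size
y = 3.0 * x**0.75 + rng.normal(0, 0.1 * x, 200)     # allometric relation + noise

# Traditional approach: straight line on logarithms, then back-transform
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
a_log, b_log = np.exp(intercept), slope

# Direct approach: nonlinear least squares in the original units
(a_nls, b_nls), _ = curve_fit(lambda x, a, b: a * x**b, x, y, p0=(1.0, 1.0))

print(f"log-log back-transformed: a={a_log:.2f}, b={b_log:.2f}")
print(f"arithmetic-domain fit:    a={a_nls:.2f}, b={b_nls:.2f}")
```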

  14. Statistical methodology for predicting the life of lithium-ion cells via accelerated degradation testing

    Science.gov (United States)

    Thomas, E. V.; Bloom, I.; Christophersen, J. P.; Battaglia, V. S.

    Statistical models based on data from accelerated aging experiments are used to predict cell life. In this article, we discuss a methodology for estimating the mean cell life with uncertainty bounds that uses both a degradation model (reflecting average cell performance) and an error model (reflecting the measured cell-to-cell variability in performance). Specific forms for the degradation and error models are presented and illustrated with experimental data that were acquired from calendar-life testing of high-power lithium-ion cells as part of the U.S. Department of Energy's (DOEs) Advanced Technology Development program. Monte Carlo simulations, based on the developed models, are used to assess lack-of-fit and develop uncertainty limits for the average cell life. In addition, we discuss the issue of assessing the applicability of degradation models (based on data acquired from cells aged under static conditions) to the degradation of cells aged under more realistic dynamic conditions (e.g., varying temperature).
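
    A Monte Carlo scheme of the general kind described above can be sketched by combining a degradation model for the mean capacity fade with an error model for cell-to-cell variability and simulating many cells. The square-root-of-time fade law, the lognormal spread and all numbers below are illustrative assumptions, not the article's fitted models.

```python
# Monte Carlo sketch: degradation model (mean fade) plus error model (cell-to-cell
# variability) to obtain uncertainty bounds on cell life. All forms and numbers are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n_cells = 10_000
eol_fade = 0.20                      # end of life defined at 20 % capacity fade

# Degradation model: fade(t) = k * sqrt(t weeks); error model: lognormal spread in k
k_mean, k_cv = 0.02, 0.15
k = rng.lognormal(mean=np.log(k_mean), sigma=k_cv, size=n_cells)

life_weeks = (eol_fade / k) ** 2     # invert fade(t) = eol_fade for each simulated cell

lo, med, hi = np.percentile(life_weeks, [5, 50, 95])
print(f"median life {med:.0f} weeks, 90 % interval [{lo:.0f}, {hi:.0f}] weeks")
```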

  15. A Statistical Evaluation of Atmosphere-Ocean General Circulation Models: Complexity vs. Simplicity

    OpenAIRE

    Robert K. Kaufmann; David I. Stern

    2004-01-01

    The principal tools used to model future climate change are General Circulation Models which are deterministic high resolution bottom-up models of the global atmosphere-ocean system that require large amounts of supercomputer time to generate results. But are these models a cost-effective way of predicting future climate change at the global level? In this paper we use modern econometric techniques to evaluate the statistical adequacy of three general circulation models (GCMs) by testing thre...

  16. Probabilistic statistical modeling of air pollution from vehicles

    Science.gov (United States)

    Adikanova, Saltanat; Malgazhdarov, Yerzhan A.; Madiyarov, Muratkan N.; Temirbekov, Nurlan M.

    2017-09-01

    The aim of the work is to create a probabilistic-statistical mathematical model for the distribution of emissions from vehicles. In this article, it is proposed to use the probabilistic and statistical approach for modeling the distribution of harmful impurities in the atmosphere from vehicles using the example of the Ust-Kamenogorsk city. Using a simplified methodology of stochastic modeling, it is possible to construct effective numerical computational algorithms that significantly reduce the amount of computation without losing their accuracy.

  17. Caries risk assessment models in caries prediction

    Directory of Open Access Journals (Sweden)

    Amila Zukanović

    2013-11-01

    Full Text Available Objective. The aim of this research was to assess the efficiency of different multifactor models in caries prediction. Material and methods. Data from the questionnaire and objective examination of 109 examinees was entered into the Cariogram, Previser and Caries-Risk Assessment Tool (CAT) multifactor risk assessment models. Caries risk was assessed with the help of all three models for each patient, classifying them as low, medium or high-risk patients. The development of new caries lesions over a period of three years [Decay Missing Filled Tooth (DMFT) increment = difference between Decay Missing Filled Tooth Surface (DMFTS) index at baseline and follow up] provided for examination of the predictive capacity of the different multifactor models. Results. The data gathered showed that different multifactor risk assessment models give significantly different results (Friedman test: Chi square = 100.073, p=0.000). Cariogram is the model which identified the majority of examinees as medium risk patients (70%). The other two models were more radical in risk assessment, giving more unfavorable risk profiles for patients. In only 12% of the patients did the three multifactor models assess the risk in the same way. Previser and CAT gave the same results in 63% of cases - the Wilcoxon test showed that there is no statistically significant difference in caries risk assessment between these two models (Z = -1.805, p=0.071). Conclusions. Evaluation of three different multifactor caries risk assessment models (Cariogram, PreViser and CAT) showed that only the Cariogram can successfully predict new caries development in 12-year-old Bosnian children.

  18. Statistics

    International Nuclear Information System (INIS)

    2005-01-01

    For the years 2004 and 2005 the figures shown in the tables of Energy Review are partly preliminary. The annual statistics published in Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time-series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004.) The applied energy units and conversion coefficients are shown in the back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supplies and total consumption of electricity GWh, Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes, precautionary stock fees and oil pollution fees

  19. Statistics

    International Nuclear Information System (INIS)

    2000-01-01

    For the year 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g., Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  20. Statistics

    International Nuclear Information System (INIS)

    1999-01-01

    For the year 1998 and the year 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  1. Statistics

    International Nuclear Information System (INIS)

    2001-01-01

    For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  2. Kolmogorov complexity, pseudorandom generators and statistical models testing

    Czech Academy of Sciences Publication Activity Database

    Šindelář, Jan; Boček, Pavel

    2002-01-01

    Roč. 38, č. 6 (2002), s. 747-759 ISSN 0023-5954 R&D Projects: GA ČR GA102/99/1564 Institutional research plan: CEZ:AV0Z1075907 Keywords : Kolmogorov complexity * pseudorandom generators * statistical models testing Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.341, year: 2002

  3. Role of scaling in the statistical modelling of finance

    Indian Academy of Sciences (India)

    Economics and mathematical finance are multidisciplinary fields in which the tendency of statistical physicists to focus on universal laws has been criticized some- ... is coherent and catches the essential statistical features of a long index history. A very important test for the proposed model concerns the scaling of the ...

  4. Flashover of a vacuum-insulator interface: A statistical model

    Directory of Open Access Journals (Sweden)

    W. A. Stygar

    2004-07-01

    Full Text Available We have developed a statistical model for the flashover of a 45° vacuum-insulator interface (such as would be found in an accelerator) subject to a pulsed electric field. The model assumes that the initiation of a flashover plasma is a stochastic process, that the characteristic statistical component of the flashover delay time is much greater than the plasma formative time, and that the average rate at which flashovers occur is a power-law function of the instantaneous value of the electric field. Under these conditions, we find that the flashover probability is given by 1-exp(-E_{p}^{β}t_{eff}C/k^{β}), where E_{p} is the peak value in time of the spatially averaged electric field E(t), t_{eff}≡∫[E(t)/E_{p}]^{β}dt is the effective pulse width, C is the insulator circumference, k∝exp(λ/d), and β and λ are constants. We define E(t) as V(t)/d, where V(t) is the voltage across the insulator and d is the insulator thickness. Since the model assumes that flashovers occur at random azimuthal locations along the insulator, it does not apply to systems that have a significant defect, i.e., a location contaminated with debris or compromised by an imperfection at which flashovers repeatedly take place, and which prevents a random spatial distribution. The model is consistent with flashover measurements to within 7% for pulse widths between 0.5 ns and 10 μs, and to within a factor of 2 between 0.5 ns and 90 s (a span of over 11 orders of magnitude). For these measurements, E_{p} ranges from 64 to 651 kV/cm, d from 0.50 to 4.32 cm, and C from 4.96 to 95.74 cm. The model is significantly more accurate, and is valid over a wider range of parameters, than the J. C. Martin flashover relation that has been in use since 1971 [J. C. Martin on Pulsed Power, edited by T. H. Martin, A. H. Guenther, and M. Kristiansen (Plenum, New York, 1996)]. We have generalized the statistical model to estimate the total-flashover probability of an
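
    The quoted expression is straightforward to evaluate numerically. The sketch below computes the flashover probability 1 - exp(-E_p^β t_eff C / k^β) for a hypothetical voltage pulse; the values of β, k and the pulse parameters are placeholders chosen for illustration, not values from the paper.

```python
# Sketch evaluating the flashover probability quoted above,
#   P = 1 - exp(-E_p**beta * t_eff * C / k**beta),  t_eff = integral (E(t)/E_p)**beta dt,
# for a hypothetical pulse; beta, k and the pulse shape are placeholder assumptions.
import numpy as np

def flashover_probability(t, V, d, C, beta, k):
    """t in s, V in kV, d and C in cm; k is a placeholder constant in matching units."""
    E = V / d                               # spatially averaged field E(t) = V(t)/d
    E_p = E.max()                           # peak field
    t_eff = np.trapz((E / E_p) ** beta, t)  # effective pulse width
    return 1.0 - np.exp(-(E_p ** beta) * t_eff * C / k ** beta)

# Hypothetical 100 ns half-sine pulse across a 2 cm thick, 30 cm circumference insulator
t = np.linspace(0.0, 100e-9, 1001)
V = 500.0 * np.sin(np.pi * t / t[-1])       # kV
print(f"P(flashover) = {flashover_probability(t, V, d=2.0, C=30.0, beta=10.0, k=80.0):.3f}")
```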

  5. Accurate and robust genomic prediction of celiac disease using statistical learning.

    Directory of Open Access Journals (Sweden)

    Gad Abraham

    2014-02-01

    Full Text Available Practical application of genomic-based risk stratification to clinical diagnosis is appealing yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87-0.89) and in independent replication across cohorts (AUC of 0.86-0.9), despite differences in ethnicity. The models explained 30-35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value; however, unlike HLA typing, fine-scale stratification of individuals into categories of higher risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
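
    One common statistical-learning approach of this general kind is a penalised logistic regression fitted over all SNPs simultaneously, scored by cross-validated AUC. The sketch below uses synthetic genotypes and is not the study's exact GRS method, data or tuning.

```python
# Sketch of a genome-wide risk score via L1-penalised logistic regression over all SNPs
# at once (synthetic genotypes; not the study's exact method or data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_samples, n_snps, n_causal = 2000, 5000, 50
X = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)   # 0/1/2 genotypes
beta = np.zeros(n_snps)
beta[:n_causal] = rng.normal(0, 0.4, n_causal)
logit = X @ beta
y = rng.binomial(1, 1 / (1 + np.exp(-(logit - logit.mean()))))     # case/control labels

grs = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=1000)
auc = cross_val_score(grs, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```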

  6. Improving statistical reasoning theoretical models and practical implications

    CERN Document Server

    Sedlmeier, Peter

    1999-01-01

    This book focuses on how statistical reasoning works and on training programs that can exploit people''s natural cognitive capabilities to improve their statistical reasoning. Training programs that take into account findings from evolutionary psychology and instructional theory are shown to have substantially larger effects that are more stable over time than previous training regimens. The theoretical implications are traced in a neural network model of human performance on statistical reasoning problems. This book apppeals to judgment and decision making researchers and other cognitive scientists, as well as to teachers of statistics and probabilistic reasoning.

  7. Schedulability of Herschel revisited using statistical model checking

    DEFF Research Database (Denmark)

    David, Alexandre; Larsen, Kim Guldstrand; Legay, Axel

    2015-01-01

    to obtain some guarantee on the (un)schedulability of the model even in the presence of undecidability. Two methods are considered: symbolic model checking and statistical model checking. Since the model uses stop-watches, the reachability problem becomes undecidable so we are using an over......-approximation technique. We can safely conclude that the system is schedulable for varying values of BCET. For the cases where deadlines are violated, we use polyhedra to try to confirm the witnesses. Our alternative method to confirm non-schedulability uses statistical model-checking (SMC) to generate counter...

  8. Some remarks on the statistical model of heavy ion collisions

    International Nuclear Information System (INIS)

    Koch, V.

    2003-01-01

    This contribution is an attempt to assess what can be learned from the remarkable success of this statistical model in describing ratios of particle abundances in ultra-relativistic heavy ion collisions

  9. Statistics

    International Nuclear Information System (INIS)

    2003-01-01

    For the year 2002, part of the figures shown in the tables of the Energy Review are preliminary. The annual statistics of the Energy Review also include historical time-series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supply and total consumption of electricity GWh, Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees and oil pollution fees on energy products

  10. Statistics

    International Nuclear Information System (INIS)

    2000-01-01

    For the year 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy also includes historical time series over a longer period (see e.g., Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  11. Statistics

    International Nuclear Information System (INIS)

    2004-01-01

    For the years 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time-series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supplies and total consumption of electricity GWh, Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees and oil pollution fees

  12. Possibilities of the Statistical Scoring Models' Application at Lithuanian Banks

    OpenAIRE

    Dzidzevičiūtė, Laima

    2013-01-01

    The goal of this dissertation is to develop the rating system of Lithuanian companies based on the statistical scoring model and assess the possibilities of this system‘s application at Lithuanian banks. The dissertation consists of three Chapters. Development and application peculiarities of rating systems based on statistical scoring models are described in the first Chapter. In the second Chapter the results of the survey of commercial banks and foreign bank branches, operating in the coun...

  13. A nonextensive statistical model for the nucleon structure function

    Energy Technology Data Exchange (ETDEWEB)

    Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [Instituto de Ciencia, Engenharia e Tecnologia - ICET, Universidade Federal dos Vales do Jequitinhonha e Mucuri - UFVJM, Campus do Mucuri, Rua do Cruzeiro 01, Jardim Sao Paulo, 39803-371, Teofilo Otoni, Minas Gerais (Brazil)

    2013-03-25

    We studied an application of nonextensive thermodynamics to describe the structure function of nucleon, in a model where the usual Fermi-Dirac and Bose-Einstein energy distribution were replaced by the equivalent functions of the q-statistical. The parameters of the model are given by an effective temperature T, the q parameter (from Tsallis statistics), and two chemical potentials given by the corresponding up (u) and down (d) quark normalization in the nucleon.

  14. Improved analyses using function datasets and statistical modeling

    Science.gov (United States)

    John S. Hogland; Nathaniel M. Anderson

    2014-01-01

    Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...

  15. Tornadoes and related damage costs: statistical modeling with a semi-Markov approach

    OpenAIRE

    Corini, Chiara; D'Amico, Guglielmo; Petroni, Filippo; Prattico, Flavio; Manca, Raimondo

    2015-01-01

    We propose a statistical approach to tornadoes modeling for predicting and simulating occurrences of tornadoes and accumulated cost distributions over a time interval. This is achieved by modeling the tornadoes intensity, measured with the Fujita scale, as a stochastic process. Since the Fujita scale divides tornadoes intensity into six states, it is possible to model the tornadoes intensity by using Markov and semi-Markov models. We demonstrate that the semi-Markov approach is able to reprod...
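
    The kind of semi-Markov construction described above can be sketched as a chain over Fujita-scale states with a transition matrix, state-dependent holding-time distributions, and a cost drawn at each occurrence. The transition matrix, waiting-time and cost distributions below are made-up placeholders, not fitted values from the paper.

```python
# Sketch of a semi-Markov simulation of tornado intensity (Fujita scale F0-F5):
# an embedded transition matrix, state-dependent holding times, and a cost per
# occurrence. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
n_states = 6                                             # F0 .. F5
P = np.full((n_states, n_states), 1.0 / (n_states - 1))
np.fill_diagonal(P, 0.0)                                 # embedded chain: no self-transitions
mean_wait = np.array([2., 4., 7., 15., 40., 120.])       # mean sojourn (days) by state
mean_cost = np.array([0.01, 0.05, 0.3, 1.0, 5.0, 20.0])  # mean cost per occurrence (M$)

def simulate(horizon_days=365.0, state=0):
    t, cost = 0.0, 0.0
    while t < horizon_days:
        t += rng.weibull(1.5) * mean_wait[state]         # semi-Markov holding time
        state = rng.choice(n_states, p=P[state])
        cost += rng.exponential(mean_cost[state])
    return cost

costs = np.array([simulate() for _ in range(5000)])
print(f"mean yearly cost {costs.mean():.1f} M$, "
      f"95th percentile {np.percentile(costs, 95):.1f} M$")
```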

  16. Foreign exchange market data analysis reveals statistical features that predict price movement acceleration.

    Science.gov (United States)

    Nacher, Jose C; Ochiai, Tomoshiro

    2012-05-01

    Increasingly accessible financial data allow researchers to infer market-dynamics-based laws and to propose models that are able to reproduce them. In recent years, several stylized facts have been uncovered. Here we perform an extensive analysis of foreign exchange data that leads to the unveiling of a statistical financial law. First, our findings show that, on average, volatility increases more when the price exceeds the highest (or lowest) value, i.e., breaks the resistance line. We call this the breaking-acceleration effect. Second, our results show that the probability P(T) to break the resistance line in the past time T follows power law in both real data and theoretically simulated data. However, the probability calculated using real data is rather lower than the one obtained using a traditional Black-Scholes (BS) model. Taken together, the present analysis characterizes a different stylized fact of financial markets and shows that the market exceeds a past (historical) extreme price fewer times than expected by the BS model (the resistance effect). However, when the market does, we predict that the average volatility at that time point will be much higher. These findings indicate that any Markovian model does not faithfully capture the market dynamics.

  17. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes.

    Science.gov (United States)

    Thiessen, Erik D

    2017-01-05

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274, 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105, 2745-2750; Thiessen & Yee 2010 Child Development 81, 1287-1303; Saffran 2002 Journal of Memory and Language 47, 172-196; Misyak & Christiansen 2012 Language Learning 62, 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39, 246-263; Thiessen et al. 2013 Psychological Bulletin 139, 792-814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik

  18. Escherichia coli bacteria density in relation to turbidity, streamflow characteristics, and season in the Chattahoochee River near Atlanta, Georgia, October 2000 through September 2008—Description, statistical analysis, and predictive modeling

    Science.gov (United States)

    Lawrence, Stephen J.

    2012-01-01

    Water-based recreation—such as rafting, canoeing, and fishing—is popular among visitors to the Chattahoochee River National Recreation Area (CRNRA) in north Georgia. The CRNRA is a 48-mile reach of the Chattahoochee River upstream from Atlanta, Georgia, managed by the National Park Service (NPS). Historically, high densities of fecal-indicator bacteria have been documented in the Chattahoochee River and its tributaries at levels that commonly exceeded Georgia water-quality standards. In October 2000, the NPS partnered with the U.S. Geological Survey (USGS), State and local agencies, and non-governmental organizations to monitor Escherichia coli bacteria (E. coli) density and develop a system to alert river users when E. coli densities exceeded the U.S. Environmental Protection Agency (USEPA) single-sample beach criterion of 235 colonies (most probable number) per 100 milliliters (MPN/100 mL) of water. This program, called BacteriALERT, monitors E. coli density, turbidity, and water temperature at two sites on the Chattahoochee River upstream from Atlanta, Georgia. This report summarizes E. coli bacteria density and turbidity values in water samples collected between 2000 and 2008 as part of the BacteriALERT program; describes the relations between E. coli density and turbidity, streamflow characteristics, and season; and describes the regression analyses used to develop predictive models that estimate E. coli density in real time at both sampling sites.

  19. Pseudo-dynamic source modelling with 1-point and 2-point statistics of earthquake source parameters

    KAUST Repository

    Song, S. G.

    2013-12-24

    Ground motion prediction is an essential element in seismic hazard and risk analysis. Empirical ground motion prediction approaches have been widely used in the community, but efficient simulation-based ground motion prediction methods are needed to complement empirical approaches, especially in the regions with limited data constraints. Recently, dynamic rupture modelling has been successfully adopted in physics-based source and ground motion modelling, but it is still computationally demanding and many input parameters are not well constrained by observational data. Pseudo-dynamic source modelling keeps the form of kinematic modelling with its computational efficiency, but also tries to emulate the physics of source process. In this paper, we develop a statistical framework that governs the finite-fault rupture process with 1-point and 2-point statistics of source parameters in order to quantify the variability of finite source models for future scenario events. We test this method by extracting 1-point and 2-point statistics from dynamically derived source models and simulating a number of rupture scenarios, given target 1-point and 2-point statistics. We propose a new rupture model generator for stochastic source modelling with the covariance matrix constructed from target 2-point statistics, that is, auto- and cross-correlations. Our sensitivity analysis of near-source ground motions to 1-point and 2-point statistics of source parameters provides insights into relations between statistical rupture properties and ground motions. We observe that larger standard deviation and stronger correlation produce stronger peak ground motions in general. The proposed new source modelling approach will contribute to understanding the effect of earthquake source on near-source ground motion characteristics in a more quantitative and systematic way.
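
    The basic mechanics of generating a rupture realisation from target 1-point and 2-point statistics can be sketched as drawing a random slip field from a covariance matrix built from an assumed autocorrelation. The exponential correlation model, one-dimensional fault geometry and parameter values below are placeholders, not the paper's calibrated statistics.

```python
# Sketch of stochastic source generation from target 1-point / 2-point statistics:
# build a covariance matrix from an assumed autocorrelation (exponential here) and
# draw a slip realisation with its Cholesky factor. All parameters are placeholders.
import numpy as np

rng = np.random.default_rng(6)
n_sub = 100                       # subfaults along strike
dx = 1.0                          # subfault spacing (km)
mean_slip, std_slip = 1.5, 0.6    # target 1-point statistics (m)
corr_len = 10.0                   # target correlation length (km)

x = np.arange(n_sub) * dx
cov = std_slip**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)  # 2-point stats
L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_sub))                      # jitter for stability

slip = mean_slip + L @ rng.standard_normal(n_sub)    # one correlated slip realisation
slip = np.clip(slip, 0.0, None)                      # keep slip non-negative

print(f"mean {slip.mean():.2f} m, std {slip.std():.2f} m")
```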

  20. Statistical behaviour of adaptive multilevel splitting algorithms in simple models

    International Nuclear Information System (INIS)

    Rolland, Joran; Simonnet, Eric

    2015-01-01

    Adaptive multilevel splitting algorithms have been introduced rather recently for estimating tail distributions in a fast and efficient way. In particular, they can be used for computing the so-called reactive trajectories corresponding to direct transitions from one metastable state to another. The algorithm is based on successive selection–mutation steps performed on the system in a controlled way. It has two intrinsic parameters, the number of particles/trajectories and the reaction coordinate used for discriminating good or bad trajectories. We first investigate the convergence in law of the algorithm as a function of the timestep for several simple stochastic models. Second, we consider the average duration of reactive trajectories for which no theoretical predictions exist. The most important aspect of this work concerns some systems with two degrees of freedom. They are studied in detail as a function of the reaction coordinate in the asymptotic regime where the number of trajectories goes to infinity. We show that during phase transitions, the statistics of the algorithm deviate significantly from known theoretical results when using non-optimal reaction coordinates. In this case, the variance of the algorithm peaks at the transition and the convergence of the algorithm can be much slower than the usual expected central limit behaviour. The duration of trajectories is affected as well. Moreover, reactive trajectories do not correspond to the most probable ones. Such behaviour disappears when using the optimal reaction coordinate called committor as predicted by the theory. We finally investigate a three-state Markov chain which reproduces this phenomenon and show logarithmic convergence of the trajectory durations

  1. Modelling malaria treatment practices in Bangladesh using spatial statistics

    Directory of Open Access Journals (Sweden)

    Haque Ubydul

    2012-03-01

    Full Text Available Background Malaria treatment-seeking practices vary worldwide and Bangladesh is no exception. Individuals from 88 villages in Rajasthali were asked about their treatment-seeking practices. A portion of these households preferred malaria treatment from the National Control Programme, but still a large number of households continued to use drug vendors and approximately one fourth of the individuals surveyed relied exclusively on non-control programme treatments. The risks of low-control programme usage include incomplete malaria treatment, possible misuse of anti-malarial drugs, and an increased potential for drug resistance. Methods The spatial patterns of treatment-seeking practices were first examined using hot-spot analysis (Local Getis-Ord Gi statistic) and then modelled using regression. Ordinary least squares (OLS) regression identified key factors explaining more than 80% of the variation in control programme and vendor treatment preferences. Geographically weighted regression (GWR) was then used to assess where each factor was a strong predictor of treatment-seeking preferences. Results Several factors including tribal affiliation, housing materials, household densities, education levels, and proximity to the regional urban centre, were found to be effective predictors of malaria treatment-seeking preferences. The predictive strength of each of these factors, however, varied across the study area. While education, for example, was a strong predictor in some villages, it was less important for predicting treatment-seeking outcomes in other villages. Conclusion Understanding where each factor is a strong predictor of treatment-seeking outcomes may help in planning targeted interventions aimed at increasing control programme usage. Suggested strategies include providing additional training for the Building Resources across Communities (BRAC) health workers, implementing educational programmes, and addressing economic factors.

  2. Models for probability and statistical inference theory and applications

    CERN Document Server

    Stapleton, James H

    2007-01-01

    This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readers. Models for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping. Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...

  3. Statistical detection model for eddy-current systems

    International Nuclear Information System (INIS)

    Martinez, J.R.; Bahr, A.J.

    1984-01-01

    This chapter presents a detailed analysis of some measured noise data and the results of using those data with a probe-flaw interaction model to compute the surface-crack detection characteristics of two different air-core coil probes. The objective is to develop a statistical model for determining the probability of detecting a given flaw using an eddy-current system. The basis for developing a statistical detection model is a measurement model that relates the output voltage of the system to its various signal and noise components. Topics considered include statistics of the measured background voltage, calibration of the probe-flaw interaction model and signal-to-noise ratio (SNR) definition, the operating characteristic, and a comparison of air-core probes

  4. Relative effects of statistical preprocessing and postprocessing on a regional hydrological ensemble prediction system

    Directory of Open Access Journals (Sweden)

    S. Sharma

    2018-03-01

    Full Text Available The relative roles of statistical weather preprocessing and streamflow postprocessing in hydrological ensemble forecasting at short- to medium-range forecast lead times (day 1–7) are investigated. For this purpose, a regional hydrologic ensemble prediction system (RHEPS) is developed and implemented. The RHEPS is comprised of the following components: (i) hydrometeorological observations (multisensor precipitation estimates, gridded surface temperature, and gauged streamflow); (ii) weather ensemble forecasts (precipitation and near-surface temperature) from the National Centers for Environmental Prediction 11-member Global Ensemble Forecast System Reforecast version 2 (GEFSRv2); (iii) NOAA's Hydrology Laboratory-Research Distributed Hydrologic Model (HL-RDHM); (iv) heteroscedastic censored logistic regression (HCLR) as the statistical preprocessor; (v) two statistical postprocessors, an autoregressive model with a single exogenous variable (ARX(1,1)) and quantile regression (QR); and (vi) a comprehensive verification strategy. To implement the RHEPS, 1 to 7 days weather forecasts from the GEFSRv2 are used to force HL-RDHM and generate raw ensemble streamflow forecasts. Forecasting experiments are conducted in four nested basins in the US Middle Atlantic region, ranging in size from 381 to 12 362 km2. Results show that the HCLR preprocessed ensemble precipitation forecasts have greater skill than the raw forecasts. These improvements are more noticeable in the warm season at the longer lead times (> 3 days). Both postprocessors, ARX(1,1) and QR, show gains in skill relative to the raw ensemble streamflow forecasts, particularly in the cool season, but QR outperforms ARX(1,1). The scenarios that implement preprocessing and postprocessing separately tend to perform similarly, although the postprocessing-alone scenario is often more effective. The scenario involving both preprocessing and postprocessing consistently outperforms the other
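
    A quantile-regression postprocessor of the general kind mentioned above can be sketched as a regression of observed flow on the raw ensemble-mean forecast, issued at several quantile levels. The data below are synthetic and the setup is a simplification, not the study's QR configuration.

```python
# Sketch of a quantile-regression (QR) streamflow postprocessor: regress observed flow
# on the raw ensemble-mean forecast and issue calibrated quantiles (synthetic data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
raw_fc = rng.gamma(shape=2.0, scale=50.0, size=500)          # raw ensemble-mean forecasts
obs = 0.8 * raw_fc + rng.normal(0, 0.2 * raw_fc + 5.0)       # heteroscedastic "observations"

X = sm.add_constant(raw_fc)
x_new = np.array([[1.0, 120.0]])                             # [intercept, new raw forecast]
quantiles = {}
for q in (0.1, 0.5, 0.9):
    res = sm.QuantReg(obs, X).fit(q=q)
    quantiles[q] = res.predict(x_new)[0]

print({q: round(v, 1) for q, v in quantiles.items()})        # calibrated forecast quantiles
```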

  5. Statistical prediction of the numbers of degraded tubes in nuclear power plant steam generators

    International Nuclear Information System (INIS)

    Gallucci, R.H.V.; Klisiewicz, J.W.; Craig, K.R.

    1990-01-01

    Corrosion of nuclear power plant steam generator (SG) tubes often necessitates plugging/sleeving, causing decreased SG thermal performance and possible SG replacement. Statistical methods have been developed to predict probabilistically the numbers of tubes degraded due to secondary side pitting, wastage, and intergranular attack/stress-corrosion cracking. Inspection data from two Combustion Engineering (C-E) plants have been converted into statistics representing defect formation and growth. Computer simulation programs have been generated to predict the numbers of tubes to be plugged/sleeved during future outages. The probabilistic predictions for both plants successfully have bounded subsequent observations. While so far applied only to C-E SGs for the three degradation phenomena, the statistical methodology is adaptable to other SG types and phenomena

  6. A simple formula for insertion loss prediction of large acoustical enclosures using statistical energy analysis method

    Directory of Open Access Journals (Sweden)

    Kim Hyun-Sil

    2014-12-01

    Full Text Available Insertion loss prediction of large acoustical enclosures using the Statistical Energy Analysis (SEA) method is presented. The SEA model consists of three elements: sound field inside the enclosure, vibration energy of the enclosure panel, and sound field outside the enclosure. It is assumed that the space surrounding the enclosure is sufficiently large so that there is no energy flow from the outside to the wall panel or to air cavity inside the enclosure. The comparison of the predicted insertion loss to the measured data for typical large acoustical enclosures shows good agreement. It is found that if the critical frequency of the wall panel falls above the frequency region of interest, insertion loss is dominated by the sound transmission loss of the wall panel and averaged sound absorption coefficient inside the enclosure. However, if the critical frequency of the wall panel falls into the frequency region of interest, acoustic power from the sound radiation by the wall panel must be added to the acoustic power from transmission through the panel.

  7. Decoding β-decay systematics: A global statistical model for β- half-lives

    International Nuclear Information System (INIS)

    Costiris, N. J.; Mavrommatis, E.; Gernoth, K. A.; Clark, J. W.

    2009-01-01

    Statistical modeling of nuclear data provides a novel approach to nuclear systematics complementary to established theoretical and phenomenological approaches based on quantum theory. Continuing previous studies in which global statistical modeling is pursued within the general framework of machine learning theory, we implement advances in training algorithms designed to improve generalization, in application to the problem of reproducing and predicting the half-lives of nuclear ground states that decay 100% by the β - mode. More specifically, fully connected, multilayer feed-forward artificial neural network models are developed using the Levenberg-Marquardt optimization algorithm together with Bayesian regularization and cross-validation. The predictive performance of models emerging from extensive computer experiments is compared with that of traditional microscopic and phenomenological models as well as with the performance of other learning systems, including earlier neural network models as well as the support vector machines recently applied to the same problem. In discussing the results, emphasis is placed on predictions for nuclei that are far from the stability line, and especially those involved in r-process nucleosynthesis. It is found that the new statistical models can match or even surpass the predictive performance of conventional models for β-decay systematics and accordingly should provide a valuable additional tool for exploring the expanding nuclear landscape.
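
    A much-reduced sketch of the kind of mapping described above is a small feed-forward network from (Z, N) to log10(half-life). The data are synthetic placeholders, and the study's actual training scheme (Levenberg-Marquardt with Bayesian regularisation and cross-validation) is not reproduced here; a generic scikit-learn regressor stands in for it.

```python
# Sketch of a small feed-forward network mapping (Z, N) to log10(half-life).
# Synthetic placeholder data and a generic training scheme, not the study's setup.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
Z = rng.integers(20, 90, 600)
N = Z + rng.integers(5, 40, 600)
X = np.column_stack([Z, N]).astype(float)
y = 5.0 - 0.15 * (N - Z) + 0.3 * rng.standard_normal(600)   # fake log10(T1/2) trend

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
net = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16, 16), alpha=1e-3,
                                 max_iter=5000, random_state=0))
net.fit(X_tr, y_tr)
print(f"held-out R^2: {net.score(X_te, y_te):.2f}")
```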

  8. Information Geometric Complexity of a Trivariate Gaussian Statistical Model

    Directory of Open Access Journals (Sweden)

    Domenico Felice

    2014-05-01

    Full Text Available We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences not only depends on the amount of available pieces of information but also on the manner in which such pieces are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.

  9. Prediction of Chemical Function: Model Development and ...

    Science.gov (United States)

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration in which it is present and therefore impacting exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning based models for classifying chemicals in terms of their likely functional roles in products based on structure were developed. This effort required collection, curation, and harmonization of publically-available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi

  10. A prediction model for Clostridium difficile recurrence

    Directory of Open Access Journals (Sweden)

    Francis D. LaBarbera

    2015-02-01

    Full Text Available Background: Clostridium difficile infection (CDI) is a growing problem in the community and hospital setting. Its incidence has been on the rise over the past two decades, and it is quickly becoming a major concern for the health care system. A high rate of recurrence is one of the major hurdles in the successful treatment of C. difficile infection. There have been few studies that have looked at patterns of recurrence. The studies currently available have shown a number of risk factors associated with C. difficile recurrence (CDR); however, there is little consensus on the impact of most of the identified risk factors. Methods: Our study was a retrospective chart review of 198 patients diagnosed with CDI via Polymerase Chain Reaction (PCR) from February 2009 to June 2013. In our study, we decided to use a machine learning algorithm called the Random Forest (RF) to analyze all of the factors proposed to be associated with CDR. This model is capable of making predictions based on a large number of variables, and has outperformed numerous other models and statistical methods. Results: We came up with a model that was able to accurately predict the CDR with a sensitivity of 83.3%, specificity of 63.1%, and area under curve of 82.6%. Like other similar studies that have used the RF model, we also had very impressive results. Conclusions: We hope that in the future, machine learning algorithms, such as the RF, will see a wider application.
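
    The workflow described above can be sketched as a Random Forest classifier evaluated with cross-validated AUC, sensitivity and specificity. The patient features below are synthetic placeholders, so the printed numbers will not match the study's results.

```python
# Sketch of a Random Forest recurrence classifier with AUC, sensitivity and specificity
# (synthetic patient features; illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(9)
n = 198
X = np.column_stack([
    rng.integers(20, 95, n),            # age
    rng.integers(0, 2, n),              # prior antibiotic exposure (0/1)
    rng.integers(0, 2, n),              # proton-pump inhibitor use (0/1)
    rng.normal(10, 3, n),               # white cell count
])
risk = 0.03 * X[:, 0] + 1.2 * X[:, 1] + 0.8 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-(risk - risk.mean()))))   # recurrence (0/1)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
proba = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print(f"AUC {roc_auc_score(y, proba):.2f}, "
      f"sensitivity {tp / (tp + fn):.2f}, specificity {tn / (tn + fp):.2f}")
```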

  11. A generative model for predicting terrorist incidents

    Science.gov (United States)

    Verma, Dinesh C.; Verma, Archit; Felmlee, Diane; Pearson, Gavin; Whitaker, Roger

    2017-05-01

    A major concern in coalition peace-support operations is the incidence of terrorist activity. In this paper, we propose a generative model for the occurrence of terrorist incidents, and illustrate that an increase in diversity, as measured by the number of different social groups to which an individual belongs, is inversely correlated with the likelihood of a terrorist incident in the society. A generative model is one that can predict the likelihood of events in new contexts, as opposed to statistical models, which predict future incidents based on the history of incidents in an existing context. Generative models can be useful in planning for persistent Information Surveillance and Reconnaissance (ISR) since they allow an estimation of regions in the theater of operation where terrorist incidents may arise, and thus can be used to better allocate the assignment and deployment of ISR assets. In this paper, we present a taxonomy of terrorist incidents, identify factors related to the occurrence of terrorist incidents, and provide a mathematical analysis calculating the likelihood of occurrence of terrorist incidents in three common real-life scenarios arising in peace-keeping operations.

  12. Dietary information improves cardiovascular disease risk prediction models.

    Science.gov (United States)

    Baik, I; Cho, N H; Kim, S H; Shin, C

    2013-01-01

    Data are limited on cardiovascular disease (CVD) risk prediction models that include dietary predictors. Using known risk factors and dietary information, we constructed and evaluated CVD risk prediction models. Data for modeling were from population-based prospective cohort studies comprising 9026 men and women aged 40-69 years. At baseline, all were free of known CVD and cancer, and were followed up for CVD incidence during an 8-year period. We used Cox proportional hazards regression analysis to construct a traditional risk factor model, an office-based model, and two diet-containing models, and evaluated these models by calculating the Akaike information criterion (AIC), C-statistics, integrated discrimination improvement (IDI), net reclassification improvement (NRI) and a calibration statistic. We constructed diet-containing models with significant dietary predictors such as poultry, legumes, carbonated soft drinks or green tea consumption. Adding dietary predictors to the traditional model yielded a decrease in AIC (delta AIC=15), a 53% increase in relative IDI, and an improvement in reclassification (category-free NRI=0.14); a smaller improvement (category-free NRI=0.08, P<0.01) was observed relative to the office-based model. The calibration plots for risk prediction demonstrated that the inclusion of dietary predictors contributes to better agreement in persons at high risk for CVD. C-statistics for the four models were acceptable and comparable. We suggest that dietary information may be useful in constructing CVD risk prediction models.

  13. Assessing the Transferability of Statistical Predictive Models for Leaf Area Index Between Two Airborne Discrete Return LiDAR Sensor Designs Within Multiple Intensely Managed Loblolly Pine Forest Locations in the South-Eastern USA

    Science.gov (United States)

    Sumnall, Matthew; Peduzzi, Alicia; Fox, Thomas R.; Wynne, Randolph H.; Thomas, Valerie A.; Cook, Bruce

    2016-01-01

    Leaf area is an important forest structural variable which serves as the primary means of mass and energy exchange within vegetated ecosystems. The objective of the current study was to determine if leaf area index (LAI) could be estimated accurately and consistently in five intensively managed pine plantation forests using two multiple-return airborne LiDAR datasets. Field measurements of LAI were made using the LiCOR LAI2000 and LAI2200 instruments within 116 plots of varying size, established across a variety of stand conditions (i.e. stand age, nutrient regime and stem density) in North Carolina and Virginia in 2008 and 2013. A number of common LiDAR return height and intensity distribution metrics (e.g. average return height) were calculated, in addition to ten indices (with two additional variants) used in the surrounding literature to estimate LAI and fractional cover, computed from return heights and intensity for each plot extent. Each of the indices was assessed for correlation with the others and was used as an independent variable in linear regression analysis with field LAI as the dependent variable. All LiDAR-derived metrics were also entered into a forward stepwise linear regression. The results from each of the indices varied from an R2 of 0.33 (S.E. 0.87) to 0.89 (S.E. 0.36). Those indices calculated using ratios of all returns produced the strongest correlations, such as the Above and Below Ratio Index (ABRI) and Laser Penetration Index 1 (LPI1). The regression model produced from a combination of three metrics did not improve correlations greatly (R2 0.90; S.E. 0.35). The results indicate that LAI can be predicted accurately over a range of intensively managed pine plantation forest environments when using different LiDAR sensor designs. Those indices which incorporated counts of specific return numbers (e.g. first returns) or return intensity correlated poorly with field measurements. There were

  14. Predictive modelling of noise level generated during sawing of rocks ...

    Indian Academy of Sciences (India)

    2016-08-26

    Aug 26, 2016 ... Influence of the operating variables and rock properties on the noise level is investigated and analysed. Statistical analyses are then employed and models are built for the prediction of noise levels depending on the operating variables and the rock properties. The derived models are validated through ...

  15. Statistical Analysis of a Method to Predict Drug-Polymer Miscibility

    DEFF Research Database (Denmark)

    Knopp, Matthias Manne; Olesen, Niels Erik; Huang, Yanbin

    2016-01-01

    In this study, a method proposed to predict drug-polymer miscibility from differential scanning calorimetry measurements was subjected to statistical analysis. The method is relatively fast and inexpensive and has gained popularity as a result of the increasing interest in the formulation of drugs...... procedure is problematic and may foster uncritical and misguiding interpretations. From a statistical perspective, the drug-polymer miscibility prediction should instead be examined by deriving an objective function, which results in the unbiased, minimum variance properties of the least-square estimator...

  16. Aqua/Aura Updated Inclination Adjust Maneuver Performance Prediction Model

    Science.gov (United States)

    Boone, Spencer

    2017-01-01

    This presentation will discuss the updated Inclination Adjust Maneuver (IAM) performance prediction model that was developed for Aqua and Aura following the 2017 IAM series. This updated model uses statistical regression methods to identify potential long-term trends in maneuver parameters, yielding improved predictions when re-planning past maneuvers. The presentation has been reviewed and approved by Eric Moyer, ESMO Deputy Project Manager.

  17. Linear mixed models a practical guide using statistical software

    CERN Document Server

    West, Brady T; Galecki, Andrzej T

    2006-01-01

    Simplifying the often confusing array of software programs for fitting linear mixed models (LMMs), Linear Mixed Models: A Practical Guide Using Statistical Software provides a basic introduction to primary concepts, notation, software implementation, model interpretation, and visualization of clustered and longitudinal data. This easy-to-navigate reference details the use of procedures for fitting LMMs in five popular statistical software packages: SAS, SPSS, Stata, R/S-plus, and HLM. The authors introduce basic theoretical concepts, present a heuristic approach to fitting LMMs based on bo

  18. Statistical Model and the mesonic-baryonic transition region

    CERN Document Server

    Oeschler, H.; Redlich, K.; Wheaton, S.

    2009-01-01

    The statistical model assuming chemical equilibrium and local strangeness conservation describes most of the observed features of strange particle production from SIS up to RHIC. Deviations are found as the maximum in the measured K+/pi+ ratio is much sharper than in the model calculations. At the incident energy of the maximum, the statistical model shows that freeze-out changes regime from one dominated by baryons at the lower energies toward one dominated by mesons. It will be shown how deviations from the usual freeze-out curve influence the various particle ratios. Furthermore, other observables also exhibit changes in just this energy regime.

  19. Multiple commodities in statistical microeconomics: Model and market

    Science.gov (United States)

    Baaquie, Belal E.; Yu, Miao; Du, Xin

    2016-11-01

    A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. The case of multiple commodities is studied and a parsimonious generalization of the single commodity model is made for the multiple commodities case. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, and is independent of the mainstream formulation of microeconomics.

  20. Prediction of protein retention times in hydrophobic interaction chromatography by robust statistical characterization of their atomic-level surface properties.

    Science.gov (United States)

    Hanke, Alexander T; Klijn, Marieke E; Verhaert, Peter D E M; van der Wielen, Luuk A M; Ottens, Marcel; Eppink, Michel H M; van de Sandt, Emile J A X

    2016-03-01

    The correlation between the dimensionless retention times (DRT) of proteins in hydrophobic interaction chromatography (HIC) and their surface properties was investigated. A ternary atomic-level hydrophobicity scale was used to calculate the distribution of local average hydrophobicity across the protein surfaces. These distributions were characterized by robust descriptive statistics to reduce their sensitivity to small changes in the three-dimensional structure. The applicability of these statistics for the prediction of protein retention behaviour was then examined. A linear combination of robust statistics describing the central tendency, heterogeneity and frequency of highly hydrophobic clusters was found to have a good predictive capability (R2 = 0.78) when combined with a factor to account for protein size differences. The achieved error of prediction was 35% lower than for a similar model based on a description of the protein surface at the amino acid level. This indicates that a robust and mathematically simple model based on an atomic description of the protein surface can be used to predict the retention behaviour of conformationally stable globular proteins with a well-determined 3D structure in HIC. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:372-381, 2016. © 2016 American Institute of Chemical Engineers.

  1. Multi-region Statistical Shape Model for Cochlear Implantation

    DEFF Research Database (Denmark)

    Romera, Jordi; Kjer, H. Martin; Piella, Gemma

    2016-01-01

    Statistical shape models are commonly used to analyze the variability between similar anatomical structures and their use is established as a tool for analysis and segmentation of medical images. However, using a global model to capture the variability of complex structures is not enough to achie...

  2. Evaluation of Statistical Models for Analysis of Insect, Disease and ...

    African Journals Online (AJOL)

    It is concluded that LMMs and GLMs simultaneously consider the effect of treatments and heterogeneity of variance and hence are more appropriate for analysis of abundance and incidence data than ordinary ANOVA. Keywords: Mixed Models; Generalized Linear Models; Statistical Power. East African Journal of Sciences ...

  3. Complex Data Modeling and Computationally Intensive Statistical Methods

    CERN Document Server

    Mantovan, Pietro

    2010-01-01

    The last years have seen the advent and development of many devices able to record and store an always increasing amount of complex and high dimensional data; 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real time financial data, system control datasets. The analysis of this data poses new challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici

  4. Statistics and Limits of Linear-Prediction Quantification of Magnetic Resonance Spectral Parameters

    Science.gov (United States)

    Koehl, P.; Ling, C.; Lefevre, J. F.

    Linear prediction has attracted considerable interest as an alternative approach to fast Fourier transform for quantification of NMR signals. Based on a Monte Carlo method, we report a statistical investigation on two forms of linear prediction to examine the accuracy and precision with which the positions, linewidths, intensities, and phases of NMR lines in complex, crowded spectra may be determined. The two forms of linear prediction differ in the methods used to obtain the prediction coefficients, chosen to be least squares in one and total least squares in the other. Both methods are shown to be very sensitive to the level of noise in the experimental spectrum, as well as to the prediction order chosen for the analysis. Linear-prediction estimates of the frequencies are usually very reliable, while estimates of the other parameters such as linewidths and intensities are usually poorer.
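
    As a hedged illustration of the technique discussed above (not the authors' implementation), the least-squares form of linear prediction can be sketched on a synthetic one-line signal: the prediction coefficients are obtained by ordinary least squares and the line frequency and decay rate are read off the roots of the prediction polynomial. The sampling interval, frequency, decay and prediction order below are arbitrary choices, and the total-least-squares variant is not shown.

        import numpy as np

        # Synthetic "FID": one damped complex exponential plus noise.
        dt, freq, decay = 1e-3, 50.0, 20.0          # s, Hz, 1/s (made-up values)
        t = np.arange(512) * dt
        rng = np.random.default_rng(0)
        signal = np.exp((2j * np.pi * freq - decay) * t)
        signal += 0.01 * (rng.standard_normal(512) + 1j * rng.standard_normal(512))

        # Forward linear prediction: s[n] ~ sum_k a[k] * s[n-1-k], solved by least squares.
        p = 8                                        # prediction order (user choice)
        rows = np.array([signal[n - p:n][::-1] for n in range(p, len(signal))])
        a, *_ = np.linalg.lstsq(rows, signal[p:], rcond=None)

        # Roots of z^p - a[0] z^(p-1) - ... - a[p-1] carry the frequencies and linewidths.
        roots = np.roots(np.concatenate(([1.0 + 0j], -a)))
        signal_pole = roots[np.argmax(np.abs(roots))]
        print("estimated frequency (Hz):", np.angle(signal_pole) / (2 * np.pi * dt))
        print("estimated decay rate (1/s):", -np.log(np.abs(signal_pole)) / dt)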

  5. Validation of statistical models for creep rupture by parametric analysis

    Energy Technology Data Exchange (ETDEWEB)

    Bolton, J., E-mail: john.bolton@uwclub.net [65, Fisher Ave., Rugby, Warks CV22 5HW (United Kingdom)

    2012-01-15

    Statistical analysis is an efficient method for the optimisation of any candidate mathematical model of creep rupture data, and for the comparative ranking of competing models. However, when a series of candidate models has been examined and the best of the series has been identified, there is no statistical criterion to determine whether a yet more accurate model might be devised. Hence there remains some uncertainty that the best of any series examined is sufficiently accurate to be considered reliable as a basis for extrapolation. This paper proposes that models should be validated primarily by parametric graphical comparison to rupture data and rupture gradient data. It proposes that no mathematical model should be considered reliable for extrapolation unless the visible divergence between model and data is so small as to leave no apparent scope for further reduction. This study is based on the data for a 12% Cr alloy steel used in BS PD6605:1998 to exemplify its recommended statistical analysis procedure. The models considered in this paper include a) a relatively simple model, b) the PD6605 recommended model and c) a more accurate model of somewhat greater complexity. - Highlights: ► The paper discusses the validation of creep rupture models derived from statistical analysis. ► It demonstrates that models can be satisfactorily validated by a visual-graphic comparison of models to data. ► The method proposed utilises test data both as conventional rupture stress and as rupture stress gradient. ► The approach is shown to be more reliable than a well-established and widely used method (BS PD6605).

  6. Statistical simulation of hadron-nucleus and light nucleus-nucleus interaction. Intranuclear cascade model

    International Nuclear Information System (INIS)

    Lobov, G.A.; Stepanov, N.V.; Sibirtsev, A.A.; Trebukhovskij, Yu.V.

    1983-01-01

    A new version of the program for statistical simulation of hadron-nucleus and light nucleus-nucleus interactions has been developed. The cascade part of the program is described. Model predictions are compared with proton-nucleus interaction experiments, and satisfactory agreement between calculations and experiment is obtained.

  7. A new method to determine the number of experimental data using statistical modeling methods

    Energy Technology Data Exchange (ETDEWEB)

    Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)

    2017-06-15

    For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a reasonable number of experimental data is proposed using an area metric, after obtaining statistical models using the information on the underlying distribution, the Sequential statistical modeling (SSM) approach, and the Kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The proposed method is validated in simulations, using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data describing the fatigue strength coefficient of SAE 950X is used for demonstrating the performance of the obtained statistical models that use a pre-determined number of experimental data in predicting the probability of failure for a target fatigue life.
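
    A minimal sketch of the central idea (assuming a normal underlying property and invented numbers, not the SAE 950X fatigue data): fit a statistical model to n observations, compute an area metric between the empirical distribution and the fitted model, and watch how the metric behaves as n grows.

        import numpy as np
        from scipy import stats

        def area_metric(samples, model_cdf, n_grid=2000):
            """Area between the empirical CDF of `samples` and a model CDF."""
            x = np.sort(samples)
            grid = np.linspace(x[0], x[-1], n_grid)
            f_emp = np.searchsorted(x, grid, side="right") / len(x)
            return np.mean(np.abs(f_emp - model_cdf(grid))) * (grid[-1] - grid[0])

        rng = np.random.default_rng(1)
        for n in range(5, 55, 5):
            data = rng.normal(loc=100.0, scale=10.0, size=n)   # "experimental" data
            mu, sigma = stats.norm.fit(data)                   # statistical model from n data
            gap = area_metric(data, stats.norm(mu, sigma).cdf)
            print(f"n = {n:2d}, area metric = {gap:.3f}")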

  8. Statistical Validation of Normal Tissue Complication Probability Models

    Energy Technology Data Exchange (ETDEWEB)

    Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Veld, Aart A. van' t; Langendijk, Johannes A. [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schilstra, Cornelis [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Radiotherapy Institute Friesland, Leeuwarden (Netherlands)

    2012-09-01

    Purpose: To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. Methods and Materials: A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Results: Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Conclusion: Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use.
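
    The two validation ideas above can be sketched with simulated data (this is not the xerostomia cohort, and the LASSO is approximated here by an L1-penalized logistic regression with a fixed penalty rather than the paper's exact procedure): cross-validation estimates the model's discriminative performance, and a permutation test checks whether that performance exceeds chance.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n, p = 200, 30                              # patients, candidate dose/clinical factors
        X = rng.normal(size=(n, p))
        y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

        lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        auc = cross_val_score(lasso_logit, X, y, cv=5, scoring="roc_auc").mean()

        # Permutation test: cross-validated AUC on label-shuffled data gives a null distribution.
        null = [cross_val_score(lasso_logit, X, rng.permutation(y), cv=5,
                                scoring="roc_auc").mean() for _ in range(100)]
        p_value = (np.sum(np.array(null) >= auc) + 1) / (len(null) + 1)
        print(f"cross-validated AUC = {auc:.3f}, permutation p-value = {p_value:.3f}")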

  9. Modern statistical models for forensic fingerprint examinations: a critical review.

    Science.gov (United States)

    Abraham, Joshua; Champod, Christophe; Lennard, Chris; Roux, Claude

    2013-10-10

    Over the last decade, the development of statistical models in support of forensic fingerprint identification has been the subject of increasing research attention, spurred on recently by commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. Such models are increasingly seen as useful tools in support of the fingerprint identification process within or in addition to the ACE-V framework. This paper provides a critical review of recent statistical models from both a practical and theoretical perspective. This includes analysis of models of two different methodologies: Probability of Random Correspondence (PRC) models that focus on calculating probabilities of the occurrence of fingerprint configurations for a given population, and Likelihood Ratio (LR) models which use analysis of corresponding features of fingerprints to derive a likelihood value representing the evidential weighting for a potential source. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  10. Growth Curve Models and Applications : Indian Statistical Institute

    CERN Document Server

    2017-01-01

    Growth curve models in longitudinal studies are widely used to model population size, body height, biomass, fungal growth, and other variables in the biological sciences, but these statistical methods for modeling growth curves and analyzing longitudinal data also extend to general statistics, economics, public health, demographics, epidemiology, SQC, sociology, nano-biotechnology, fluid mechanics, and other applied areas.   There is no one-size-fits-all approach to growth measurement. The selected papers in this volume build on presentations from the GCM workshop held at the Indian Statistical Institute, Giridih, on March 28-29, 2016. They represent recent trends in GCM research on different subject areas, both theoretical and applied. This book includes tools and possibilities for further work through new techniques and modification of existing ones. The volume includes original studies, theoretical findings and case studies from a wide range of applied work, and these contributions have been externally r...

  11. Prediction Model for Gastric Cancer Incidence in Korean Population.

    Science.gov (United States)

    Eom, Bang Wool; Joo, Jungnam; Kim, Sohee; Shin, Aesun; Yang, Hye-Ryung; Park, Junghyun; Choi, Il Ju; Kim, Young-Woo; Kim, Jeongseon; Nam, Byung-Ho

    2015-01-01

    Predicting high risk groups for gastric cancer and motivating these groups to receive regular checkups is required for the early detection of gastric cancer. The aim of this study was to develop a prediction model for gastric cancer incidence based on a large population-based cohort in Korea. Based on the National Health Insurance Corporation data, we analyzed 10 major risk factors for gastric cancer. The Cox proportional hazards model was used to develop gender specific prediction models for gastric cancer development, and the performance of the developed model in terms of discrimination and calibration was also validated using an independent cohort. Discrimination ability was evaluated using Harrell's C-statistics, and the calibration was evaluated using a calibration plot and slope. During a median of 11.4 years of follow-up, 19,465 (1.4%) and 5,579 (0.7%) newly developed gastric cancer cases were observed among 1,372,424 men and 804,077 women, respectively. The prediction models included age, BMI, family history, meal regularity, salt preference, alcohol consumption, smoking and physical activity for men, and age, BMI, family history, salt preference, alcohol consumption, and smoking for women. This prediction model showed good accuracy and predictability in both the developing and validation cohorts (C-statistics: 0.764 for men, 0.706 for women). In this study, a prediction model for gastric cancer incidence was developed that displayed a good performance.
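
    A schematic version of the modelling step described above can be written with the lifelines package; the data below are fabricated and the variable names merely echo some of the abstract's predictors, so the fitted coefficients and C-statistic carry no clinical meaning.

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        rng = np.random.default_rng(42)
        n = 5000
        df = pd.DataFrame({
            "age": rng.integers(40, 70, n).astype(float),
            "bmi": rng.normal(24.0, 3.0, n),
            "smoking": rng.integers(0, 2, n).astype(float),
            "family_history": rng.integers(0, 2, n).astype(float),
        })
        # Simulated time-to-event, censored at the cohort's median follow-up of 11.4 years.
        risk = 0.04 * (df["age"] - 55) + 0.3 * df["smoking"] + 0.5 * df["family_history"]
        df["time"] = rng.exponential(scale=20.0 * np.exp(-risk))
        df["event"] = (df["time"] <= 11.4).astype(int)
        df["time"] = df["time"].clip(upper=11.4)

        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")
        print(cph.summary[["coef", "exp(coef)", "p"]])
        print("Harrell's C-statistic:", cph.concordance_index_)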

  12. Prediction Model for Gastric Cancer Incidence in Korean Population.

    Directory of Open Access Journals (Sweden)

    Bang Wool Eom

    Full Text Available Predicting high risk groups for gastric cancer and motivating these groups to receive regular checkups is required for the early detection of gastric cancer. The aim of this study was to develop a prediction model for gastric cancer incidence based on a large population-based cohort in Korea. Based on the National Health Insurance Corporation data, we analyzed 10 major risk factors for gastric cancer. The Cox proportional hazards model was used to develop gender specific prediction models for gastric cancer development, and the performance of the developed model in terms of discrimination and calibration was also validated using an independent cohort. Discrimination ability was evaluated using Harrell's C-statistics, and the calibration was evaluated using a calibration plot and slope. During a median of 11.4 years of follow-up, 19,465 (1.4%) and 5,579 (0.7%) newly developed gastric cancer cases were observed among 1,372,424 men and 804,077 women, respectively. The prediction models included age, BMI, family history, meal regularity, salt preference, alcohol consumption, smoking and physical activity for men, and age, BMI, family history, salt preference, alcohol consumption, and smoking for women. This prediction model showed good accuracy and predictability in both the developing and validation cohorts (C-statistics: 0.764 for men, 0.706 for women). In this study, a prediction model for gastric cancer incidence was developed that displayed a good performance.

  13. Statistical emulation of a tsunami model for sensitivity analysis and uncertainty quantification

    Directory of Open Access Journals (Sweden)

    A. Sarri

    2012-06-01

    Full Text Available Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics corresponding to the uncertain trigger features (e.g. either position, shape and speed of a landslide, or sea floor deformation associated with an earthquake). Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes the uncertainty quantification impractical. Statistical emulators run almost instantaneously and may represent well the outputs of the computer model. In this paper, we use the outer product emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the leave-one-out method.
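
    The following is a simplified stand-in for this idea, using a generic Gaussian-process regressor from scikit-learn rather than the outer product emulator of the paper; the "simulator" is a cheap toy function, and the design points and kernel are arbitrary choices.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, ConstantKernel
        from sklearn.model_selection import LeaveOneOut

        def toy_simulator(x):
            # Cheap placeholder for an expensive tsunami code: "wave height" vs. one input.
            return np.sin(3.0 * x) + 0.5 * x

        X = np.linspace(0.0, 2.0, 12).reshape(-1, 1)   # a few carefully chosen model runs
        y = toy_simulator(X).ravel()

        gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)

        # Leave-one-out validation of the surrogate, as in the abstract.
        errors = []
        for train, test in LeaveOneOut().split(X):
            gp.fit(X[train], y[train])
            errors.append(abs(gp.predict(X[test])[0] - y[test][0]))
        print("leave-one-out mean absolute error:", np.mean(errors))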

  14. Statistical modelling for recurrent events: an application to sports injuries.

    Science.gov (United States)

    Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F

    2014-09-01

    Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  15. Predicting Student Success in a Psychological Statistics Course Emphasizing Collaborative Learning

    Science.gov (United States)

    Gorvine, Benjamin J.; Smith, H. David

    2015-01-01

    This study describes the use of a collaborative learning approach in a psychological statistics course and examines the factors that predict which students benefit most from such an approach in terms of learning outcomes. In a course format with a substantial group work component, 166 students were surveyed on their preference for individual…

  16. Applications of modeling in polymer-property prediction

    Science.gov (United States)

    Case, F. H.

    1996-08-01

    A number of molecular modeling techniques have been applied for the prediction of polymer properties and behavior. Five examples illustrate the range of methodologies used. A simple atomistic simulation of small polymer fragments is used to estimate drug compatibility with a polymer matrix. The analysis of molecular dynamics results from a more complex model of a swollen hydrogel system is used to study gas diffusion in contact lenses. Statistical mechanics is used to predict conformation-dependent properties; an example is the prediction of liquid-crystal formation. The effect of the molecular weight distribution on phase separation in polyalkanes is predicted using thermodynamic models. In some cases, the properties of interest cannot be directly predicted using simulation methods or polymer theory. Correlation methods may be used to bridge the gap between molecular structure and macroscopic properties. The final example shows how connectivity-indices-based quantitative structure-property relationships were used to predict properties for candidate polyimides in an electronics application.

  17. The Statistical Modeling of the Trends Concerning the Romanian Population

    Directory of Open Access Journals (Sweden)

    Gabriela OPAIT

    2014-11-01

    Full Text Available This paper presents the statistical modeling of the trends concerning the resident population of Romania, that is, the total Romanian population, by means of the „Least Squares Method”. Any country develops by increasing its population, and hence its workforce, which is a factor of influence on the growth of the Gross Domestic Product (G.D.P.). The „Least Squares Method” is a statistical technique for determining the best-fit trend line of a model.
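
    A small numerical illustration of the method named above (with placeholder figures, not the actual Romanian census series): fit the trend line y = a·t + b by minimising the sum of squared residuals.

        import numpy as np

        # Hypothetical resident-population series, in millions of persons.
        years = np.arange(2002, 2014)
        population_mln = np.array([21.8, 21.7, 21.7, 21.6, 21.6, 21.5,
                                   21.5, 21.4, 21.4, 21.3, 21.3, 21.2])

        a, b = np.polyfit(years, population_mln, deg=1)     # least-squares trend line
        trend = a * years + b

        print(f"slope: {a:.3f} million persons per year")
        print(f"residual sum of squares: {np.sum((population_mln - trend) ** 2):.4f}")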

  18. Statistical Model of the 2001 Czech Census for Interactive Presentation

    Czech Academy of Sciences Publication Activity Database

    Grim, Jiří; Hora, Jan; Boček, Pavel; Somol, Petr; Pudil, Pavel

    Vol. 26, č. 4 (2010), s. 1-23 ISSN 0282-423X R&D Projects: GA ČR GA102/07/1594; GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Interactive statistical model * census data presentation * distribution mixtures * data modeling * EM algorithm * incomplete data * data reproduction accuracy * data mining Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.492, year: 2010 http://library.utia.cas.cz/separaty/2010/RO/grim-0350513.pdf

  19. Predictability in models of the atmospheric circulation

    NARCIS (Netherlands)

    Houtekamer, P.L.

    1992-01-01

    It will be clear from the above discussions that skill forecasts are still in their infancy. Operational skill predictions do not exist. One is still struggling to prove that skill predictions, at any range, have any quality at all. It is not clear what the statistics of the analysis error

  20. Analyzing sickness absence with statistical models for survival data

    DEFF Research Database (Denmark)

    Christensen, Karl Bang; Andersen, Per Kragh; Smith-Hansen, Lars

    2007-01-01

    OBJECTIVES: Sickness absence is the outcome in many epidemiologic studies and is often based on summary measures such as the number of sickness absences per year. In this study the use of modern statistical methods was examined by making better use of the available information. Since sickness...... absence data deal with events occurring over time, the use of statistical models for survival data has been reviewed, and the use of frailty models has been proposed for the analysis of such data. METHODS: Three methods for analyzing data on sickness absences were compared using a simulation study...... involving the following: (i) Poisson regression using a single outcome variable (number of sickness absences), (ii) analysis of time to first event using the Cox proportional hazards model, and (iii) frailty models, which are random effects proportional hazards models. Data from a study of the relation...

  1. A Review of Modeling Bioelectrochemical Systems: Engineering and Statistical Aspects

    Directory of Open Access Journals (Sweden)

    Shuai Luo

    2016-02-01

    Full Text Available Bioelectrochemical systems (BES) are promising technologies to convert organic compounds in wastewater to electrical energy through a series of complex physical-chemical, biological and electrochemical processes. Representative BES such as microbial fuel cells (MFCs) have been studied and advanced for energy recovery. Substantial experimental and modeling efforts have been made for investigating the processes involved in electricity generation toward the improvement of the BES performance for practical applications. However, there are many parameters that will potentially affect these processes, thereby making the optimization of system performance hard to achieve. Mathematical models, including engineering models and statistical models, are powerful tools to help understand the interactions among the parameters in BES and perform optimization of BES configuration/operation. This review paper aims to introduce and discuss the recent developments of BES modeling from engineering and statistical aspects, including analysis of the model structure, description of application cases and sensitivity analysis of various parameters. It is expected to serve as a compass for integrating the engineering and statistical modeling strategies to improve model accuracy for BES development.

  2. Predictive models for acute kidney injury following cardiac surgery.

    Science.gov (United States)

    Demirjian, Sevag; Schold, Jesse D; Navia, Jose; Mastracci, Tara M; Paganini, Emil P; Yared, Jean-Pierre; Bashour, Charles A

    2012-03-01

    Accurate prediction of cardiac surgery-associated acute kidney injury (AKI) would improve clinical decision making and facilitate timely diagnosis and treatment. The aim of the study was to develop predictive models for cardiac surgery-associated AKI using presurgical and combined pre- and intrasurgical variables. Prospective observational cohort. 25,898 patients who underwent cardiac surgery at Cleveland Clinic in 2000-2008. Presurgical and combined pre- and intrasurgical variables were used to develop predictive models. Dialysis therapy and a composite of doubling of serum creatinine level or dialysis therapy within 2 weeks (or discharge if sooner) after cardiac surgery. Incidences of dialysis therapy and the composite of doubling of serum creatinine level or dialysis therapy were 1.7% and 4.3%, respectively. Kidney function parameters were strong independent predictors in all 4 models. Surgical complexity reflected by type and history of previous cardiac surgery were robust predictors in models based on presurgical variables. However, the inclusion of intrasurgical variables accounted for all explained variance by procedure-related information. Models predictive of dialysis therapy showed good calibration and superb discrimination; a combined (pre- and intrasurgical) model performed better than the presurgical model alone (C statistics, 0.910 and 0.875, respectively). Models predictive of the composite end point also had excellent discrimination with both presurgical and combined (pre- and intrasurgical) variables (C statistics, 0.797 and 0.825, respectively). However, the presurgical model predictive of the composite end point showed suboptimal calibration. Validation of the predictive models in other cohorts is required before wide-scale application. We developed and internally validated 4 new models that accurately predict cardiac surgery-associated AKI. These models are based on readily available clinical information and can be used for patient counseling, clinical

  3. Statistical learning modeling method for space debris photometric measurement

    Science.gov (United States)

    Sun, Wenjing; Sun, Jinqiu; Zhang, Yanning; Li, Haisen

    2016-03-01

    Photometric measurement is an important way to identify space debris, but present methods of photometric measurement impose many constraints on the star image and require complex image processing. To address these problems, a statistical learning modeling method for space debris photometric measurement is proposed based on the global consistency of the star image, and the statistical information of star images is used to eliminate measurement noise. First, the known stars in the star image are divided into training stars and testing stars. Then, the training stars are used to fit the parameters of the photometric measurement model by least squares, and the testing stars are used to calculate the measurement accuracy of the photometric measurement model. Experimental results show that the accuracy of the proposed photometric measurement model is about 0.1 magnitudes.
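
    Under the (simplifying) assumption that the photometric model is a straight-line relation between instrumental and catalogue magnitudes, the training/testing split described above can be sketched as follows; all magnitudes are synthetic and the model form is an assumption, not the paper's exact formulation.

        import numpy as np

        rng = np.random.default_rng(3)
        n_stars = 40
        catalog_mag = rng.uniform(6.0, 12.0, n_stars)              # known star magnitudes
        # Instrumental magnitudes: catalogue value plus a zero-point offset and noise.
        instrumental = catalog_mag + 0.15 + rng.normal(0.0, 0.05, n_stars)

        train = np.arange(n_stars) < 30                            # training vs. testing stars
        A = np.vstack([np.ones(train.sum()), instrumental[train]]).T
        coef, *_ = np.linalg.lstsq(A, catalog_mag[train], rcond=None)  # catalog ~ c0 + c1*inst

        pred = coef[0] + coef[1] * instrumental[~train]
        rms = np.sqrt(np.mean((pred - catalog_mag[~train]) ** 2))
        print(f"testing-star RMS error: {rms:.3f} mag")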

  4. Statistical, Morphometric, Anatomical Shape Model (Atlas) of Calcaneus

    Science.gov (United States)

    Melinska, Aleksandra U.; Romaszkiewicz, Patryk; Wagel, Justyna; Sasiadek, Marek; Iskander, D. Robert

    2015-01-01

    The aim was to develop a morphometric and anatomically accurate atlas (statistical shape model) of the calcaneus. The model is based on 18 left foot and 18 right foot computed tomography studies of 28 male individuals aged from 17 to 62 years, with no known foot pathology. The procedure for automatic atlas construction included extraction and identification of common features, averaging of feature positions, obtaining the mean geometry, mathematical shape description and variability analysis. Expert manual assistance was included for the model to fulfil the accuracy sought by medical professionals. The proposed statistical shape model of the calcaneus, the first of its kind, could be of value in many orthopaedic applications including providing support in diagnosing pathological lesions, pre-operative planning, classification and treatment of calcaneus fractures, as well as the development of future implant procedures. PMID:26270812

  5. Workshop on Model Uncertainty and its Statistical Implications

    CERN Document Server

    1988-01-01

    In this book problems related to the choice of models in such diverse fields as regression, covariance structure, time series analysis and multinomial experiments are discussed. The emphasis is on the statistical implications for model assessment when the assessment is done with the same data that generated the model. This is a problem of long standing, notorious for its difficulty. Some contributors discuss this problem in an illuminating way. Others, and this is a truly novel feature, investigate systematically whether sample re-use methods like the bootstrap can be used to assess the quality of estimators or predictors in a reliable way given the initial model uncertainty. The book should prove to be valuable for advanced practitioners and statistical methodologists alike.

  6. Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data

    Science.gov (United States)

    Ladbury, R.; Campola, M. J.

    2015-01-01

    New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.

  7. Interactive comparison of hypothesis tests for statistical model checking

    NARCIS (Netherlands)

    de Boer, Pieter-Tjerk; Reijsbergen, D.P.; Scheinhardt, Willem R.W.

    2015-01-01

    We present a web-based interactive comparison of hypothesis tests as are used in statistical model checking, providing users and tool developers with more insight into their characteristics. Parameters can be modified easily and their influence is visualized in real time; an integrated simulation

  8. Syntactic discriminative language model rerankers for statistical machine translation

    NARCIS (Netherlands)

    Carter, S.; Monz, C.

    2011-01-01

    This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language models in differentiating between Statistical

  9. Using statistical compatibility to derive advanced probabilistic fatigue models

    Czech Academy of Sciences Publication Activity Database

    Fernández-Canteli, A.; Castillo, E.; López-Aenlle, M.; Seitl, Stanislav

    2010-01-01

    Roč. 2, č. 1 (2010), s. 1131-1140 E-ISSN 1877-7058. [Fatigue 2010. Praha, 06.06.2010-11.06.2010] Institutional research plan: CEZ:AV0Z20410507 Keywords : Fatigue models * Statistical compatibility * Functional equations Subject RIV: JL - Materials Fatigue, Friction Mechanics

  10. Modelling geographical graduate job search using circular statistics

    NARCIS (Netherlands)

    Faggian, Alessandra; Corcoran, Jonathan; McCann, Philip

    Theory suggests that the spatial patterns of migration flows are contingent both on individual human capital and underlying geographical structures. Here we demonstrate these features by using circular statistics in an econometric modelling framework applied to the flows of UK university graduates.

  11. Two-dimensional models in statistical mechanics and field theory

    International Nuclear Information System (INIS)

    Koberle, R.

    1980-01-01

    Several features of two-dimensional models in statistical mechanics and field theory, such as lattice quantum chromodynamics, Z(N), Gross-Neveu and CP(N-1), are discussed. The problems of confinement and dynamical mass generation are also analyzed. (L.C.) [pt

  12. Statistical properties of the nuclear shell-model Hamiltonian

    International Nuclear Information System (INIS)

    Dias, H.; Hussein, M.S.; Oliveira, N.A. de

    1986-01-01

    The statistical properties of the realistic nuclear shell-model Hamiltonian are investigated in sd-shell nuclei. The probability distribution of the basis-vector amplitude is calculated and compared with the Porter-Thomas distribution. Relevance of the results to the calculation of the giant resonance mixing parameter is pointed out. (Author) [pt

  13. Eigenfunction statistics for Anderson model with Hölder continuous ...

    Indian Academy of Sciences (India)

    continuous (0 < α ≤ 1) single site distribution. In the localized regime, we study the distribution of eigenfunctions in space and energy simultaneously. In a certain scaling limit, we prove that the limit points are Poisson. Keywords. Anderson model; Hölder continuous measure; Poisson statistics. 2010 Mathematics Subject Classification ...

  14. Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

    Science.gov (United States)

    Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Pong-Wong, Ricardo; Agakov, Felix; Navarro, Pau; Haley, Chris S.

    2015-01-01

    We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. PMID:25918167
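
    In the spirit of the comparison above, a toy sketch with simulated genotypes (not the Croatian or Scottish cohorts) contrasts a dense ridge predictor, a sparse lasso predictor, and a simple meta-model that regresses the phenotype on both predictors' outputs.

        import numpy as np
        from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(7)
        n, p = 400, 800                          # individuals, markers (made-up sizes)
        X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
        beta = np.zeros(p)
        beta[:20] = rng.normal(0.0, 0.5, 20)     # a few moderately sized effects
        y = X @ beta + rng.normal(size=n)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        ridge = RidgeCV(alphas=np.logspace(-1, 4, 20)).fit(X_tr, y_tr)   # dense predictor
        lasso = LassoCV(cv=5, random_state=0).fit(X_tr, y_tr)            # sparse predictor

        # Meta-model: regress the phenotype on the two predictors' outputs.
        stack_tr = np.column_stack([ridge.predict(X_tr), lasso.predict(X_tr)])
        meta = LinearRegression().fit(stack_tr, y_tr)
        stack_te = np.column_stack([ridge.predict(X_te), lasso.predict(X_te)])

        for name, pred in [("ridge", ridge.predict(X_te)),
                           ("lasso", lasso.predict(X_te)),
                           ("meta", meta.predict(stack_te))]:
            print(name, "prediction r =", round(np.corrcoef(pred, y_te)[0, 1], 3))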

  15. Demonstrating the improvement of predictive maturity of a computational model

    Energy Technology Data Exchange (ETDEWEB)

    Hemez, Francois M [Los Alamos National Laboratory; Unal, Cetin [Los Alamos National Laboratory; Atamturktur, Huriye S [CLEMSON UNIV.

    2010-01-01

    We demonstrate an improvement of predictive capability brought to a non-linear material model using a combination of test data, sensitivity analysis, uncertainty quantification, and calibration. A model that captures increasingly complicated phenomena, such as plasticity, temperature and strain rate effects, is analyzed. Predictive maturity is defined, here, as the accuracy of the model to predict multiple Hopkinson bar experiments. A statistical discrepancy quantifies the systematic disagreement (bias) between measurements and predictions. Our hypothesis is that improving the predictive capability of a model should translate into better agreement between measurements and predictions. This agreement, in turn, should lead to a smaller discrepancy. We have recently proposed to use discrepancy and coverage, that is, the extent to which the physical experiments used for calibration populate the regime of applicability of the model, as basis to define a Predictive Maturity Index (PMI). It was shown that predictive maturity could be improved when additional physical tests are made available to increase coverage of the regime of applicability. This contribution illustrates how the PMI changes as 'better' physics are implemented in the model. The application is the non-linear Preston-Tonks-Wallace (PTW) strength model applied to Beryllium metal. We demonstrate that our framework tracks the evolution of maturity of the PTW model. Robustness of the PMI with respect to the selection of coefficients needed in its definition is also studied.

  16. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    Science.gov (United States)

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  17. Integration of Advanced Statistical Analysis Tools and Geophysical Modeling

    Science.gov (United States)

    2012-08-01

    Statistical classification of buried unexploded ordnance using nonparametric prior models. IEEE Trans. Geosci. Remote Sensing, 45: 2794-2806, 2007. T. Bell and B. Barrow. Subsurface discrimination using electromagnetic induction sensors. IEEE Trans. Geosci. Remote Sensing, 39: 1286-1293, 2001.

  18. A Statistical Model for Synthesis of Detailed Facial Geometry

    OpenAIRE

    Golovinskiy, Aleksey; Matusik, Wojciech; Pfister, Hanspeter; Rusinkiewicz, Szymon; Funkhouser, Thomas

    2006-01-01

    Detailed surface geometry contributes greatly to the visual realism of 3D face models. However, acquiring high-resolution face geometry is often tedious and expensive. Consequently, most face models used in games, virtual reality, or computer vision look unrealistically smooth. In this paper, we introduce a new statistical technique for the analysis and synthesis of small three-dimensional facial features, such as wrinkles and pores. We acquire high-resolution face geometry for people across ...

  19. Statistical and RBF NN models : providing forecasts and risk assessment

    OpenAIRE

    Marček, Milan

    2009-01-01

    Forecast accuracy of economic and financial processes is a popular measure for quantifying the risk in decision making. In this paper, we develop forecasting models based on statistical (stochastic) methods, sometimes called hard computing, and on a soft method using granular computing. We consider the accuracy of forecasting models as a measure for risk evaluation. It is found that the risk estimation process based on soft methods is simplified and less critical to the question w...

  20. Advances on statistical/thermodynamical models for unpolarized structure functions

    Energy Technology Data Exchange (ETDEWEB)

    Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [Universidade Federal dos Vales do Jequitinhonha e Mucuri, Campus do Mucuri, 39803-371, Teofilo Otoni, Minas Gerais (Brazil); Tomio, Lauro [Instituto de Fisica Teorica, Universidade Estadual Paulista, R. Dr. Bento Teobaldo Ferraz 271, Bl II Barra Funda, 01140070, Sao Paulo, SP (Brazil)

    2013-03-25

    During the eighties and nineties many statistical/thermodynamical models were proposed to describe the nucleons' structure functions and the distribution of the quarks in the hadrons. Most of these models describe the quarks and gluons inside the nucleon as a Fermi/Bose gas, respectively, confined in a MIT bag with continuous energy levels. Other models consider a discrete spectrum. Some interesting features of the nucleons are obtained by these models, like the sea asymmetries d̄/ū and d̄ - ū.

  1. Statistical modelling of transcript profiles of differentially regulated genes

    Directory of Open Access Journals (Sweden)

    Sergeant Martin J

    2008-07-01

    Full Text Available Abstract Background The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Split-line" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically-oriented, critical exponential curve, y(t) = A + (B + Ct)R^t + ε. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied. Applying the regression modelling approach to microarray-derived time course data
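
    As a hedged illustration (with a synthetic profile and made-up parameter values, not the Agaricus bisporus data), the critical exponential curve quoted above can be fitted by non-linear least squares.

        import numpy as np
        from scipy.optimize import curve_fit

        def critical_exponential(t, A, B, C, R):
            # y(t) = A + (B + C*t) * R**t
            return A + (B + C * t) * R ** t

        t = np.linspace(0.0, 5.0, 11)                    # five-day profile, in days
        rng = np.random.default_rng(0)
        y = critical_exponential(t, 1.0, -0.8, 2.0, 0.45) + rng.normal(0.0, 0.05, t.size)

        popt, _ = curve_fit(critical_exponential, t, y, p0=[1.0, 0.0, 1.0, 0.5], maxfev=10000)
        print("fitted A, B, C, R:", np.round(popt, 3))

        t_fine = np.linspace(0.0, 5.0, 501)
        fitted = critical_exponential(t_fine, *popt)
        print("time of maximal transcript level:", t_fine[np.argmax(fitted)])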

  2. A model of the statistical power of comparative genome sequence analysis.

    OpenAIRE

    Sean R Eddy

    2005-01-01

    Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identi...

  3. Iowa calibration of MEPDG performance prediction models.

    Science.gov (United States)

    2013-06-01

    This study aims to improve the accuracy of AASHTO Mechanistic-Empirical Pavement Design Guide (MEPDG) pavement : performance predictions for Iowa pavement systems through local calibration of MEPDG prediction models. A total of 130 : representative p...

  4. Model complexity control for hydrologic prediction

    NARCIS (Netherlands)

    Schoups, G.; Van de Giesen, N.C.; Savenije, H.H.G.

    2008-01-01

    A common concern in hydrologic modeling is overparameterization of complex models given limited and noisy data. This leads to problems of parameter nonuniqueness and equifinality, which may negatively affect prediction uncertainties. A systematic way of controlling model complexity is therefore

  5. WE-A-201-02: Modern Statistical Modeling

    International Nuclear Information System (INIS)

    Niemierko, A.

    2016-01-01

    Chris Marshall: Memorial Introduction Donald Edmonds Herbert Jr., or Don to his colleagues and friends, exemplified the “big tent” vision of medical physics, specializing in Applied Statistics and Dynamical Systems theory. He saw, more clearly than most, that “Making models is the difference between doing science and just fooling around [ref Woodworth, 2004]”. Don developed an interest in chemistry at school by “reading a book” - a recurring theme in his story. He was awarded a Westinghouse Science scholarship and attended the Carnegie Institute of Technology (later Carnegie Mellon University) where his interest turned to physics and led to a BS in Physics after transfer to Northwestern University. After (voluntary) service in the Navy he earned his MS in Physics from the University of Oklahoma, which led him to Johns Hopkins University in Baltimore to pursue a PhD. The early death of his wife led him to take a salaried position in the Physics Department of Colorado College in Colorado Springs so as to better care for their young daughter. There, a chance invitation from Dr. Juan del Regato to teach physics to residents at the Penrose Cancer Hospital introduced him to Medical Physics, and he decided to enter the field. He received his PhD from the University of London (UK) under Prof. Joseph Rotblat, where I first met him, and where he taught himself statistics. He returned to Penrose as a clinical medical physicist, also largely self-taught. In 1975 he formalized an evolving interest in statistical analysis as Professor of Radiology and Head of the Division of Physics and Statistics at the College of Medicine of the University of South Alabama in Mobile, AL where he remained for the rest of his career. He also served as the first Director of their Bio-Statistics and Epidemiology Core Unit working in part on a sickle-cell disease. After retirement he remained active as Professor Emeritus. Don served for several years as a consultant to the Nuclear

  6. WE-A-201-02: Modern Statistical Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Niemierko, A.

    2016-06-15

    Chris Marshall: Memorial Introduction Donald Edmonds Herbert Jr., or Don to his colleagues and friends, exemplified the “big tent” vision of medical physics, specializing in Applied Statistics and Dynamical Systems theory. He saw, more clearly than most, that “Making models is the difference between doing science and just fooling around [ref Woodworth, 2004]”. Don developed an interest in chemistry at school by “reading a book” - a recurring theme in his story. He was awarded a Westinghouse Science scholarship and attended the Carnegie Institute of Technology (later Carnegie Mellon University) where his interest turned to physics and led to a BS in Physics after transfer to Northwestern University. After (voluntary) service in the Navy he earned his MS in Physics from the University of Oklahoma, which led him to Johns Hopkins University in Baltimore to pursue a PhD. The early death of his wife led him to take a salaried position in the Physics Department of Colorado College in Colorado Springs so as to better care for their young daughter. There, a chance invitation from Dr. Juan del Regato to teach physics to residents at the Penrose Cancer Hospital introduced him to Medical Physics, and he decided to enter the field. He received his PhD from the University of London (UK) under Prof. Joseph Rotblat, where I first met him, and where he taught himself statistics. He returned to Penrose as a clinical medical physicist, also largely self-taught. In 1975 he formalized an evolving interest in statistical analysis as Professor of Radiology and Head of the Division of Physics and Statistics at the College of Medicine of the University of South Alabama in Mobile, AL where he remained for the rest of his career. He also served as the first Director of their Bio-Statistics and Epidemiology Core Unit working in part on a sickle-cell disease. After retirement he remained active as Professor Emeritus. Don served for several years as a consultant to the Nuclear

  7. Organism-level models: When mechanisms and statistics fail us

    Science.gov (United States)

    Phillips, M. H.; Meyer, J.; Smith, W. P.; Rockhill, J. K.

    2014-03-01

    Purpose: To describe the unique characteristics of models that represent the entire course of radiation therapy at the organism level and to highlight the uses to which such models can be put. Methods: At the level of an organism, traditional model-building runs into severe difficulties. We do not have sufficient knowledge to devise a complete biochemistry-based model. Statistical model-building fails due to the vast number of variables and the inability to control many of them in any meaningful way. Finally, building surrogate models, such as animal-based models, can result in excluding some of the most critical variables. Bayesian probabilistic models (Bayesian networks) provide a useful alternative that has the advantages of being mathematically rigorous, incorporating the knowledge that we do have, and being practical. Results: Bayesian networks representing radiation therapy pathways for prostate cancer and head & neck cancer were used to highlight the important aspects of such models and some techniques of model-building. A more specific model representing the treatment of occult lymph nodes in head & neck cancer was provided as an example of how such a model can inform clinical decisions. A model of the possible role of PET imaging in brain cancer was used to illustrate the means by which clinical trials can be modelled in order to come up with a trial design that will have meaningful outcomes. Conclusions: Probabilistic models are currently the most useful approach to representing the entire therapy outcome process.
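
    As a toy illustration of the kind of Bayesian-network reasoning described above, the sketch below performs exact inference by enumeration on a hypothetical three-node network (treatment choice → tumour control → one-year survival). The structure, node names and probabilities are invented for illustration and are not taken from the models in the abstract.

        # Toy Bayesian network, inference by enumeration (pure Python).
        # Nodes (all binary, invented for illustration):
        #   T: aggressive treatment given (prior)
        #   C: local tumour control, depends on T
        #   S: one-year survival, depends on C
        P_T = {True: 0.5, False: 0.5}
        P_C_given_T = {True: 0.80, False: 0.55}     # P(C=true | T)
        P_S_given_C = {True: 0.90, False: 0.60}     # P(S=true | C)

        def joint(t, c, s):
            """P(T=t, C=c, S=s) from the chain rule of the network."""
            pc = P_C_given_T[t] if c else 1.0 - P_C_given_T[t]
            ps = P_S_given_C[c] if s else 1.0 - P_S_given_C[c]
            return P_T[t] * pc * ps

        # Posterior P(T=true | S=true): enumerate the hidden variable C and normalise.
        num = sum(joint(True, c, True) for c in (True, False))
        den = sum(joint(t, c, True) for t in (True, False) for c in (True, False))
        print(f"P(aggressive treatment | survived 1 year) = {num / den:.3f}")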

  8. Models Predicting Success of Infertility Treatment: A Systematic Review

    Science.gov (United States)

    Zarinara, Alireza; Zeraati, Hojjat; Kamali, Koorosh; Mohammad, Kazem; Shahnazari, Parisa; Akhondi, Mohammad Mehdi

    2016-01-01

    Background: Infertile couples are faced with problems that affect their marital life. Infertility treatment is expensive and time consuming and is sometimes simply not possible. Prediction models for infertility treatment have been proposed, and prediction of treatment success is a new field in infertility treatment. Because prediction of treatment success is a new need for infertile couples, this paper reviewed previous studies to obtain a general picture of the applicability of such models. Methods: This study was conducted as a systematic review at Avicenna Research Institute in 2015. Six databases were searched based on WHO definitions and MeSH key words. Papers about prediction models in infertility were evaluated. Results: Eighty-one papers were eligible for the study. Papers covered years after 1986 and studies were designed retrospectively and prospectively. IVF prediction models accounted for the largest share of papers. The most common predictors were age, duration of infertility, and ovarian and tubal problems. Conclusion: A prediction model can be clinically applied if the model can be statistically evaluated and has a good validation for treatment success. To achieve better results, estimation of the treatment success rate by the physician and the couple should be based on history, examination and clinical tests. Models must be checked for theoretical soundness and appropriate validation. The advantages of applying prediction models are the decrease in cost and time, avoiding painful treatment of patients, assessment of the treatment approach for physicians and decision making for health managers. The selection of an appropriate approach for designing and using these models is therefore essential. PMID:27141461

  9. Experimental, statistical, and biological models of radon carcinogenesis

    International Nuclear Information System (INIS)

    Cross, F.T.

    1991-09-01

    Risk models developed for underground miners have not been consistently validated in studies of populations exposed to indoor radon. Imprecision in risk estimates results principally from differences between exposures in mines as compared to domestic environments and from uncertainties about the interaction between cigarette-smoking and exposure to radon decay products. Uncertainties in extrapolating miner data to domestic exposures can be reduced by means of a broad-based health effects research program that addresses the interrelated issues of exposure, respiratory tract dose, carcinogenesis (molecular/cellular and animal studies, plus developing biological and statistical models), and the relationship of radon to smoking and other copollutant exposures. This article reviews experimental animal data on radon carcinogenesis observed primarily in rats at Pacific Northwest Laboratory. Recent experimental and mechanistic carcinogenesis models of exposures to radon, uranium ore dust, and cigarette smoke are presented with statistical analyses of animal data. 20 refs., 1 fig

  10. Statistical model selection with “Big Data”

    Directory of Open Access Journals (Sweden)

    Jurgen A. Doornik

    2015-12-01

    Full Text Available Big Data offer potential benefits for statistical modelling, but confront problems including an excess of false positives, mistaking correlations for causes, ignoring sampling biases and selecting by inappropriate methods. We consider the many important requirements when searching for a data-based relationship using Big Data, and the possible role of Autometrics in that context. Paramount considerations include embedding relationships in general initial models, possibly restricting the number of variables to be selected over by non-statistical criteria (the formulation problem), using good quality data on all variables, analyzed with tight significance levels by a powerful selection procedure, retaining available theory insights (the selection problem) while testing for relationships being well specified and invariant to shifts in explanatory variables (the evaluation problem), using a viable approach that resolves the computational problem of immense numbers of possible models.
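
    The sketch below is not Autometrics itself, but a simplified general-to-specific selection loop in the same spirit: start from a general model and repeatedly drop the least significant regressor, using a tight significance level to keep false positives rare. The synthetic data and the 0.001 threshold are illustrative only.

        # Simplified general-to-specific (backward-elimination) selection sketch.
        # Tight significance level to control false positives when many candidate
        # regressors are searched over; synthetic data, illustrative only.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n, k = 500, 20
        X = rng.normal(size=(n, k))
        y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # only columns 0 and 3 matter

        alpha = 0.001                     # tight significance level for "Big Data" searches
        keep = list(range(k))
        while len(keep) > 0:
            model = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
            pvals = np.asarray(model.pvalues)[1:]     # skip the intercept
            worst = int(np.argmax(pvals))
            if pvals[worst] < alpha:
                break                                  # everything remaining is significant
            del keep[worst]

        # The informative columns 0 and 3 should survive; spurious survivors are rare at this alpha.
        print("selected columns:", keep)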

  11. Experimental, statistical and biological models of radon carcinogenesis

    International Nuclear Information System (INIS)

    Cross, F.T.

    1992-01-01

    Risk models developed for underground miners have not been consistently validated in studies of populations exposed to indoor radon. Imprecision in risk estimates results principally from differences between exposures in mines as compared with domestic environments and from uncertainties about the interaction between cigarette smoking and exposure to radon decay products. Uncertainties in extrapolating miner data to domestic exposures can be reduced by means of a broad-based health effects research programme that addresses the interrelated issues of exposure, respiratory tract dose, carcinogenesis (molecular/cellular and animal studies, plus developing biological and statistical models) and the relationship of radon to smoking and other co-pollutant exposures. This article reviews experimental animal data on radon carcinogenesis observed primarily in rats at Pacific Northwest Laboratory. Recent experimental and mechanistic carcinogenesis models of exposures to radon, uranium ore dust, and cigarette smoke are presented with statistical analyses of animal data. (author)

  12. Statistical 3D damage accumulation model for ion implant simulators

    International Nuclear Information System (INIS)

    Hernandez-Mangas, J.M.; Lazaro, J.; Enriquez, L.; Bailon, L.; Barbolla, J.; Jaraiz, M.

    2003-01-01

    A statistical 3D damage accumulation model, based on the modified Kinchin-Pease formula, for ion implant simulation has been included in our physically based ion implantation code. It has only one fitting parameter for electronic stopping and uses 3D electron density distributions for different types of targets including compound semiconductors. Also, a statistical noise reduction mechanism based on the dose division is used. The model has been adapted to be run under parallel execution in order to speed up the calculation in 3D structures. Sequential ion implantation has been modelled including previous damage profiles. It can also simulate the implantation of molecular and cluster projectiles. Comparisons of simulated doping profiles with experimental SIMS profiles are presented. Also comparisons between simulated amorphization and experimental RBS profiles are shown. An analysis of sequential versus parallel processing is provided
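
    For orientation, a common form of the modified Kinchin-Pease (NRT) estimate of displacements produced by a primary knock-on atom is sketched below; the displacement threshold energy used is a generic example, not a parameter of the simulator described in the abstract.

        # Modified Kinchin-Pease (NRT) estimate of the number of Frenkel pairs produced
        # by a primary knock-on atom; illustrative only, with a generic displacement
        # threshold energy (this is not the code described in the abstract).
        def nrt_displacements(damage_energy_eV: float, e_d_eV: float = 40.0) -> float:
            """Return the NRT displacement estimate for a given damage energy."""
            if damage_energy_eV < e_d_eV:
                return 0.0                        # below threshold: no stable displacement
            if damage_energy_eV < 2.0 * e_d_eV / 0.8:
                return 1.0                        # single Frenkel-pair regime
            return 0.8 * damage_energy_eV / (2.0 * e_d_eV)   # linear cascade regime

        if __name__ == "__main__":
            for E in (10.0, 60.0, 1.0e3, 1.0e4):
                print(f"E_dam = {E:8.1f} eV -> N_d = {nrt_displacements(E):.1f}")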

  13. Statistical 3D damage accumulation model for ion implant simulators

    CERN Document Server

    Hernandez-Mangas, J M; Enriquez, L E; Bailon, L; Barbolla, J; Jaraiz, M

    2003-01-01

    A statistical 3D damage accumulation model, based on the modified Kinchin-Pease formula, for ion implant simulation has been included in our physically based ion implantation code. It has only one fitting parameter for electronic stopping and uses 3D electron density distributions for different types of targets including compound semiconductors. Also, a statistical noise reduction mechanism based on the dose division is used. The model has been adapted to be run under parallel execution in order to speed up the calculation in 3D structures. Sequential ion implantation has been modelled including previous damage profiles. It can also simulate the implantation of molecular and cluster projectiles. Comparisons of simulated doping profiles with experimental SIMS profiles are presented. Also comparisons between simulated amorphization and experimental RBS profiles are shown. An analysis of sequential versus parallel processing is provided.

  14. SoS contract verification using statistical model checking

    Directory of Open Access Journals (Sweden)

    Alessandro Mignogna

    2013-11-01

    Full Text Available Exhaustive formal verification for systems of systems (SoS) is impractical and cannot be applied on a large scale. In this paper we propose to use statistical model checking for efficient verification of SoS. We address three relevant aspects for systems of systems: (1) the model of the SoS, which includes stochastic aspects; (2) the formalization of the SoS requirements in the form of contracts; (3) the tool-chain to support statistical model checking for SoS. We adapt the SMC technique for application to heterogeneous SoS. We extend the UPDM/SysML specification language to express the SoS requirements that the implemented strategies over the SoS must satisfy. The requirements are specified with a new contract language specifically designed for SoS, targeting a high-level English-pattern language, but relying on an accurate semantics given by the standard temporal logics. The contracts are verified against the UPDM/SysML specification using the Statistical Model Checker (SMC) PLASMA combined with the simulation engine DESYRE, which integrates heterogeneous behavioral models through the functional mock-up interface (FMI) standard. The tool-chain allows computing an estimation of the satisfiability of the contracts by the SoS. The results help the system architect to trade-off different solutions to guide the evolution of the SoS.
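
    The essence of statistical model checking is Monte Carlo estimation of the probability that a randomly simulated trace satisfies a property, with the number of simulations chosen from a confidence bound. The sketch below illustrates this with a toy stochastic system and the Chernoff-Hoeffding sample-size bound; it is not the PLASMA/DESYRE tool-chain, and the system and property are invented.

        # Statistical model checking in miniature: estimate P(property holds) by simulating
        # i.i.d. traces of a toy stochastic system. The sample size comes from the
        # Chernoff-Hoeffding bound n >= ln(2/delta) / (2 * eps**2), which guarantees
        # |estimate - true p| <= eps with probability >= 1 - delta.
        import math
        import random

        def simulate_trace(steps: int = 50) -> bool:
            """Toy system: a queue with random arrivals; property = 'queue never exceeds 8'."""
            queue = 0
            for _ in range(steps):
                if random.random() < 0.55:
                    queue += 1                        # arrival
                if queue > 0 and random.random() < 0.5:
                    queue -= 1                        # service
                if queue > 8:
                    return False
            return True

        eps, delta = 0.01, 0.01
        n = math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
        hits = sum(simulate_trace() for _ in range(n))
        print(f"n = {n}, estimated P(property) = {hits / n:.3f} (+/- {eps} with prob {1 - delta})")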

  15. New Approaches for Channel Prediction Based on Sinusoidal Modeling

    Directory of Open Access Journals (Sweden)

    Ekman Torbjörn

    2007-01-01

    Full Text Available Long-range channel prediction is considered to be one of the most important enabling technologies to future wireless communication systems. The prediction of Rayleigh fading channels is studied in the frame of sinusoidal modeling in this paper. A stochastic sinusoidal model to represent a Rayleigh fading channel is proposed. Three different predictors based on the statistical sinusoidal model are proposed. These methods outperform the standard linear predictor (LP) in Monte Carlo simulations, but underperform with real measurement data, probably due to nonstationary model parameters. To mitigate these modeling errors, a joint moving average and sinusoidal (JMAS) prediction model and the associated joint least-squares (LS) predictor are proposed. It combines the sinusoidal model with an LP to handle unmodeled dynamics in the signal. The joint LS predictor outperforms all the other sinusoidal LMMSE predictors in suburban environments, but still performs slightly worse than the standard LP in urban environments.
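
    As a baseline for comparison with the sinusoidal predictors discussed above, the sketch below implements the standard linear predictor (LP): coefficients are obtained by least squares from past samples of a synthetic fading-like signal. The signal model, predictor order and prediction horizon are illustrative, not those of the paper.

        # Standard linear predictor (LP) sketch: predict a fading-like signal one step
        # ahead from its M most recent samples, with coefficients fitted by least squares.
        # Synthetic signal, illustrative only.
        import numpy as np

        rng = np.random.default_rng(1)
        t = np.arange(2000)
        # Sum-of-sinusoids envelope plus noise (a crude stand-in for a fading channel gain).
        signal = (np.sin(0.031 * t) + 0.6 * np.sin(0.072 * t + 1.0)
                  + 0.1 * rng.normal(size=t.size))

        M = 10                                                     # predictor order
        X = np.column_stack([signal[i:i - M] for i in range(M)])   # rows: [x_k, ..., x_{k+M-1}]
        y = signal[M:]                                             # target: x_{k+M}
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

        pred = X @ coeffs
        nmse = np.mean((pred - y) ** 2) / np.var(y)
        print(f"one-step LP, order {M}: normalized MSE = {nmse:.4f}")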

  16. Tracing the source of numerical climate model uncertainties in precipitation simulations using a feature-oriented statistical model

    Science.gov (United States)

    Xu, Y.; Jones, A. D.; Rhoades, A.

    2017-12-01

    Precipitation is a key component in hydrologic cycles, and changing precipitation regimes contribute to more intense and frequent drought and flood events around the world. Numerical climate modeling is a powerful tool to study climatology and to predict future changes. Despite the continuous improvement in numerical models, long-term precipitation prediction remains a challenge, especially at regional scales. To improve numerical simulations of precipitation, it is important to find out where the uncertainty in precipitation simulations comes from. There are two types of uncertainty in numerical model predictions. One is related to uncertainty in the input data, such as the model's boundary and initial conditions. These uncertainties would propagate to the final model outcomes even if the numerical model exactly replicated the true world. But a numerical model cannot exactly replicate the true world. Therefore, the other type of model uncertainty is related to errors in the model physics, such as the parameterization of sub-grid scale processes, i.e., given precise input conditions, how much error could be generated by the imperfect model itself. Here, we build two statistical models based on a neural network algorithm to predict long-term variation of precipitation over California: one uses "true world" information derived from observations, and the other uses "modeled world" information using model inputs and outputs from the North America Coordinated Regional Downscaling Project (NA CORDEX). We derive multiple climate feature metrics as the predictors for the statistical model to represent the impact of global climate on local hydrology, and include topography as a predictor to represent the local control. We first compare the predictors between the true world and the modeled world to determine the errors contained in the input data. By perturbing the predictors in the statistical model, we estimate how much uncertainty in the model's final outcomes is accounted for
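
    A minimal sketch of the feature-based statistical-emulator idea is given below: a small neural network maps climate-feature predictors (plus topography) to precipitation, and one predictor is then perturbed to gauge the output sensitivity. All predictors, the "truth" relation and the perturbation size are synthetic and invented for illustration.

        # Sketch: fit a neural-network regression of precipitation on climate-feature
        # predictors and topography, then perturb one predictor as a crude probe of how
        # input uncertainty maps to output uncertainty. Synthetic data, illustrative only.
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(4)
        n = 4000
        jet_position  = rng.normal(0.0, 1.0, n)     # hypothetical large-scale feature metric
        moisture_flux = rng.normal(0.0, 1.0, n)     # hypothetical feature metric
        elevation     = rng.uniform(0.0, 3.0, n)    # km, local control
        X = np.column_stack([jet_position, moisture_flux, elevation])
        precip = (5.0 + 2.0 * moisture_flux - 1.2 * jet_position + 1.5 * elevation
                  + rng.normal(0.0, 0.5, n))

        net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=3000,
                           random_state=0).fit(X, precip)

        # Perturb the moisture-flux predictor by +1 standard deviation.
        X_pert = X.copy()
        X_pert[:, 1] += 1.0
        delta = net.predict(X_pert) - net.predict(X)
        print(f"mean precipitation response to +1 sd moisture flux: {delta.mean():.2f}")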

  17. Comparison of accuracy in predicting emotional instability from MMPI data: fisherian versus contingent probability statistics

    International Nuclear Information System (INIS)

    Berghausen, P.E. Jr.; Mathews, T.W.

    1987-01-01

    The security plans of nuclear power plants generally require that all personnel who are to have access to protected areas or vital islands be screened for emotional stability. In virtually all instances, the screening involves the administration of one or more psychological tests, usually including the Minnesota Multiphasic Personality Inventory (MMPI). At some plants, all employees receive a structured clinical interview after they have taken the MMPI and results have been obtained. At other plants, only those employees with a 'dirty' MMPI are interviewed. This latter protocol is referred to as interviews by exception. Behaviordyne Psychological Corp. has succeeded in removing some of the uncertainty associated with interview-by-exception protocols by developing an empirically based, predictive equation. This equation permits utility companies to make informed choices regarding the risks they are assuming. A conceptual problem exists with the predictive equation, however. Like most predictive equations currently in use, it is based on Fisherian statistics, involving least-squares analyses. Consequently, Behaviordyne Psychological Corp., in conjunction with T.W. Mathews and Associates, has just developed a second predictive equation, one based on contingent probability statistics. The particular technique used is the multi-contingent analysis of probability systems (MAPS) approach. The present paper presents a comparison of the predictive accuracy of the two equations: the one derived using Fisherian techniques versus the one using contingent probability techniques

  18. Chemical agnostic hazard prediction: Statistical inference of toxicity pathways - data for Figure 2

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset comprises one SigmaPlot 13 file containing measured survival data and survival data predicted from the model coefficients selected by the LASSO...

  19. Staying Power of Churn Prediction Models

    NARCIS (Netherlands)

    Risselada, Hans; Verhoef, Peter C.; Bijmolt, Tammo H. A.

    In this paper, we study the staying power of various churn prediction models. Staying power is defined as the predictive performance of a model in a number of periods after the estimation period. We examine two methods, logit models and classification trees, both with and without applying a bagging

  20. Non-Gaussianity and statistical anisotropy from vector field populated inflationary models

    CERN Document Server

    Dimastrogiovanni, Emanuela; Matarrese, Sabino; Riotto, Antonio

    2010-01-01

    We present a review of vector field models of inflation and, in particular, of the statistical anisotropy and non-Gaussianity predictions of models with SU(2) vector multiplets. Non-Abelian gauge groups introduce a richer set of predictions compared to the Abelian ones, mostly because of the presence of vector field self-interactions. Primordial vector fields can violate isotropy, leaving their imprint in the comoving curvature fluctuations zeta at late times. We provide the analytic expressions of the correlation functions of zeta up to fourth order and an analysis of their amplitudes and shapes. The statistical anisotropy signatures expected in these models are important and, potentially, the anisotropic contributions to the bispectrum and the trispectrum can overcome the isotropic parts.

  1. Application of a Bayesian algorithm for the Statistical Energy model updating of a railway coach

    DEFF Research Database (Denmark)

    Sadri, Mehran; Brunskog, Jonas; Younesian, Davood

    2016-01-01

    The classical statistical energy analysis (SEA) theory is a common approach for vibroacoustic analysis of coupled complex structures, being efficient to predict high-frequency noise and vibration of engineering systems. There are however some limitations in applying the conventional SEA. ... To evaluate the performance of the proposed strategy, the SEA model updating of a railway passenger coach is carried out. First, a sensitivity analysis is carried out to select the most sensitive parameters of the SEA model. For the selected parameters of the model, prior probability density functions are then taken...

  2. Combining statistical techniques to predict postsurgical risk of 1-year mortality for patients with colon cancer

    Directory of Open Access Journals (Sweden)

    Arostegui I

    2018-03-01

    Full Text Available Inmaculada Arostegui,1–3 Nerea Gonzalez,2,4 Nerea Fernández-de-Larrea,5,6 Santiago Lázaro-Aramburu,7 Marisa Baré,2,8 Maximino Redondo,2,9 Cristina Sarasqueta,2,10 Susana Garcia-Gutierrez,2,4 José M Quintana2,4 On behalf of the REDISSEC CARESS-CCR Group2 1Department of Applied Mathematics, Statistics and Operations Research, University of the Basque Country UPV/EHU, Leioa, Bizkaia, Spain; 2Health Services Research on Chronic Patients Network (REDISSEC), Galdakao, Bizkaia, Spain; 3Basque Center for Applied Mathematics – BCAM, Bilbao, Bizkaia, Spain; 4Research Unit, Galdakao-Usansolo Hospital, Galdakao, Bizkaia, Spain; 5Environmental and Cancer Epidemiology Unit, National Center of Epidemiology, Instituto de Salud Carlos III, Madrid, Spain; 6Consortium for Biomedical Research in Epidemiology and Public Health (CIBERESP), Madrid, Spain; 7General Surgery Service, Galdakao-Usansolo Hospital, Galdakao, Bizkaia, Spain; 8Clinical Epidemiology and Cancer Screening Unit, Parc Taulí Sabadell-Hospital Universitari, UAB, Sabadell, Barcelona, Spain; 9Research Unit, Costa del Sol Hospital, Marbella, Malaga, Spain; 10Research Unit, Donostia Hospital, Donostia-San Sebastián, Gipuzkoa, Spain Introduction: Colorectal cancer is one of the most frequently diagnosed malignancies and a common cause of cancer-related mortality. The aim of this study was to develop and validate a clinical predictive model for 1-year mortality among patients with colon cancer who survive for at least 30 days after surgery. Methods: Patients diagnosed with colon cancer who had surgery for the first time and who survived 30 days after the surgery were selected prospectively. The outcome was mortality within 1 year. Random forest, genetic algorithms and classification and regression trees were combined in order to identify the variables and partition points that optimally classify patients by risk of mortality. The resulting decision tree was categorized into four risk categories
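
    A minimal sketch of the kind of tree-based pipeline described above (a random forest to rank candidate predictors, then a shallow classification tree to obtain interpretable risk groups) is given below; the synthetic data, feature names and thresholds are invented and do not reproduce the CARESS-CCR model.

        # Sketch of a tree-based risk-classification pipeline: a random forest ranks
        # candidate predictors, then a shallow classification tree on the top-ranked
        # variables yields interpretable risk groups. Synthetic data, illustrative only.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(42)
        n = 2000
        age     = rng.normal(70, 10, n)
        asa     = rng.integers(1, 5, n)            # hypothetical ASA class 1-4
        albumin = rng.normal(3.8, 0.5, n)
        noise   = rng.normal(size=(n, 5))          # irrelevant candidate predictors
        X = np.column_stack([age, asa, albumin, noise])
        logit = -4.0 + 0.06 * age + 0.7 * asa - 1.0 * albumin
        death_1yr = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

        rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, death_1yr)
        ranked = np.argsort(rf.feature_importances_)[::-1][:3]   # keep the 3 strongest predictors

        tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X[:, ranked], death_1yr)
        names = np.array(["age", "asa", "albumin", "n1", "n2", "n3", "n4", "n5"])[ranked]
        print(export_text(tree, feature_names=list(names)))      # up to four leaves ~ risk groups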

  3. Nuclear EMC effect in non-extensive statistical model

    Energy Technology Data Exchange (ETDEWEB)

    Trevisan, Luis A. [Departamento de Matematica e Estatistica, Universidade Estadual de Ponta Grossa, 84010-790, Ponta Grossa, PR (Brazil); Mirez, Carlos [ICET, Universidade Federal dos Vales do Jequitinhonha e Mucuri - UFVJM, Campus do Mucuri, Rua do Cruzeiro 01, Jardim Sao Paulo, 39803-371, Teofilo Otoni, MG (Brazil)

    2013-05-06

    In the present work, we attempt to describe the nuclear EMC effect by using the proton structure functions obtained from the non-extensive statistical quark model. We note that this model has three fundamental variables: the temperature T, the radius, and the Tsallis parameter q. By combining different small changes, a good agreement with the experimental data may be obtained. Another interesting point of the model is that it allows phenomenological interpretation, for instance keeping q constant while changing the radius and the temperature, or changing the radius and q while keeping the temperature fixed.
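
    For orientation, in non-extensive (Tsallis) statistics the Boltzmann factor is replaced by the q-exponential, which recovers the ordinary exponential as q → 1. The generic definitions below are quoted for reference and are not the papers' specific parametrization:

        e_q(x) = \left[ 1 + (1-q)\,x \right]^{\frac{1}{1-q}}, \qquad \lim_{q \to 1} e_q(x) = e^{x},

    and the corresponding q-generalized Fermi occupation is commonly written n_q(E) = 1 / ( e_q((E-\mu)/T) + 1 ).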

  4. A Census of Statistics Requirements at U.S. Journalism Programs and a Model for a "Statistics for Journalism" Course

    Science.gov (United States)

    Martin, Justin D.

    2017-01-01

    This essay presents data from a census of statistics requirements and offerings at all 4-year journalism programs in the United States (N = 369) and proposes a model of a potential course in statistics for journalism majors. The author proposes that three philosophies underlie a statistics course for journalism students. Such a course should (a)…

  5. In silico environmental chemical science: properties and processes from statistical and computational modelling

    Energy Technology Data Exchange (ETDEWEB)

    Tratnyek, P. G.; Bylaska, Eric J.; Weber, Eric J.

    2017-01-01

    Quantitative structure–activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with “in silico” results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs using descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for “in silico environmental chemical science” are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.
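
    A minimal sketch of the QSAR-calibration step mentioned above is given below: a property of interest is regressed on computed molecular descriptors. The descriptors, property and coefficients are synthetic and purely illustrative; real work would use descriptors calculated by molecular modelling for real chemicals.

        # QSAR calibration sketch: regress a property of interest on computed molecular
        # descriptors. Synthetic descriptors and property values, illustrative only.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(8)
        n = 120
        logp      = rng.normal(2.0, 1.5, n)        # hypothetical computed octanol-water logP
        homo_lumo = rng.normal(5.0, 0.8, n)        # hypothetical HOMO-LUMO gap (eV)
        volume    = rng.normal(150.0, 40.0, n)     # hypothetical molecular volume (A^3)
        X = np.column_stack([logp, homo_lumo, volume])

        # Synthetic "measured" property, e.g. a log rate constant for a degradation reaction.
        log_k = 0.9 * logp - 0.5 * homo_lumo + 0.004 * volume + rng.normal(0, 0.3, n)

        qsar = LinearRegression().fit(X, log_k)
        r2_cv = cross_val_score(qsar, X, log_k, cv=5, scoring="r2").mean()
        print("coefficients:", np.round(qsar.coef_, 3), " cross-validated R^2:", round(r2_cv, 2))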

  6. In silico environmental chemical science: properties and processes from statistical and computational modelling.

    Science.gov (United States)

    Tratnyek, Paul G; Bylaska, Eric J; Weber, Eric J

    2017-03-22

    Quantitative structure-activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with "in silico" results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs using descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for "in silico environmental chemical science" are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.

  7. Computational modeling of oligonucleotide positional densities for human promoter prediction.

    Science.gov (United States)

    Narang, Vipin; Sung, Wing-Kin; Mittal, Ankush

    2005-01-01

    The gene promoter region controls transcriptional initiation of a gene, which is the most important step in gene regulation. In-silico detection of promoter regions in genomic sequences has a number of applications in gene discovery and understanding gene expression regulation. However, computational prediction of eukaryotic pol-II promoters has remained a difficult task. This paper introduces a novel statistical technique for detecting promoter regions in long genomic sequences. A number of existing techniques analyze the occurrence frequencies of oligonucleotides in promoter sequences as compared to other genomic regions. In contrast, the present work studies the positional densities of oligonucleotides in promoter sequences. The analysis does not require any non-promoter sequence dataset or any model of the background oligonucleotide content of the genome. The statistical model learnt from a dataset of promoter sequences automatically recognizes a number of transcription factor binding sites simultaneously with their occurrence positions relative to the transcription start site. Based on this model, a continuous naïve Bayes classifier is developed for the detection of human promoters and transcription start sites in genomic sequences. The present study extends the scope of statistical models in general promoter modeling and prediction. Promoter sequence features learnt by the model correlate well with known biological facts. Results of human transcription start site prediction compare favorably with existing 2nd generation promoter prediction tools.
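
    A stripped-down version of a continuous naïve Bayes scorer over positional features is sketched below: each feature (e.g. the position of an oligonucleotide occurrence relative to a putative transcription start site) is modelled with a class-conditional Gaussian density, and the class log-likelihoods are summed. The features, densities and prior are invented; this is not the authors' trained model.

        # Continuous naive Bayes sketch: score a candidate window as promoter vs. background
        # from Gaussian positional-density features. All parameters are invented, purely to
        # illustrate the mechanics of the classifier.
        import math

        # Class-conditional Gaussian parameters (mean, std) per positional feature,
        # e.g. the offset of a "TATA"-like word from the candidate start site.
        PROMOTER   = {"tata_offset": (-28.0, 6.0), "gc_box_offset": (-60.0, 15.0)}
        BACKGROUND = {"tata_offset": (0.0, 200.0), "gc_box_offset": (0.0, 200.0)}
        PRIOR_PROMOTER = 0.01

        def log_gauss(x, mu, sigma):
            return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

        def log_posterior_promoter(features):
            lp = math.log(PRIOR_PROMOTER)
            lb = math.log(1.0 - PRIOR_PROMOTER)
            for name, x in features.items():
                lp += log_gauss(x, *PROMOTER[name])
                lb += log_gauss(x, *BACKGROUND[name])
            # Normalise the two-class posterior in log space.
            m = max(lp, lb)
            return lp - (m + math.log(math.exp(lp - m) + math.exp(lb - m)))

        feats = {"tata_offset": -30.0, "gc_box_offset": -55.0}
        print(f"P(promoter | features) = {math.exp(log_posterior_promoter(feats)):.3f}")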

  8. 4K Video Traffic Prediction using Seasonal Autoregressive Modeling

    Directory of Open Access Journals (Sweden)

    D. R. Marković

    2017-06-01

    Full Text Available From the perspective of the average viewer, high definition video streams such as HD (High Definition) and UHD (Ultra HD) are increasing their internet presence year over year. This is not surprising, having in mind the expansion of HD streaming services such as YouTube, Netflix etc. Therefore, high definition video streams are starting to challenge network resource allocation with their bandwidth requirements and statistical characteristics. The need for analysis and modeling of this demanding video traffic is essential for better quality-of-service and quality-of-experience support. In this paper we use an easy-to-apply statistical model for prediction of 4K video traffic. Namely, seasonal autoregressive modeling is applied in prediction of 4K video traffic encoded with HEVC (High Efficiency Video Coding). Analysis and modeling were performed within the R programming environment using over 17,000 high definition video frames. It is shown that the proposed methodology provides good accuracy in high definition video traffic modeling.
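
    A minimal seasonal ARIMA workflow of the kind described above can be reproduced with statsmodels; the synthetic frame-size series, model orders and seasonal period below are illustrative and not taken from the paper's HEVC traces.

        # Seasonal ARIMA sketch for frame-size traffic prediction (synthetic data).
        # The (p,d,q)(P,D,Q,s) orders and the GOP-like period s=12 are illustrative.
        import numpy as np
        from statsmodels.tsa.statespace.sarimax import SARIMAX

        rng = np.random.default_rng(7)
        n, s = 600, 12
        t = np.arange(n)
        # Crude stand-in for HEVC frame sizes: periodic I/P/B-frame pattern plus noise.
        frame_kb = 40 + 25 * (t % s == 0) + 8 * np.sin(2 * np.pi * t / s) + rng.normal(0, 2, n)

        train, test = frame_kb[:-48], frame_kb[-48:]
        model = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 1, 1, s)).fit(disp=False)
        forecast = model.forecast(steps=48)

        mape = np.mean(np.abs((forecast - test) / test)) * 100
        print(f"48-step-ahead MAPE: {mape:.1f}%")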

  9. A statistical mechanics model for free-for-all airplane passenger boarding

    Energy Technology Data Exchange (ETDEWEB)

    Steffen, Jason H.; /Fermilab

    2008-08-01

    I discuss a model for free-for-all passenger boarding which is employed by some discount air carriers. The model is based on the principles of statistical mechanics where each seat in the aircraft has an associated energy which reflects the preferences of travelers. As each passenger enters the airplane they select their seats using Boltzmann statistics, proceed to that location, load their luggage, sit down, and the partition function seen by remaining passengers is modified to reflect this fact. I discuss the various model parameters and make qualitative comparisons of this passenger boarding model with those that involve assigned seats. The model can be used to predict the probability that certain seats will be occupied at different times during the boarding process. These results might provide a useful description of this boarding method. The model is a relatively unusual application of undergraduate level physics and describes a situation familiar to many students and faculty.
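
    A compact simulation of the seat-selection rule described in the abstract is sketched below: each free seat is chosen with probability proportional to exp(-E/T), where the seat "energies" encode traveller preferences. The energy function, temperature and cabin layout are invented for illustration.

        # Free-for-all boarding sketch: passengers pick free seats with Boltzmann
        # probabilities p(seat) ~ exp(-E_seat / T). The seat-energy function and the
        # "temperature" below are invented, purely to illustrate the mechanism.
        import numpy as np

        rng = np.random.default_rng(3)
        n_rows, seats_per_row = 30, 6
        T = 1.0

        # Lower energy = more attractive: front rows and aisle/window seats preferred.
        rows = np.repeat(np.arange(n_rows), seats_per_row)
        cols = np.tile(np.arange(seats_per_row), n_rows)        # 0..5 across the row
        aisle_or_window = np.isin(cols, [0, 2, 3, 5])
        energy = 0.05 * rows + 0.8 * (~aisle_or_window)

        occupied = np.zeros(rows.size, dtype=bool)
        boarding_order_of_seat = np.full(rows.size, -1)
        for passenger in range(rows.size):
            free = ~occupied
            w = np.exp(-energy[free] / T)
            choice = rng.choice(np.flatnonzero(free), p=w / w.sum())
            occupied[choice] = True
            boarding_order_of_seat[choice] = passenger

        # Summary: how early, on average, each row fills up during boarding.
        mean_fill_by_row = boarding_order_of_seat.reshape(n_rows, seats_per_row).mean(axis=1)
        print("first five rows fill (mean boarding index):", np.round(mean_fill_by_row[:5], 1))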

  10. A statistical mechanics model for free-for-all airplane passenger boarding

    Science.gov (United States)

    Steffen, Jason H.

    2008-12-01

    I discuss a model for free-for-all passenger boarding which is employed by some discount air carriers. The model is based on the principles of statistical mechanics, where each seat in the aircraft has an associated energy which reflects the preferences of travelers. As each passenger enters the airplane they select their seats using Boltzmann statistics, proceed to that location, load their luggage, sit down, and the partition function seen by remaining passengers is modified to reflect this fact. I discuss the various model parameters and make qualitative comparisons of this passenger boarding model with those that involve assigned seats. The model can be used to predict the probability that certain seats will be occupied at different times during the boarding process. These results might provide a useful description of this boarding method. The model is a relatively unusual application of undergraduate level physics and describes a situation familiar to many students and faculty.

  11. Statistical modelling of a new global potential vegetation distribution

    Science.gov (United States)

    Levavasseur, G.; Vrac, M.; Roche, D. M.; Paillard, D.

    2012-12-01

    The potential natural vegetation (PNV) distribution is required for several studies in environmental sciences. Most of the available databases are quite subjective or depend on vegetation models. We have built a new high-resolution world-wide PNV map using an objective statistical methodology based on multinomial logistic models. Our method appears as a fast and robust alternative in vegetation modelling, independent of any vegetation model. In comparison with other databases, our method provides a realistic PNV distribution, in good agreement with the BIOME 6000 data. Among several advantages, the use of probabilities allows us to estimate the uncertainty, bringing some confidence in the modelled PNV, or to highlight the regions needing more data to improve the PNV modelling. Although our PNV map is highly dependent on the distribution of data points, it is easily updatable as soon as additional data are available and provides very useful additional information for further applications.
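
    The core of such an approach can be sketched with a multinomial logistic model mapping climate predictors to vegetation-class probabilities; the classes, predictors and synthetic data below are illustrative and are not the BIOME 6000-based model of the paper.

        # Multinomial logistic sketch: predict a vegetation class from climate predictors
        # and keep the class probabilities as a per-cell uncertainty measure.
        # Synthetic data and class labels, illustrative only.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        n = 3000
        mean_temp = rng.uniform(-15, 30, n)            # deg C
        annual_precip = rng.uniform(50, 3000, n)       # mm
        X = np.column_stack([mean_temp, annual_precip])

        # Crude synthetic truth: tundra if cold, forest if wet, grassland otherwise.
        labels = np.where(mean_temp < 0, "tundra",
                          np.where(annual_precip > 800, "forest", "grassland"))

        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, labels)

        site = np.array([[12.0, 600.0]])               # hypothetical grid cell
        for cls, p in zip(clf.classes_, clf.predict_proba(site)[0]):
            print(f"P({cls}) = {p:.2f}")               # class probability doubles as uncertainty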

  12. Comparing National Water Model Inundation Predictions with Hydrodynamic Modeling

    Science.gov (United States)

    Egbert, R. J.; Shastry, A.; Aristizabal, F.; Luo, C.

    2017-12-01

    The National Water Model (NWM) simulates the hydrologic cycle and produces streamflow forecasts, runoff, and other variables for 2.7 million reaches along the National Hydrography Dataset for the continental United States. NWM applies Muskingum-Cunge channel routing, which is based on the continuity equation. However, the momentum equation also needs to be considered to obtain better estimates of streamflow and stage in rivers, especially for applications such as flood inundation mapping. Simulation Program for River NeTworks (SPRNT) is a fully dynamic model for large scale river networks that solves the full nonlinear Saint-Venant equations for 1D flow and stage height in river channel networks with non-uniform bathymetry. For the current work, the steady-state version of the SPRNT model was leveraged. An evaluation of SPRNT's and NWM's abilities to predict inundation was conducted for the record flood of Hurricane Matthew in October 2016 along the Neuse River in North Carolina. This event was known to have been influenced by backwater effects from the Hurricane's storm surge. Retrospective NWM discharge predictions were converted to stage using synthetic rating curves. The stages from both models were utilized to produce flood inundation maps using the Height Above Nearest Drainage (HAND) method, which uses the local relative heights to provide a spatial representation of inundation depths. In order to validate the inundation produced by the models, Sentinel-1A synthetic aperture radar data in the VV and VH polarizations along with auxiliary data were used to produce a reference inundation map. A preliminary, binary comparison of the inundation maps to the reference, limited to the five HUC-12 areas of Goldsboro, NC, showed that the flood inundation accuracies for NWM and SPRNT were 74.68% and 78.37%, respectively. The differences for all the relevant test statistics including accuracy, true positive rate, true negative rate, and positive predictive value were found

  13. Bayesian statistical methods and their application in probabilistic simulation models

    Directory of Open Access Journals (Sweden)

    Sergio Iannazzo

    2007-03-01

    Full Text Available Bayesian statistical methods are facing a rapidly growing level of interest and acceptance in the field of health economics. The reasons for this success are probably to be found in the theoretical foundations of the discipline, which make these techniques more appealing to decision analysis. To this should be added modern IT progress, which has produced several flexible and powerful statistical software frameworks. Among them, probably one of the most notable is the BUGS language project and its standalone application for MS Windows, WinBUGS. The scope of this paper is to introduce the subject and to show some interesting applications of WinBUGS in developing complex economic models based on Markov chains. The advantages of this approach reside in the elegance of the code produced and in its capability to easily develop probabilistic simulations. Moreover, an example of the integration of Bayesian inference models in a Markov model is shown. This last feature lets the analyst conduct statistical analyses on the available sources of evidence and exploit them directly as inputs in the economic model.
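
    The flavour of a probabilistic (Bayesian) Markov simulation can be conveyed without WinBUGS: transition probabilities are drawn from Beta distributions that summarize the evidence, and the Markov cohort model is re-run for each draw. The three-state model, priors, horizon and summary below are invented for illustration.

        # Probabilistic Markov cohort sketch in the spirit of Bayesian health-economic models:
        # sample transition probabilities from Beta distributions (standing in for posterior
        # evidence), run the cohort model per draw, and summarise the distribution of life-years.
        import numpy as np

        rng = np.random.default_rng(11)
        n_draws, n_cycles = 5000, 40
        life_years = np.empty(n_draws)

        for i in range(n_draws):
            p_well_to_sick = rng.beta(20, 180)        # ~0.10, with uncertainty from "evidence"
            p_sick_to_dead = rng.beta(30, 120)        # ~0.20
            P = np.array([
                [1 - p_well_to_sick, p_well_to_sick, 0.0],   # well
                [0.0, 1 - p_sick_to_dead, p_sick_to_dead],   # sick
                [0.0, 0.0, 1.0],                             # dead (absorbing)
            ])
            cohort = np.array([1.0, 0.0, 0.0])
            alive = 0.0
            for _ in range(n_cycles):
                alive += cohort[:2].sum()             # cycle spent alive (well or sick)
                cohort = cohort @ P
            life_years[i] = alive

        lo, hi = np.percentile(life_years, [2.5, 97.5])
        print(f"mean life-years = {life_years.mean():.1f}, 95% interval = [{lo:.1f}, {hi:.1f}]")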

  14. Conditioning model output statistics of regional climate model precipitation on circulation patterns

    Directory of Open Access Journals (Sweden)

    F. Wetterhall

    2012-11-01

    Full Text Available Dynamical downscaling of Global Climate Models (GCMs) through regional climate models (RCMs) potentially improves the usability of the output for hydrological impact studies. However, a further downscaling or interpolation of precipitation from RCMs is often needed to match the precipitation characteristics at the local scale. This study analysed three Model Output Statistics (MOS) techniques to adjust RCM precipitation: (1) a simple direct method (DM), (2) quantile-quantile mapping (QM) and (3) a distribution-based scaling (DBS) approach. The modelled precipitation was daily means from 16 RCMs driven by ERA40 reanalysis data over 1961–2000, provided by the ENSEMBLES (ENSEMBLE-based Predictions of Climate Changes and their Impacts) project, over a small catchment located in the Midlands, UK. All methods were conditioned on the entire time series, separate months and an objective classification of Lamb's weather types. The performance of the MOS techniques was assessed regarding temporal and spatial characteristics of the precipitation fields, as well as modelled runoff using the HBV rainfall-runoff model. The results indicate that DBS conditioned on classification patterns performed better than the other methods; however, an ensemble approach in terms of both climate models and downscaling methods is recommended to account for uncertainties in the MOS methods.
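
    Of the three MOS techniques compared, quantile-quantile mapping is the most compact to illustrate: modelled values are mapped onto the quantiles of the observed distribution. The sketch below uses empirical quantiles and synthetic daily precipitation; it ignores the dry-day handling and circulation-pattern conditioning discussed in the paper.

        # Empirical quantile-quantile mapping sketch for daily precipitation: each modelled
        # value is replaced by the observed value with the same empirical quantile.
        # Synthetic data, unconditioned, illustrative only.
        import numpy as np

        rng = np.random.default_rng(5)
        obs = rng.gamma(shape=0.6, scale=8.0, size=10_000)    # "observed" daily precip (mm)
        rcm = rng.gamma(shape=0.9, scale=4.0, size=10_000)    # biased "RCM" daily precip (mm)

        q = np.linspace(0.0, 1.0, 1001)
        rcm_q = np.quantile(rcm, q)
        obs_q = np.quantile(obs, q)

        def quantile_map(x):
            """Map RCM values onto observed quantiles by piecewise-linear interpolation."""
            return np.interp(x, rcm_q, obs_q)

        corrected = quantile_map(rcm)
        print(f"mean  obs={obs.mean():.2f}  rcm={rcm.mean():.2f}  corrected={corrected.mean():.2f}")
        print(f"p99   obs={np.percentile(obs, 99):.1f}  rcm={np.percentile(rcm, 99):.1f}  "
              f"corrected={np.percentile(corrected, 99):.1f}")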

  15. Preoperative prediction model of outcome after cholecystectomy for symptomatic gallstones

    DEFF Research Database (Denmark)

    Borly, L; Anderson, I B; Bardram, Linda

    1999-01-01

    and sonography evaluated gallbladder motility, gallstones, and gallbladder volume. Preoperative variables in patients with or without postcholecystectomy pain were compared statistically, and significant variables were combined in a logistic regression model to predict the postoperative outcome. RESULTS: Eighty...... and by the absence of 'agonizing' pain and of symptoms coinciding with pain (P ...). With this model, 15 of 18 predicted patients had postoperative pain (PVpos = 0.83). Of 62 patients predicted as having no pain postoperatively, 56 were pain-free (PVneg = 0.90). Overall accuracy...... was 89%. CONCLUSION: From this prospective study a model based on preoperative symptoms was developed to predict postcholecystectomy pain. Since intrastudy reclassification may give too optimistic results, the model should be validated in future studies....

  16. Modelling West African Total Precipitation Depth: A Statistical Approach

    Directory of Open Access Journals (Sweden)

    S. Sovoe

    2015-09-01

    Full Text Available Even though several reports over the past few decades indicate an increasing aridity over West Africa, attempts to establish the controlling factor(s) have not been successful. The traditional belief of the position of the Inter-tropical Convergence Zone (ITCZ) as the predominant factor over the region has been refuted by recent findings. Changes in major atmospheric circulations such as the African Easterly Jet (AEJ) and Tropical Easterly Jet (TEJ) are being cited as major precipitation driving forces over the region. Thus, any attempt to predict long term precipitation events over the region using Global Circulation or Local Circulation Models could be flawed as the controlling factors are not fully elucidated yet. Successful prediction efforts may require models which depend on past events as their inputs, as in the case of time series models such as the Autoregressive Integrated Moving Average (ARIMA) model. In this study, historical precipitation data was imported as a time series data structure into the R programming language and was used to build an appropriate seasonal multiplicative Autoregressive Integrated Moving Average model, ARIMA (p, d, q)(P, D, Q)s. The model was then used to predict long term precipitation events over the Ghanaian segment of the Volta Basin, which could be used in planning and implementation of development policies.

  17. Statistical volumetric model for characterization and visualization of prostate cancer

    Science.gov (United States)

    Lu, Jianping; Srikanchana, Rujirutana; McClain, Maxine A.; Wang, Yue J.; Xuan, Jian Hua; Sesterhenn, Isabell A.; Freedman, Matthew T.; Mun, Seong K.

    2000-04-01

    To reveal the spatial pattern of localized prostate cancer distribution, a 3D statistical volumetric model, showing the probability map of prostate cancer distribution together with the anatomical structure of the prostate, has been developed from 90 digitally-imaged surgical specimens. Through an enhanced virtual environment with various visualization modes, this master model permits for the first time an accurate characterization and understanding of prostate cancer distribution patterns. The construction of the statistical volumetric model is characterized by mapping all of the individual models onto a generic prostate site model, in which a self-organizing scheme is used to decompose a group of contours representing multifold tumors into localized tumor elements. The next crucial step in creating the master model is the development of an accurate multi-object and non-rigid registration/warping scheme incorporating the variations among these individual models in true 3D. This is achieved with a multi-object based principal-axis alignment followed by an affine transform, and further fine-tuned by a thin-plate spline interpolation driven by the surface-based deformable warping dynamics. Based on the accurately mapped tumor distribution, a standard finite normal mixture is used to model the cancer volumetric distribution statistics, whose parameters are estimated using both the K-means and expectation-maximization algorithms under information theoretic criteria. Given the desired number of tissue samplings, the prostate needle biopsy site selection is optimized through a probabilistic self-organizing map, thus achieving a maximum likelihood of cancer detection. We describe the details of our theory and methodology, and report our pilot results and evaluation of the effectiveness of the algorithm in characterizing prostate cancer distributions and optimizing needle biopsy techniques.
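
    The finite-normal-mixture step can be illustrated with a standard EM fit in which the number of components is chosen by an information criterion, mirroring the K-means/EM estimation under information-theoretic criteria mentioned above; the 3D point cloud below is synthetic, not the mapped tumour data.

        # Finite normal (Gaussian) mixture sketch: fit 1..6 components by EM and pick the
        # component count by BIC, mimicking estimation under an information-theoretic
        # criterion. The 3D "tumour location" samples are synthetic, illustrative only.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(2)
        cluster_a = rng.normal(loc=[10, 25, 5],  scale=2.0, size=(400, 3))
        cluster_b = rng.normal(loc=[22, 12, 14], scale=3.0, size=(250, 3))
        points = np.vstack([cluster_a, cluster_b])          # stand-in for mapped tumour voxels

        fits = [GaussianMixture(n_components=k, random_state=0).fit(points) for k in range(1, 7)]
        bics = [m.bic(points) for m in fits]
        best = fits[int(np.argmin(bics))]

        print("BIC-selected number of components:", best.n_components)
        print("component means:\n", np.round(best.means_, 1))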

  18. Spin studies of nucleons in a statistical model

    International Nuclear Information System (INIS)

    Singh, J P; Upadhyay, Alka

    2004-01-01

    We decompose various quark-gluon Fock states of a nucleon into a set of states in which the three-quark core and the rest of the partonic content, termed the sea, each appear with definite spin and colour quantum numbers, their weights being determined statistically from their multiplicities. The expansion coefficients in the quark-gluon Fock state expansion have been taken from a recently proposed statistical model. We have also considered two modifications of this model with a view to reducing the contributions of the sea components with higher multiplicities. With certain approximations, we have calculated the quark contributions to the spin of the nucleon, the ratio of the magnetic moments of nucleons, their weak decay constant and the ratio of SU(3) reduced matrix elements for the axial current

  19. Statistical inference to advance network models in epidemiology.

    Science.gov (United States)

    Welch, David; Bansal, Shweta; Hunter, David R

    2011-03-01

    Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight into the role that heterogeneity in host contact patterns plays in infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks, such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data. Copyright © 2011 Elsevier B.V. All rights reserved.

  20. A statistical method for discriminating between alternative radiobiological models

    International Nuclear Information System (INIS)

    Kinsella, I.A.; Malone, J.F.

    1977-01-01

    Radiobiological models assist understanding of the development of radiation damage, and may provide a basis for extrapolating dose-effect curves from high to low dose regions. Many models have been proposed, such as multitarget and its modifications, enzymatic models, and those with a quadratic dose response relationship (i.e. αD + βD² forms). It is difficult to distinguish between these because the statistical techniques used are almost always limited, in that one method can rarely be applied to the whole range of models. A general statistical procedure for parameter estimation (the Maximum Likelihood Method) has been found applicable to a wide range of radiobiological models. The curve parameters are estimated using a computerised search that continues until the most likely set of values to fit the data is obtained. When the search is complete, two procedures are carried out. First, a goodness-of-fit test is applied which examines the applicability of an individual model to the data. Secondly, an index is derived which provides an indication of the adequacy of any model compared with alternative models. Thus the models may be ranked according to how well they fit the data. For example, with one set of data, multitarget types were found to be more suitable than quadratic types (αD + βD²). This method should be of assistance in evaluating various models. It may also be profitably applied to selection of the most appropriate model to use when it is necessary to extrapolate from high to low doses
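
    A small numerical illustration of likelihood-based ranking of this kind: two candidate survival-curve models, a linear-quadratic form exp(-(αD + βD²)) and a multitarget form 1-(1-exp(-D/D0))^n, are fitted to the same synthetic log-survival data by maximum likelihood (Gaussian errors assumed) and compared with AIC. The data, starting values and bounds are invented; this is not the procedure of the paper, only a sketch of the idea.

        # Maximum-likelihood comparison of two radiobiological survival models on the same
        # (synthetic) data, ranked by AIC. Gaussian errors on log-survival are assumed for
        # the likelihood; all data and parameter values are invented for illustration.
        import numpy as np
        from scipy.optimize import curve_fit

        dose = np.array([0.5, 1, 2, 3, 4, 5, 6, 8])                  # Gy
        log_surv = np.array([-0.12, -0.3, -0.75, -1.4, -2.2, -3.1, -4.2, -6.8])

        def linear_quadratic(D, a, b):
            return -(a * D + b * D ** 2)                             # log S = -(aD + bD^2)

        def multitarget(D, D0, n):
            return np.log(1.0 - (1.0 - np.exp(-D / D0)) ** n + 1e-12)

        def fit_and_aic(model, p0, bounds):
            popt, _ = curve_fit(model, dose, log_surv, p0=p0, bounds=bounds)
            resid = log_surv - model(dose, *popt)
            sigma2 = np.mean(resid ** 2)
            loglik = -0.5 * len(dose) * (np.log(2 * np.pi * sigma2) + 1.0)
            return 2 * len(p0) - 2 * loglik                          # AIC (smaller is better)

        print(f"AIC linear-quadratic: {fit_and_aic(linear_quadratic, (0.2, 0.05), ([0, 0], [5, 2])):.1f}")
        print(f"AIC multitarget     : {fit_and_aic(multitarget, (1.5, 3.0), ([0.1, 0.5], [20, 20])):.1f}")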