WorldWideScience

Sample records for model predictions statistical

  1. A statistical model for predicting muscle performance

    Science.gov (United States)

    Byerly, Diane Leslie De Caix

    The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing

  2. Risk prediction model: Statistical and artificial neural network approach

    Science.gov (United States)

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models are increasingly gaining popularity and had been used in numerous areas of studies to complement and fulfilled clinical reasoning and decision making nowadays. The adoption of such models assist physician's decision making, individual's behavior, and consequently improve individual outcomes and the cost-effectiveness of care. The objective of this paper is to reviewed articles related to risk prediction model in order to understand the suitable approach, development and the validation process of risk prediction model. A qualitative review of the aims, methods and significant main outcomes of the nineteen published articles that developed risk prediction models from numerous fields were done. This paper also reviewed on how researchers develop and validate the risk prediction models based on statistical and artificial neural network approach. From the review done, some methodological recommendation in developing and validating the prediction model were highlighted. According to studies that had been done, artificial neural network approached in developing the prediction model were more accurate compared to statistical approach. However currently, only limited published literature discussed on which approach is more accurate for risk prediction model development.

  3. A neighborhood statistics model for predicting stream pathogen indicator levels.

    Science.gov (United States)

    Pandey, Pramod K; Pasternack, Gregory B; Majumder, Mahbubul; Soupir, Michelle L; Kaiser, Mark S

    2015-03-01

    Because elevated levels of water-borne Escherichia coli in streams are a leading cause of water quality impairments in the U.S., water-quality managers need tools for predicting aqueous E. coli levels. Presently, E. coli levels may be predicted using complex mechanistic models that have a high degree of unchecked uncertainty or simpler statistical models. To assess spatio-temporal patterns of instream E. coli levels, herein we measured E. coli, a pathogen indicator, at 16 sites (at four different times) within the Squaw Creek watershed, Iowa, and subsequently, the Markov Random Field model was exploited to develop a neighborhood statistics model for predicting instream E. coli levels. Two observed covariates, local water temperature (degrees Celsius) and mean cross-sectional depth (meters), were used as inputs to the model. Predictions of E. coli levels in the water column were compared with independent observational data collected from 16 in-stream locations. The results revealed that spatio-temporal averages of predicted and observed E. coli levels were extremely close. Approximately 66 % of individual predicted E. coli concentrations were within a factor of 2 of the observed values. In only one event, the difference between prediction and observation was beyond one order of magnitude. The mean of all predicted values at 16 locations was approximately 1 % higher than the mean of the observed values. The approach presented here will be useful while assessing instream contaminations such as pathogen/pathogen indicator levels at the watershed scale.

  4. Model output statistics applied to wind power prediction

    Energy Technology Data Exchange (ETDEWEB)

    Joensen, A; Giebel, G; Landberg, L [Risoe National Lab., Roskilde (Denmark); Madsen, H; Nielsen, H A [The Technical Univ. of Denmark, Dept. of Mathematical Modelling, Lyngby (Denmark)

    1999-03-01

    Being able to predict the output of a wind farm online for a day or two in advance has significant advantages for utilities, such as better possibility to schedule fossil fuelled power plants and a better position on electricity spot markets. In this paper prediction methods based on Numerical Weather Prediction (NWP) models are considered. The spatial resolution used in NWP models implies that these predictions are not valid locally at a specific wind farm. Furthermore, due to the non-stationary nature and complexity of the processes in the atmosphere, and occasional changes of NWP models, the deviation between the predicted and the measured wind will be time dependent. If observational data is available, and if the deviation between the predictions and the observations exhibits systematic behavior, this should be corrected for; if statistical methods are used, this approaches is usually referred to as MOS (Model Output Statistics). The influence of atmospheric turbulence intensity, topography, prediction horizon length and auto-correlation of wind speed and power is considered, and to take the time-variations into account, adaptive estimation methods are applied. Three estimation techniques are considered and compared, Extended Kalman Filtering, recursive least squares and a new modified recursive least squares algorithm. (au) EU-JOULE-3. 11 refs.

  5. Statistical models for expert judgement and wear prediction

    International Nuclear Information System (INIS)

    Pulkkinen, U.

    1994-01-01

    This thesis studies the statistical analysis of expert judgements and prediction of wear. The point of view adopted is the one of information theory and Bayesian statistics. A general Bayesian framework for analyzing both the expert judgements and wear prediction is presented. Information theoretic interpretations are given for some averaging techniques used in the determination of consensus distributions. Further, information theoretic models are compared with a Bayesian model. The general Bayesian framework is then applied in analyzing expert judgements based on ordinal comparisons. In this context, the value of information lost in the ordinal comparison process is analyzed by applying decision theoretic concepts. As a generalization of the Bayesian framework, stochastic filtering models for wear prediction are formulated. These models utilize the information from condition monitoring measurements in updating the residual life distribution of mechanical components. Finally, the application of stochastic control models in optimizing operational strategies for inspected components are studied. Monte-Carlo simulation methods, such as the Gibbs sampler and the stochastic quasi-gradient method, are applied in the determination of posterior distributions and in the solution of stochastic optimization problems. (orig.) (57 refs., 7 figs., 1 tab.)

  6. Estimating Predictive Variance for Statistical Gas Distribution Modelling

    International Nuclear Information System (INIS)

    Lilienthal, Achim J.; Asadi, Sahar; Reggente, Matteo

    2009-01-01

    Recent publications in statistical gas distribution modelling have proposed algorithms that model mean and variance of a distribution. This paper argues that estimating the predictive concentration variance entails not only a gradual improvement but is rather a significant step to advance the field. This is, first, since the models much better fit the particular structure of gas distributions, which exhibit strong fluctuations with considerable spatial variations as a result of the intermittent character of gas dispersal. Second, because estimating the predictive variance allows to evaluate the model quality in terms of the data likelihood. This offers a solution to the problem of ground truth evaluation, which has always been a critical issue for gas distribution modelling. It also enables solid comparisons of different modelling approaches, and provides the means to learn meta parameters of the model, to determine when the model should be updated or re-initialised, or to suggest new measurement locations based on the current model. We also point out directions of related ongoing or potential future research work.

  7. Prediction of dimethyl disulfide levels from biosolids using statistical modeling.

    Science.gov (United States)

    Gabriel, Steven A; Vilalai, Sirapong; Arispe, Susanna; Kim, Hyunook; McConnell, Laura L; Torrents, Alba; Peot, Christopher; Ramirez, Mark

    2005-01-01

    Two statistical models were used to predict the concentration of dimethyl disulfide (DMDS) released from biosolids produced by an advanced wastewater treatment plant (WWTP) located in Washington, DC, USA. The plant concentrates sludge from primary sedimentation basins in gravity thickeners (GT) and sludge from secondary sedimentation basins in dissolved air flotation (DAF) thickeners. The thickened sludge is pumped into blending tanks and then fed into centrifuges for dewatering. The dewatered sludge is then conditioned with lime before trucking out from the plant. DMDS, along with other volatile sulfur and nitrogen-containing chemicals, is known to contribute to biosolids odors. These models identified oxidation/reduction potential (ORP) values of a GT and DAF, the amount of sludge dewatered by centrifuges, and the blend ratio between GT thickened sludge and DAF thickened sludge in blending tanks as control variables. The accuracy of the developed regression models was evaluated by checking the adjusted R2 of the regression as well as the signs of coefficients associated with each variable. In general, both models explained observed DMDS levels in sludge headspace samples. The adjusted R2 value of the regression models 1 and 2 were 0.79 and 0.77, respectively. Coefficients for each regression model also had the correct sign. Using the developed models, plant operators can adjust the controllable variables to proactively decrease this odorant. Therefore, these models are a useful tool in biosolids management at WWTPs.

  8. Statistical and Machine Learning Models to Predict Programming Performance

    OpenAIRE

    Bergin, Susan

    2006-01-01

    This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, the models struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant...

  9. Statistical model based gender prediction for targeted NGS clinical panels

    Directory of Open Access Journals (Sweden)

    Palani Kannan Kandavel

    2017-12-01

    The reference test dataset are being used to test the model. The sensitivity on predicting the gender has been increased from the current “genotype composition in ChrX” based approach. In addition, the prediction score given by the model can be used to evaluate the quality of clinical dataset. The higher prediction score towards its respective gender indicates the higher quality of sequenced data.

  10. Monthly to seasonal low flow prediction: statistical versus dynamical models

    Science.gov (United States)

    Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke

    2016-04-01

    the Alfred Wegener Institute a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO) the idea is to identify regions with stable teleconnections between different global climate information (e.g. sea surface temperature, geopotential height etc.) and streamflow at different gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated showing regions where streamflow and climate variable from previous months are significantly correlated in a 21 (31) years moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We will present current results of the aforementioned approaches with focus on the River Rhine (being one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals the existence of a valuable predictability of the low flows at monthly and seasonal time scales, a result that may be useful to water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings for upcoming low flows.

  11. Statistical models to predict flows at monthly level in Salvajina

    International Nuclear Information System (INIS)

    Gonzalez, Harold O

    1994-01-01

    It thinks about and models of lineal regression evaluate at monthly level that they allow to predict flows in Salvajina, with base in predictions variable, like the difference of pressure between Darwin and Tahiti, precipitation in Piendamo Cauca), temperature in Port Chicama (Peru) and pressure in Tahiti

  12. Multivariate statistical models for disruption prediction at ASDEX Upgrade

    International Nuclear Information System (INIS)

    Aledda, R.; Cannas, B.; Fanni, A.; Sias, G.; Pautasso, G.

    2013-01-01

    In this paper, a disruption prediction system for ASDEX Upgrade has been proposed that does not require disruption terminated experiments to be implemented. The system consists of a data-based model, which is built using only few input signals coming from successfully terminated pulses. A fault detection and isolation approach has been used, where the prediction is based on the analysis of the residuals of an auto regressive exogenous input model. The prediction performance of the proposed system is encouraging when it is applied to the same set of campaigns used to implement the model. However, the false alarms significantly increase when we tested the system on discharges coming from experimental campaigns temporally far from those used to train the model. This is due to the well know aging effect inherent in the data-based models. The main advantage of the proposed method, with respect to other data-based approaches in literature, is that it does not need data on experiments terminated with a disruption, as it uses a normal operating conditions model. This is a big advantage in the prospective of a prediction system for ITER, where a limited number of disruptions can be allowed

  13. A statistical model for aggregating judgments by incorporating peer predictions

    OpenAIRE

    McCoy, John; Prelec, Drazen

    2017-01-01

    We propose a probabilistic model to aggregate the answers of respondents answering multiple-choice questions. The model does not assume that everyone has access to the same information, and so does not assume that the consensus answer is correct. Instead, it infers the most probable world state, even if only a minority vote for it. Each respondent is modeled as receiving a signal contingent on the actual world state, and as using this signal to both determine their own answer and predict the ...

  14. Prediction of Frost Occurrences Using Statistical Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Hyojin Lee

    2016-01-01

    Full Text Available We developed the frost prediction models in spring in Korea using logistic regression and decision tree techniques. Hit Rate (HR, Probability of Detection (POD, and False Alarm Rate (FAR from both models were calculated and compared. Threshold values for the logistic regression models were selected to maximize HR and POD and minimize FAR for each station, and the split for the decision tree models was stopped when change in entropy was relatively small. Average HR values were 0.92 and 0.91 for logistic regression and decision tree techniques, respectively, average POD values were 0.78 and 0.80 for logistic regression and decision tree techniques, respectively, and average FAR values were 0.22 and 0.28 for logistic regression and decision tree techniques, respectively. The average numbers of selected explanatory variables were 5.7 and 2.3 for logistic regression and decision tree techniques, respectively. Fewer explanatory variables can be more appropriate for operational activities to provide a timely warning for the prevention of the frost damages to agricultural crops. We concluded that the decision tree model can be more useful for the timely warning system. It is recommended that the models should be improved to reflect local topological features.

  15. Statistical Models for Predicting Threat Detection From Human Behavior

    Science.gov (United States)

    Kelley, Timothy; Amon, Mary J.; Bertenthal, Bennett I.

    2018-01-01

    Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure “non-spoof” or insecure “spoof” versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption). Spoof websites had modified Uniform Resource Locator (URL) and authentication level. Participants chose to “login” to or “back” out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level), survey-based (i.e., security knowledge and website familiarity), and real-time measures (i.e., mouse tracking) in predicting risky online behavior during phishing

  16. Statistical Models for Predicting Threat Detection From Human Behavior

    Directory of Open Access Journals (Sweden)

    Timothy Kelley

    2018-04-01

    Full Text Available Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure “non-spoof” or insecure “spoof” versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption. Spoof websites had modified Uniform Resource Locator (URL and authentication level. Participants chose to “login” to or “back” out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level, survey-based (i.e., security knowledge and website familiarity, and real-time measures (i.e., mouse tracking in predicting risky online behavior

  17. Statistical Models for Predicting Threat Detection From Human Behavior.

    Science.gov (United States)

    Kelley, Timothy; Amon, Mary J; Bertenthal, Bennett I

    2018-01-01

    Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure "non-spoof" or insecure "spoof" versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption). Spoof websites had modified Uniform Resource Locator (URL) and authentication level. Participants chose to "login" to or "back" out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level), survey-based (i.e., security knowledge and website familiarity), and real-time measures (i.e., mouse tracking) in predicting risky online behavior during phishing attacks

  18. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models

    NARCIS (Netherlands)

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.

    2012-01-01

    PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator

  19. Prediction of lacking control power in power plants using statistical models

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Mataji, B.; Stoustrup, Jakob

    2007-01-01

    Prediction of the performance of plants like power plants is of interest, since the plant operator can use these predictions to optimize the plant production. In this paper the focus is addressed on a special case where a combination of high coal moisture content and a high load limits the possible...... plant load, meaning that the requested plant load cannot be met. The available models are in this case uncertain. Instead statistical methods are used to predict upper and lower uncertainty bounds on the prediction. Two different methods are used. The first relies on statistics of recent prediction...... errors; the second uses operating point depending statistics of prediction errors. Using these methods on the previous mentioned case, it can be concluded that the second method can be used to predict the power plant performance, while the first method has problems predicting the uncertain performance...

  20. Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic.

    Science.gov (United States)

    Wang, Ming; Long, Qi

    2016-09-01

    Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.

  1. Statistical model for prediction of hearing loss in patients receiving cisplatin chemotherapy.

    Science.gov (United States)

    Johnson, Andrew; Tarima, Sergey; Wong, Stuart; Friedland, David R; Runge, Christina L

    2013-03-01

    This statistical model might be used to predict cisplatin-induced hearing loss, particularly in patients undergoing concomitant radiotherapy. To create a statistical model based on pretreatment hearing thresholds to provide an individual probability for hearing loss from cisplatin therapy and, secondarily, to investigate the use of hearing classification schemes as predictive tools for hearing loss. Retrospective case-control study. Tertiary care medical center. A total of 112 subjects receiving chemotherapy and audiometric evaluation were evaluated for the study. Of these subjects, 31 met inclusion criteria for analysis. The primary outcome measurement was a statistical model providing the probability of hearing loss following the use of cisplatin chemotherapy. Fifteen of the 31 subjects had significant hearing loss following cisplatin chemotherapy. American Academy of Otolaryngology-Head and Neck Society and Gardner-Robertson hearing classification schemes revealed little change in hearing grades between pretreatment and posttreatment evaluations for subjects with or without hearing loss. The Chang hearing classification scheme could effectively be used as a predictive tool in determining hearing loss with a sensitivity of 73.33%. Pretreatment hearing thresholds were used to generate a statistical model, based on quadratic approximation, to predict hearing loss (C statistic = 0.842, cross-validated = 0.835). The validity of the model improved when only subjects who received concurrent head and neck irradiation were included in the analysis (C statistic = 0.91). A calculated cutoff of 0.45 for predicted probability has a cross-validated sensitivity and specificity of 80%. Pretreatment hearing thresholds can be used as a predictive tool for cisplatin-induced hearing loss, particularly with concomitant radiotherapy.

  2. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    Science.gov (United States)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with

  3. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    Science.gov (United States)

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% on a prospective cadaver data set. The study results indicated the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance the templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. Statistical model predictions for p+p and Pb+Pb collisions at LHC

    NARCIS (Netherlands)

    Kraus, I.; Cleymans, J.; Oeschler, H.; Redlich, K.; Wheaton, S.

    2009-01-01

    Particle production in p+p and central collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios on the temperature and the baryon chemical potential is studied in

  5. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    Science.gov (United States)

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  6. Output from Statistical Predictive Models as Input to eLearning Dashboards

    Directory of Open Access Journals (Sweden)

    Marlene A. Smith

    2015-06-01

    Full Text Available We describe how statistical predictive models might play an expanded role in educational analytics by giving students automated, real-time information about what their current performance means for eventual success in eLearning environments. We discuss how an online messaging system might tailor information to individual students using predictive analytics. The proposed system would be data-driven and quantitative; e.g., a message might furnish the probability that a student will successfully complete the certificate requirements of a massive open online course. Repeated messages would prod underperforming students and alert instructors to those in need of intervention. Administrators responsible for accreditation or outcomes assessment would have ready documentation of learning outcomes and actions taken to address unsatisfactory student performance. The article’s brief introduction to statistical predictive models sets the stage for a description of the messaging system. Resources and methods needed to develop and implement the system are discussed.

  7. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    Energy Technology Data Exchange (ETDEWEB)

    Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van' t [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)

    2012-03-15

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.

  8. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models.

    Science.gov (United States)

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A

    2012-03-15

    To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    International Nuclear Information System (INIS)

    Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t

    2012-01-01

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.

  10. Statistical Model Predictions for p+p and Pb+Pb Collisions at LHC

    CERN Document Server

    Kraus, I; Oeschler, H; Redlich, K; Wheaton, S

    2009-01-01

    Particle production in p+p and central Pb+Pb collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios on the temperature and the baryon chemical potential is studied in detail, and some of them, which are particularly appropriate to determine the chemical freeze-out point experimentally, are indicated. Considering elementary interactions on the other hand, we focus on strangeness production and its possible suppression. Extrapolating the thermal parameters to LHC energy, we present predictions of the statistical model for particle yields in p+p collisions. We quantify the strangeness suppression by the correlation volume parameter and discuss its influence on particle production. We propose observables that can provide deeper insight into the mechanism of strangeness production and suppression at LHC.

  11. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Directory of Open Access Journals (Sweden)

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing and equilibrium (static sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  12. Predicting The Exit Time Of Employees In An Organization Using Statistical Model

    Directory of Open Access Journals (Sweden)

    Ahmed Al Kuwaiti

    2015-08-01

    Full Text Available Employees are considered as an asset to any organization and each organization provide a better and flexible working environment to retain its best and resourceful workforce. As such continuous efforts are being taken to avoid or extend the exitwithdrawal of employees from the organization. Human resource managers are facing a challenge to predict the exit time of employees and there is no precise model existing at present in the literature. This study has been conducted to predict the probability of exit of an employee in an organization using appropriate statistical model. Accordingly authors designed a model using Additive Weibull distribution to predict the expected exit time of employee in an organization. In addition a Shock model approach is also executed to check how well the Additive Weibull distribution suits in an organization. The analytical results showed that when the inter-arrival time increases the expected time for the employees to exit also increases. This study concluded that Additive Weibull distribution can be considered as an alternative in the place of Shock model approach to predict the exit time of employee in an organization.

  13. Flow prediction models using macroclimatic variables and multivariate statistical techniques in the Cauca River Valley

    International Nuclear Information System (INIS)

    Carvajal Escobar Yesid; Munoz, Flor Matilde

    2007-01-01

    The project this centred in the revision of the state of the art of the ocean-atmospheric phenomena that you affect the Colombian hydrology especially The Phenomenon Enos that causes a socioeconomic impact of first order in our country, it has not been sufficiently studied; therefore it is important to approach the thematic one, including the variable macroclimates associated to the Enos in the analyses of water planning. The analyses include revision of statistical techniques of analysis of consistency of hydrological data with the objective of conforming a database of monthly flow of the river reliable and homogeneous Cauca. Statistical methods are used (Analysis of data multivariante) specifically The analysis of principal components to involve them in the development of models of prediction of flows monthly means in the river Cauca involving the Lineal focus as they are the model autoregressive AR, ARX and Armax and the focus non lineal Net Artificial Network.

  14. Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles

    Science.gov (United States)

    Mini, C.; Hogue, T. S.; Pincetl, S.

    2012-04-01

    Modeling water demand is a complex exercise in the choice of the functional form, techniques and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to utilize that information in the development of statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water use predictions in urban areas. Monthly individual water use records were acquired from the Los Angeles Department of Water and Power (LADWP) for the 2000 to 2010 period. Study predictors of residential water use include socio-demographic, economic, climate and landscaping variables at the zip code level collected from US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by inverse-distance weighting method. Remotely-sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed based on the panel data and variables described: a pooled-OLS regression model and a linear mixed effects model. Both models show income per capita and the percentage of landscape areas in each zip code as being statistically significant predictors. The pooled-OLS model tends to over-estimate higher water use zip codes and both models provide similar RMSE values.Outdoor water use was estimated at the census tract level as the residual between total water use and indoor use. This residual is being compared with the output from a biophysical model including tree and grass cover areas, climate variables and estimates of evapotranspiration at very high spatial resolution. A

  15. Explained variation and predictive accuracy in general parametric statistical models: the role of model misspecification

    DEFF Research Database (Denmark)

    Rosthøj, Susanne; Keiding, Niels

    2004-01-01

    When studying a regression model measures of explained variation are used to assess the degree to which the covariates determine the outcome of interest. Measures of predictive accuracy are used to assess the accuracy of the predictions based on the covariates and the regression model. We give a ...... a detailed and general introduction to the two measures and the estimation procedures. The framework we set up allows for a study of the effect of misspecification on the quantities estimated. We also introduce a generalization to survival analysis....

  16. Wind gust estimation by combining numerical weather prediction model and statistical post-processing

    Science.gov (United States)

    Patlakas, Platon; Drakaki, Eleni; Galanis, George; Spyrou, Christos; Kallos, George

    2017-04-01

    The continuous rise of off-shore and near-shore activities as well as the development of structures, such as wind farms and various offshore platforms, requires the employment of state-of-the-art risk assessment techniques. Such analysis is used to set the safety standards and can be characterized as a climatologically oriented approach. Nevertheless, a reliable operational support is also needed in order to minimize cost drawbacks and human danger during the construction and the functioning stage as well as during maintenance activities. One of the most important parameters for this kind of analysis is the wind speed intensity and variability. A critical measure associated with this variability is the presence and magnitude of wind gusts as estimated in the reference level of 10m. The latter can be attributed to different processes that vary among boundary-layer turbulence, convection activities, mountain waves and wake phenomena. The purpose of this work is the development of a wind gust forecasting methodology combining a Numerical Weather Prediction model and a dynamical statistical tool based on Kalman filtering. To this end, the parameterization of Wind Gust Estimate method was implemented to function within the framework of the atmospheric model SKIRON/Dust. The new modeling tool combines the atmospheric model with a statistical local adaptation methodology based on Kalman filters. This has been tested over the offshore west coastline of the United States. The main purpose is to provide a useful tool for wind analysis and prediction and applications related to offshore wind energy (power prediction, operation and maintenance). The results have been evaluated by using observational data from the NOAA's buoy network. As it was found, the predicted output shows a good behavior that is further improved after the local adjustment post-process.

  17. Predicting tube repair at French nuclear steam generators using statistical modeling

    Energy Technology Data Exchange (ETDEWEB)

    Mathon, C., E-mail: cedric.mathon@edf.fr [EDF Generation, Basic Design Department (SEPTEN), 69628 Villeurbanne (France); Chaudhary, A. [EDF Generation, Basic Design Department (SEPTEN), 69628 Villeurbanne (France); Gay, N.; Pitner, P. [EDF Generation, Nuclear Operation Division (UNIE), Saint-Denis (France)

    2014-04-01

    Electricité de France (EDF) currently operates a total of 58 Nuclear Pressurized Water Reactors (PWR) which are composed of 34 units of 900 MWe, 20 units of 1300 MWe and 4 units of 1450 MWe. This report provides an overall status of SG tube bundles on the 1300 MWe units. These units are 4 loop reactors using the AREVA 68/19 type SG model which are equipped either with Alloy 600 thermally treated (TT) tubes or Alloy 690 TT tubes. As of 2011, the effective full power years of operation (EFPY) ranges from 13 to 20 and during this time, the main degradation mechanisms observed on SG tubes are primary water stress corrosion cracking (PWSCC) and wear at anti-vibration bars (AVB) level. Statistical models have been developed for each type of degradation in order to predict the growth rate and number of affected tubes. Additional plugging is also performed to prevent other degradations such as tube wear due to foreign objects or high-cycle flow-induced fatigue. The contribution of these degradation mechanisms on the rate of tube plugging is described. The results from the statistical models are then used in predicting the long-term life of the steam generators and therefore providing a useful tool toward their effective life management and possible replacement.

  18. Development of the statistical ARIMA model: an application for predicting the upcoming of MJO index

    Science.gov (United States)

    Hermawan, Eddy; Nurani Ruchjana, Budi; Setiawan Abdullah, Atje; Gede Nyoman Mindra Jaya, I.; Berliana Sipayung, Sinta; Rustiana, Shailla

    2017-10-01

    This study is mainly concerned in development one of the most important equatorial atmospheric phenomena that we call as the Madden Julian Oscillation (MJO) which having strong impacts to the extreme rainfall anomalies over the Indonesian Maritime Continent (IMC). In this study, we focused to the big floods over Jakarta and surrounded area that suspecting caused by the impacts of MJO. We concentrated to develop the MJO index using the statistical model that we call as Box-Jenkis (ARIMA) ini 1996, 2002, and 2007, respectively. They are the RMM (Real Multivariate MJO) index as represented by RMM1 and RMM2, respectively. There are some steps to develop that model, starting from identification of data, estimated, determined model, before finally we applied that model for investigation some big floods that occurred at Jakarta in 1996, 2002, and 2007 respectively. We found the best of estimated model for the RMM1 and RMM2 prediction is ARIMA (2,1,2). Detailed steps how that model can be extracted and applying to predict the rainfall anomalies over Jakarta for 3 to 6 months later is discussed at this paper.

  19. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    Science.gov (United States)

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provide little additional benefit.

  20. A RANS knock model to predict the statistical occurrence of engine knock

    International Nuclear Information System (INIS)

    D'Adamo, Alessandro; Breda, Sebastiano; Fontanesi, Stefano; Irimescu, Adrian; Merola, Simona Silvia; Tornatore, Cinzia

    2017-01-01

    Highlights: • Development of a new RANS model for SI engine knock probability. • Turbulence-derived transport equations for variances of mixture fraction and enthalpy. • Gasoline autoignition delay times calculated from detailed chemical kinetics. • Knock probability validated against experiments on optically accessible GDI unit. • PDF-based knock model accounting for the random nature of SI engine knock in RANS simulations. - Abstract: In the recent past engine knock emerged as one of the main limiting aspects for the achievement of higher efficiency targets in modern spark-ignition (SI) engines. To attain these requirements, engine operating points must be moved as close as possible to the onset of abnormal combustions, although the turbulent nature of flow field and SI combustion leads to possibly ample fluctuations between consecutive engine cycles. This forces engine designers to distance the target condition from its theoretical optimum in order to prevent abnormal combustion, which can potentially damage engine components because of few individual heavy-knocking cycles. A statistically based RANS knock model is presented in this study, whose aim is the prediction not only of the ensemble average knock occurrence, poorly meaningful in such a stochastic event, but also of a knock probability. The model is based on look-up tables of autoignition times from detailed chemistry, coupled with transport equations for the variance of mixture fraction and enthalpy. The transported perturbations around the ensemble average value are based on variable gradients and on a local turbulent time scale. A multi-variate cell-based Gaussian-PDF model is proposed for the unburnt mixture, resulting in a statistical distribution for the in-cell reaction rate. An average knock precursor and its variance are independently calculated and transported; this results in the prediction of an earliest knock probability preceding the ensemble average knock onset, as confirmed by

  1. Prediction of thrombophilia in patients with unexplained recurrent pregnancy loss using a statistical model.

    Science.gov (United States)

    Wang, Tongfei; Kang, Xiaomin; He, Liying; Liu, Zhilan; Xu, Haijing; Zhao, Aimin

    2017-09-01

    To establish a statistical model to predict thrombophilia in patients with unexplained recurrent pregnancy loss (URPL). A retrospective case-control study was conducted at Ren Ji Hospital, Shanghai, China, from March 2014 to October 2016. The levels of D-dimer (DD), fibrinogen degradation products (FDP), activated partial thromboplastin time (APTT), prothrombin time (PT), thrombin time (TT), fibrinogen (Fg), and platelet aggregation in response to arachidonic acid (AA) and adenosine diphosphate (ADP) were collected. Receiver operating characteristic curve analysis was used to analyze data from 158 UPRL patients (≥3 previous first trimester pregnancy losses with unexplained etiology) and 131 non-RPL patients (no history of recurrent pregnancy loss). A logistic regression model (LRM) was built and the model was externally validated in another group of patients. The LRM included AA, DD, FDP, TT, APTT, and PT. The overall accuracy of the LRM was 80.9%, with sensitivity and specificity of 78.5% and 78.3%, respectively. The diagnostic threshold of the possibility of the LRM was 0.6492, with a sensitivity of 78.5% and a specificity of 78.3%. Subsequently, the LRM was validated with an overall accuracy of 83.6%. The LRM is a valuable model for prediction of thrombophilia in URPL patients. © 2017 International Federation of Gynecology and Obstetrics.

  2. Reproducing tailing in breakthrough curves: Are statistical models equally representative and predictive?

    Science.gov (United States)

    Pedretti, Daniele; Bianchi, Marco

    2018-03-01

    Breakthrough curves (BTCs) observed during tracer tests in highly heterogeneous aquifers display strong tailing. Power laws are popular models for both the empirical fitting of these curves, and the prediction of transport using upscaling models based on best-fitted estimated parameters (e.g. the power law slope or exponent). The predictive capacity of power law based upscaling models can be however questioned due to the difficulties to link model parameters with the aquifers' physical properties. This work analyzes two aspects that can limit the use of power laws as effective predictive tools: (a) the implication of statistical subsampling, which often renders power laws undistinguishable from other heavily tailed distributions, such as the logarithmic (LOG); (b) the difficulties to reconcile fitting parameters obtained from models with different formulations, such as the presence of a late-time cutoff in the power law model. Two rigorous and systematic stochastic analyses, one based on benchmark distributions and the other on BTCs obtained from transport simulations, are considered. It is found that a power law model without cutoff (PL) results in best-fitted exponents (αPL) falling in the range of typical experimental values reported in the literature (1.5 tailing becomes heavier. Strong fluctuations occur when the number of samples is limited, due to the effects of subsampling. On the other hand, when the power law model embeds a cutoff (PLCO), the best-fitted exponent (αCO) is insensitive to the degree of tailing and to the effects of subsampling and tends to a constant αCO ≈ 1. In the PLCO model, the cutoff rate (λ) is the parameter that fully reproduces the persistence of the tailing and is shown to be inversely correlated to the LOG scale parameter (i.e. with the skewness of the distribution). The theoretical results are consistent with the fitting analysis of a tracer test performed during the MADE-5 experiment. It is shown that a simple

  3. A Unified Statistical Rain-Attenuation Model for Communication Link Fade Predictions and Optimal Stochastic Fade Control Design Using a Location-Dependent Rain-Statistic Database

    Science.gov (United States)

    Manning, Robert M.

    1990-01-01

    A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of the optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics data base is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0-5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.

  4. Predictive statistical modelling of cadmium content in durum wheat grain based on soil parameters.

    Science.gov (United States)

    Viala, Yoann; Laurette, Julien; Denaix, Laurence; Gourdain, Emmanuelle; Méléard, Benoit; Nguyen, Christophe; Schneider, André; Sappin-Didier, Valérie

    2017-09-01

    Regulatory limits on cadmium (Cd) content in food products are tending to become stricter, especially in cereals, which are a major contributor to dietary intake of Cd by humans. This is of particular importance for durum wheat, which accumulates more Cd than bread wheat. The contamination of durum wheat grain by Cd depends not only on the genotype but also to a large extent on soil Cd availability. Assessing the phytoavailability of Cd for durum wheat is thus crucial, and appropriate methods are required. For this purpose, we propose a statistical model to predict Cd accumulation in durum wheat grain based on soil geochemical properties related to Cd availability in French agricultural soils with low Cd contents and neutral to alkaline pH (soils commonly used to grow durum wheat). The best model is based on the concentration of total Cd in the soil solution, the pH of a soil CaCl 2 extract, the cation exchange capacity (CEC), and the content of manganese oxides (Tamm's extraction) in the soil. The model variables suggest a major influence of cadmium buffering power of the soil and of Cd speciation in solution. The model successfully explains 88% of Cd variability in grains with, generally, below 0.02 mg Cd kg -1 prediction error in wheat grain. Monte Carlo cross-validation indicated that model accuracy will suffice for the European Community project to reduce the regulatory limit from 0.2 to 0.15 mg Cd kg -1 grain, but not for the intermediate step at 0.175 mg Cd kg -1 . The model will help farmers assess the risk that the Cd content of their durum wheat grain will exceed regulatory limits, and help food safety authorities test different regulatory thresholds to find a trade-off between food safety and the negative impact a too strict regulation could have on farmers.

  5. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    CERN Document Server

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  6. Statistical prediction of AVB wear growth and initiation in model F steam generator tubes using Monte Carlo method

    International Nuclear Information System (INIS)

    Lee, Jae Bong; Park, Jae Hak; Kim, Hong Deok; Chung, Han Sub; Kim, Tae Ryong

    2005-01-01

    The growth of AVB wear in Model F steam generator tubes is predicted using the Monte Carlo Method and statistical approaches. The statistical parameters that represent the characteristics of wear growth and wear initiation are derived from In-Service Inspection (ISI) Non-Destructive Evaluation (NDE) data. Based on the statistical approaches, wear growth model are proposed and applied to predict wear distribution at the End Of Cycle (EOC). Probabilistic distributions of the number of wear flaws and maximum wear depth at EOC are obtained from the analysis. Comparing the predicted EOC wear flaw data with the known EOC data the usefulness of the proposed method is examined and satisfactory results are obtained

  7. Statistical prediction of AVB wear growth and initiation in model F steam generator tubes using Monte Carlo method

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jae Bong; Park, Jae Hak [Chungbuk National Univ., Cheongju (Korea, Republic of); Kim, Hong Deok; Chung, Han Sub; Kim, Tae Ryong [Korea Electtric Power Research Institute, Daejeon (Korea, Republic of)

    2005-07-01

    The growth of AVB wear in Model F steam generator tubes is predicted using the Monte Carlo Method and statistical approaches. The statistical parameters that represent the characteristics of wear growth and wear initiation are derived from In-Service Inspection (ISI) Non-Destructive Evaluation (NDE) data. Based on the statistical approaches, wear growth model are proposed and applied to predict wear distribution at the End Of Cycle (EOC). Probabilistic distributions of the number of wear flaws and maximum wear depth at EOC are obtained from the analysis. Comparing the predicted EOC wear flaw data with the known EOC data the usefulness of the proposed method is examined and satisfactory results are obtained.

  8. Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network.

    Science.gov (United States)

    Yu, Ying; Wang, Yirui; Gao, Shangce; Tang, Zheng

    2017-01-01

    With the impact of global internationalization, tourism economy has also been a rapid development. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model) is proposed to perform the tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model) to exclude the long-term linear trend and then train the residual data by the dendritic neural network model and make a short-term prediction. As the result showed in this paper, the SA-D model can achieve considerably better predictive performances. In order to demonstrate the effectiveness of the SA-D model, we also use the data that other authors used in the other models and compare the results. It also proved that the SA-D model achieved good predictive performances in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.

  9. Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network

    Directory of Open Access Journals (Sweden)

    Ying Yu

    2017-01-01

    Full Text Available With the impact of global internationalization, tourism economy has also been a rapid development. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model is proposed to perform the tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model to exclude the long-term linear trend and then train the residual data by the dendritic neural network model and make a short-term prediction. As the result showed in this paper, the SA-D model can achieve considerably better predictive performances. In order to demonstrate the effectiveness of the SA-D model, we also use the data that other authors used in the other models and compare the results. It also proved that the SA-D model achieved good predictive performances in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.

  10. Predicting the distribution of four species of raptors (Aves: Accipitridae) in southern Spain: statistical models work better than existing maps

    OpenAIRE

    Bustamante, Javier; Seoane, Javier

    2004-01-01

    Aim To test the effectiveness of statistical models based on explanatory environmental variables vs. existing distribution information (maps and breeding atlas), for predicting the distribution of four species of raptors (family Accipitridae): common buzzard Buteo buteo (Linnaeus, 1758), short-toed eagle Circaetus gallicus (Gmelin, 1788), booted eagle Hieraaetus pennatus (Gmelin, 1788) and black kite Milvus migrans (Boddaert, 1783). Location Andalusia, southe...

  11. A statistical intercomparison of temperature and precipitation predicted by four general circulation models with historical data

    International Nuclear Information System (INIS)

    Grotch, S.L.

    1991-01-01

    This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO 2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO 2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections. 11 refs.; 10 figs

  12. A statistical intercomparison of temperature and precipitation predicted by four general circulation models with historical data

    International Nuclear Information System (INIS)

    Grotch, S.L.

    1990-01-01

    This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO 2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO 2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections

  13. Predicting the lung compliance of mechanically ventilated patients via statistical modeling

    International Nuclear Information System (INIS)

    Ganzert, Steven; Kramer, Stefan; Guttmann, Josef

    2012-01-01

    To avoid ventilator associated lung injury (VALI) during mechanical ventilation, the ventilator is adjusted with reference to the volume distensibility or ‘compliance’ of the lung. For lung-protective ventilation, the lung should be inflated at its maximum compliance, i.e. when during inspiration a maximal intrapulmonary volume change is achieved by a minimal change of pressure. To accomplish this, one of the main parameters is the adjusted positive end-expiratory pressure (PEEP). As changing the ventilator settings usually produces an effect on patient's lung mechanics with a considerable time delay, the prediction of the compliance change associated with a planned change of PEEP could assist the physician at the bedside. This study introduces a machine learning approach to predict the nonlinear lung compliance for the individual patient by Gaussian processes, a probabilistic modeling technique. Experiments are based on time series data obtained from patients suffering from acute respiratory distress syndrome (ARDS). With a high hit ratio of up to 93%, the learned models could predict whether an increase/decrease of PEEP would lead to an increase/decrease of the compliance. However, the prediction of the complete pressure–volume relation for an individual patient has to be improved. We conclude that the approach is well suitable for the given problem domain but that an individualized feature selection should be applied for a precise prediction of individual pressure–volume curves. (paper)

  14. A new statistical scission-point model fed with microscopic ingredients to predict fission fragments distributions

    International Nuclear Information System (INIS)

    Heinrich, S.

    2006-01-01

    Nucleus fission process is a very complex phenomenon and, even nowadays, no realistic models describing the overall process are available. The work presented here deals with a theoretical description of fission fragments distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilking Scission Point model. Our purpose was to test if this statistic model applied at the scission point and by introducing new results of modern microscopic calculations allows to describe quantitatively the fission fragments distributions. We calculate the surface energy available at the scission point as a function of the fragments deformations. This surface is obtained from a Hartree Fock Bogoliubov microscopic calculation which guarantee a realistic description of the potential dependence on the deformation for each fragment. The statistic balance is described by the level densities of the fragment. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between each fragment at the scission point, is discussed by comparison with scission configuration obtained from full dynamical microscopic calculations. Also, the comparison between our results and experimental data is very satisfying and allow us to discuss the success and limitations of our approach. We finally proposed ideas to improve the model, in particular by applying dynamical corrections. (author)

  15. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    Science.gov (United States)

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. A statistical approach for predicting thermal diffusivity profiles in fusion plasmas as a transport model

    International Nuclear Information System (INIS)

    Yokoyama, Masayuki

    2014-01-01

    A statistical approach is proposed to predict thermal diffusivity profiles as a transport “model” in fusion plasmas. It can provide regression expressions for the ion and electron heat diffusivities (χ i and χ e ), separately, to construct their radial profiles. An approach that this letter is proposing outstrips the conventional scaling laws for the global confinement time (τ E ) since it also deals with profiles (temperature, density, heating depositions etc.). This approach has become possible with the analysis database accumulated by the extensive application of the integrated transport analysis suite to experiment data. In this letter, TASK3D-a analysis database for high-ion-temperature (high-T i ) plasmas in the LHD (Large Helical Device) is used as an example to describe an approach. (author)

  17. Drivers and seasonal predictability of extreme wind speeds in the ECMWF System 4 and a statistical model

    Science.gov (United States)

    Walz, M. A.; Donat, M.; Leckebusch, G. C.

    2017-12-01

    As extreme wind speeds are responsible for large socio-economic losses in Europe, a skillful prediction would be of great benefit for disaster prevention as well as for the actuarial community. Here we evaluate patterns of large-scale atmospheric variability and the seasonal predictability of extreme wind speeds (e.g. >95th percentile) in the European domain in the dynamical seasonal forecast system ECMWF System 4, and compare to the predictability based on a statistical prediction model. The dominant patterns of atmospheric variability show distinct differences between reanalysis and ECMWF System 4, with most patterns in System 4 extended downstream in comparison to ERA-Interim. The dissimilar manifestations of the patterns within the two models lead to substantially different drivers associated with the occurrence of extreme winds in the respective model. While the ECMWF System 4 is shown to provide some predictive power over Scandinavia and the eastern Atlantic, only very few grid cells in the European domain have significant correlations for extreme wind speeds in System 4 compared to ERA-Interim. In contrast, a statistical model predicts extreme wind speeds during boreal winter in better agreement with the observations. Our results suggest that System 4 does not seem to capture the potential predictability of extreme winds that exists in the real world, and therefore fails to provide reliable seasonal predictions for lead months 2-4. This is likely related to the unrealistic representation of large-scale patterns of atmospheric variability. Hence our study points to potential improvements of dynamical prediction skill by improving the simulation of large-scale atmospheric dynamics.

  18. Statistical surrogate models for prediction of high-consequence climate change.

    Energy Technology Data Exchange (ETDEWEB)

    Constantine, Paul; Field, Richard V., Jr.; Boslough, Mark Bruce Elrick

    2011-09-01

    In safety engineering, performance metrics are defined using probabilistic risk assessments focused on the low-probability, high-consequence tail of the distribution of possible events, as opposed to best estimates based on central tendencies. We frame the climate change problem and its associated risks in a similar manner. To properly explore the tails of the distribution requires extensive sampling, which is not possible with existing coupled atmospheric models due to the high computational cost of each simulation. We therefore propose the use of specialized statistical surrogate models (SSMs) for the purpose of exploring the probability law of various climate variables of interest. A SSM is different than a deterministic surrogate model in that it represents each climate variable of interest as a space/time random field. The SSM can be calibrated to available spatial and temporal data from existing climate databases, e.g., the Program for Climate Model Diagnosis and Intercomparison (PCMDI), or to a collection of outputs from a General Circulation Model (GCM), e.g., the Community Earth System Model (CESM) and its predecessors. Because of its reduced size and complexity, the realization of a large number of independent model outputs from a SSM becomes computationally straightforward, so that quantifying the risk associated with low-probability, high-consequence climate events becomes feasible. A Bayesian framework is developed to provide quantitative measures of confidence, via Bayesian credible intervals, in the use of the proposed approach to assess these risks.

  19. Overview of statistical methods, models and analysis for predicting equipment end of life

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2009-07-01

    Utility equipment can be operated and maintained for many years following installation. However, as the equipment ages, utility operators must decide whether to extend the service life or replace the equipment. Condition assessment modelling is used by many utilities to determine the condition of equipment and to prioritize the maintenance or repair. Several factors are weighted and combined in assessment modelling, which gives a single index number to rate the equipment. There is speculation that this index alone may not be adequate for a business case to rework or replace an asset because it only ranks an asset into a particular category. For that reason, a new methodology was developed to determine the economic end of life of an asset. This paper described the different statistical methods available and their use in determining the remaining service life of electrical equipment. A newly developed Excel-based demonstration computer tool is also an integral part of the deliverables of this project.

  20. Development and application of a statistical methodology to evaluate the predictive accuracy of building energy baseline models

    Energy Technology Data Exchange (ETDEWEB)

    Granderson, Jessica [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.; Price, Phillip N. [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.

    2014-03-01

    This paper documents the development and application of a general statistical methodology to assess the accuracy of baseline energy models, focusing on its application to Measurement and Verification (M&V) of whole-­building energy savings. The methodology complements the principles addressed in resources such as ASHRAE Guideline 14 and the International Performance Measurement and Verification Protocol. It requires fitting a baseline model to data from a ``training period’’ and using the model to predict total electricity consumption during a subsequent ``prediction period.’’ We illustrate the methodology by evaluating five baseline models using data from 29 buildings. The training period and prediction period were varied, and model predictions of daily, weekly, and monthly energy consumption were compared to meter data to determine model accuracy. Several metrics were used to characterize the accuracy of the predictions, and in some cases the best-­performing model as judged by one metric was not the best performer when judged by another metric.

  1. Prediction of hydrate formation temperature by both statistical models and artificial neural network approaches

    International Nuclear Information System (INIS)

    Zahedi, Gholamreza; Karami, Zohre; Yaghoobi, Hamed

    2009-01-01

    In this study, various estimation methods have been reviewed for hydrate formation temperature (HFT) and two procedures have been presented. In the first method, two general correlations have been proposed for HFT. One of the correlations has 11 parameters, and the second one has 18 parameters. In order to obtain constants in proposed equations, 203 experimental data points have been collected from literatures. The Engineering Equation Solver (EES) and Statistical Package for the Social Sciences (SPSS) soft wares have been employed for statistical analysis of the data. Accuracy of the obtained correlations also has been declared by comparison with experimental data and some recent common used correlations. In the second method, HFT is estimated by artificial neural network (ANN) approach. In this case, various architectures have been checked using 70% of experimental data for training of ANN. Among the various architectures multi layer perceptron (MLP) network with trainlm training algorithm was found as the best architecture. Comparing the obtained ANN model results with 30% of unseen data confirms ANN excellent estimation performance. It was found that ANN is more accurate than traditional methods and even our two proposed correlations for HFT estimation.

  2. A statistical model to predict total column ozone in Peninsular Malaysia

    Science.gov (United States)

    Tan, K. C.; Lim, H. S.; Mat Jafri, M. Z.

    2016-03-01

    This study aims to predict monthly columnar ozone in Peninsular Malaysia based on concentrations of several atmospheric gases. Data pertaining to five atmospheric gases (CO2, O3, CH4, NO2, and H2O vapor) were retrieved by satellite scanning imaging absorption spectrometry for atmospheric chartography from 2003 to 2008 and used to develop a model to predict columnar ozone in Peninsular Malaysia. Analyses of the northeast monsoon (NEM) and the southwest monsoon (SWM) seasons were conducted separately. Based on the Pearson correlation matrices, columnar ozone was negatively correlated with H2O vapor but positively correlated with CO2 and NO2 during both the NEM and SWM seasons from 2003 to 2008. This result was expected because NO2 is a precursor of ozone. Therefore, an increase in columnar ozone concentration is associated with an increase in NO2 but a decrease in H2O vapor. In the NEM season, columnar ozone was negatively correlated with H2O (-0.847), NO2 (0.754), and CO2 (0.477); columnar ozone was also negatively but weakly correlated with CH4 (-0.035). In the SWM season, columnar ozone was highly positively correlated with NO2 (0.855), CO2 (0.572), and CH4 (0.321) and also highly negatively correlated with H2O (-0.832). Both multiple regression and principal component analyses were used to predict the columnar ozone value in Peninsular Malaysia. We obtained the best-fitting regression equations for the columnar ozone data using four independent variables. Our results show approximately the same R value (≈ 0.83) for both the NEM and SWM seasons.

  3. Statistical model of natural stimuli predicts edge-like pooling of spatial frequency channels in V2

    Directory of Open Access Journals (Sweden)

    Gutmann Michael

    2005-02-01

    Full Text Available Abstract Background It has been shown that the classical receptive fields of simple and complex cells in the primary visual cortex emerge from the statistical properties of natural images by forcing the cell responses to be maximally sparse or independent. We investigate how to learn features beyond the primary visual cortex from the statistical properties of modelled complex-cell outputs. In previous work, we showed that a new model, non-negative sparse coding, led to the emergence of features which code for contours of a given spatial frequency band. Results We applied ordinary independent component analysis to modelled outputs of complex cells that span different frequency bands. The analysis led to the emergence of features which pool spatially coherent across-frequency activity in the modelled primary visual cortex. Thus, the statistically optimal way of processing complex-cell outputs abandons separate frequency channels, while preserving and even enhancing orientation tuning and spatial localization. As a technical aside, we found that the non-negativity constraint is not necessary: ordinary independent component analysis produces essentially the same results as our previous work. Conclusion We propose that the pooling that emerges allows the features to code for realistic low-level image features related to step edges. Further, the results prove the viability of statistical modelling of natural images as a framework that produces quantitative predictions of visual processing.

  4. A Hierarchical Multivariate Bayesian Approach to Ensemble Model output Statistics in Atmospheric Prediction

    Science.gov (United States)

    2017-09-01

    application of statistical inference. Even when human forecasters leverage their professional experience, which is often gained through long periods of... application throughout statistics and Bayesian data analysis. The multivariate form of 2( , )  (e.g., Figure 12) is similarly analytically...data (i.e., no systematic manipulations with analytical functions), it is common in the statistical literature to apply mathematical transformations

  5. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    Science.gov (United States)

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNBassociated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures though minimizing the error rate. After cross-validation logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients ( approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.

  6. Vitamin D and ferritin correlation with chronic neck pain using standard statistics and a novel artificial neural network prediction model.

    Science.gov (United States)

    Eloqayli, Haytham; Al-Yousef, Ali; Jaradat, Raid

    2018-02-15

    Despite the high prevalence of chronic neck pain, there is limited consensus about the primary etiology, risk factors, diagnostic criteria and therapeutic outcome. Here, we aimed to determine if Ferritin and Vitamin D are modifiable risk factors with chronic neck pain using slandered statistics and artificial intelligence neural network (ANN). Fifty-four patients with chronic neck pain treated between February 2016 and August 2016 in King Abdullah University Hospital and 54 patients age matched controls undergoing outpatient or minor procedures were enrolled. Patients and control demographic parameters, height, weight and single measurement of serum vitamin D, Vitamin B12, ferritin, calcium, phosphorus, zinc were obtained. An ANN prediction model was developed. The statistical analysis reveals that patients with chronic neck pain have significantly lower serum Vitamin D and Ferritin (p-value artificial neural network can be of future benefit in classification and prediction models for chronic neck pain. We hope this initial work will encourage a future larger cohort study addressing vitamin D and iron correction as modifiable factors and the application of artificial intelligence models in clinical practice.

  7. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Larsen, Gunner Chr.; Hansen, Kurt Schaldemose

    2004-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continously increase the knowledge on wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describe the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of high-sampled full-scale time series measurements...... are consistent, given the inevitabel uncertainties associated with model as well as with the extreme value data analysis. Keywords: Statistical model, extreme wind conditions, statistical analysis, turbulence, wind loading, statistical analysis, turbulence, wind loading, wind shear, wind turbines....

  8. Nonparametric predictive inference in statistical process control

    NARCIS (Netherlands)

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2004-01-01

    Statistical process control (SPC) is used to decide when to stop a process as confidence in the quality of the next item(s) is low. Information to specify a parametric model is not always available, and as SPC is of a predictive nature, we present a control chart developed using nonparametric

  9. Statistical prediction of Late Miocene climate

    Digital Repository Service at National Institute of Oceanography (India)

    Fernandes, A.A; Gupta, S.M.

    by making certain simplifying assumptions; for example in modelling ocean 4 currents, the geostrophic approximation is made. In case of statistical prediction no such a priori assumption need be made. statistical prediction comprises of using observed data... the number of equations. In this case the equations are overdetermined, and therefore one has to look for a solution that best fits the sample data in a least squares sense. To this end we express the sample .... (2.1)+ ry = y + data as follows: n L c. (x...

  10. Statistical modelling coupled with LC-MS analysis to predict human upper intestinal absorption of phytochemical mixtures.

    Science.gov (United States)

    Selby-Pham, Sophie N B; Howell, Kate S; Dunshea, Frank R; Ludbey, Joel; Lutz, Adrian; Bennett, Louise

    2018-04-15

    A diet rich in phytochemicals confers benefits for health by reducing the risk of chronic diseases via regulation of oxidative stress and inflammation (OSI). For optimal protective bio-efficacy, the time required for phytochemicals and their metabolites to reach maximal plasma concentrations (T max ) should be synchronised with the time of increased OSI. A statistical model has been reported to predict T max of individual phytochemicals based on molecular mass and lipophilicity. We report the application of the model for predicting the absorption profile of an uncharacterised phytochemical mixture, herein referred to as the 'functional fingerprint'. First, chemical profiles of phytochemical extracts were acquired using liquid chromatography mass spectrometry (LC-MS), then the molecular features for respective components were used to predict their plasma absorption maximum, based on molecular mass and lipophilicity. This method of 'functional fingerprinting' of plant extracts represents a novel tool for understanding and optimising the health efficacy of plant extracts. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Parameter estimations in predictive microbiology: Statistically sound modelling of the microbial growth rate.

    Science.gov (United States)

    Akkermans, Simen; Logist, Filip; Van Impe, Jan F

    2018-04-01

    When building models to describe the effect of environmental conditions on the microbial growth rate, parameter estimations can be performed either with a one-step method, i.e., directly on the cell density measurements, or in a two-step method, i.e., via the estimated growth rates. The two-step method is often preferred due to its simplicity. The current research demonstrates that the two-step method is, however, only valid if the correct data transformation is applied and a strict experimental protocol is followed for all experiments. Based on a simulation study and a mathematical derivation, it was demonstrated that the logarithm of the growth rate should be used as a variance stabilizing transformation. Moreover, the one-step method leads to a more accurate estimation of the model parameters and a better approximation of the confidence intervals on the estimated parameters. Therefore, the one-step method is preferred and the two-step method should be avoided. Copyright © 2017. Published by Elsevier Ltd.

  12. Statistical Model of Extreme Shear

    DEFF Research Database (Denmark)

    Hansen, Kurt Schaldemose; Larsen, Gunner Chr.

    2005-01-01

    In order to continue cost-optimisation of modern large wind turbines, it is important to continuously increase the knowledge of wind field parameters relevant to design loads. This paper presents a general statistical model that offers site-specific prediction of the probability density function...... by a model that, on a statistically consistent basis, describes the most likely spatial shape of an extreme wind shear event. Predictions from the model have been compared with results from an extreme value data analysis, based on a large number of full-scale measurements recorded with a high sampling rate...

  13. Improving thermal model prediction through statistical analysis of irradiation and post-irradiation data from AGR experiments

    International Nuclear Information System (INIS)

    Pham, Binh T.; Hawkes, Grant L.; Einerson, Jeffrey J.

    2014-01-01

    As part of the High Temperature Reactors (HTR) R and D program, a series of irradiation tests, designated as Advanced Gas-cooled Reactor (AGR), have been defined to support development and qualification of fuel design, fabrication process, and fuel performance under normal operation and accident conditions. The AGR tests employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule and instrumented with thermocouples (TC) embedded in graphite blocks enabling temperature control. While not possible to obtain by direct measurements in the tests, crucial fuel conditions (e.g., temperature, neutron fast fluence, and burnup) are calculated using core physics and thermal modeling codes. This paper is focused on AGR test fuel temperature predicted by the ABAQUS code's finite element-based thermal models. The work follows up on a previous study, in which several statistical analysis methods were adapted, implemented in the NGNP Data Management and Analysis System (NDMAS), and applied for qualification of AGR-1 thermocouple data. Abnormal trends in measured data revealed by the statistical analysis are traced to either measuring instrument deterioration or physical mechanisms in capsules that may have shifted the system thermal response. The main thrust of this work is to exploit the variety of data obtained in irradiation and post-irradiation examination (PIE) for assessment of modeling assumptions. As an example, the uneven reduction of the control gas gap in Capsule 5 found in the capsule metrology measurements in PIE helps identify mechanisms other than TC drift causing the decrease in TC readings. This suggests a more physics-based modification of the thermal model that leads to a better fit with experimental data, thus reducing model uncertainty and increasing confidence in the calculated fuel temperatures of the AGR-1 test

  14. Improving thermal model prediction through statistical analysis of irradiation and post-irradiation data from AGR experiments

    Energy Technology Data Exchange (ETDEWEB)

    Pham, Binh T., E-mail: Binh.Pham@inl.gov [Human Factor, Controls and Statistics Department, Nuclear Science and Technology, Idaho National Laboratory, Idaho Falls, ID 83415 (United States); Hawkes, Grant L. [Thermal Science and Safety Analysis Department, Nuclear Science and Technology, Idaho National Laboratory, Idaho Falls, ID 83415 (United States); Einerson, Jeffrey J. [Human Factor, Controls and Statistics Department, Nuclear Science and Technology, Idaho National Laboratory, Idaho Falls, ID 83415 (United States)

    2014-05-01

    As part of the High Temperature Reactors (HTR) R and D program, a series of irradiation tests, designated as Advanced Gas-cooled Reactor (AGR), have been defined to support development and qualification of fuel design, fabrication process, and fuel performance under normal operation and accident conditions. The AGR tests employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule and instrumented with thermocouples (TC) embedded in graphite blocks enabling temperature control. While not possible to obtain by direct measurements in the tests, crucial fuel conditions (e.g., temperature, neutron fast fluence, and burnup) are calculated using core physics and thermal modeling codes. This paper is focused on AGR test fuel temperature predicted by the ABAQUS code's finite element-based thermal models. The work follows up on a previous study, in which several statistical analysis methods were adapted, implemented in the NGNP Data Management and Analysis System (NDMAS), and applied for qualification of AGR-1 thermocouple data. Abnormal trends in measured data revealed by the statistical analysis are traced to either measuring instrument deterioration or physical mechanisms in capsules that may have shifted the system thermal response. The main thrust of this work is to exploit the variety of data obtained in irradiation and post-irradiation examination (PIE) for assessment of modeling assumptions. As an example, the uneven reduction of the control gas gap in Capsule 5 found in the capsule metrology measurements in PIE helps identify mechanisms other than TC drift causing the decrease in TC readings. This suggests a more physics-based modification of the thermal model that leads to a better fit with experimental data, thus reducing model uncertainty and increasing confidence in the calculated fuel temperatures of the AGR-1 test.

  15. Sampling, Probability Models and Statistical Reasoning Statistical

    Indian Academy of Sciences (India)

    Home; Journals; Resonance – Journal of Science Education; Volume 1; Issue 5. Sampling, Probability Models and Statistical Reasoning Statistical Inference. Mohan Delampady V R Padmawar. General Article Volume 1 Issue 5 May 1996 pp 49-58 ...

  16. SIMPLIFIED PREDICTIVE MODELS FOR CO₂ SEQUESTRATION PERFORMANCE ASSESSMENT RESEARCH TOPICAL REPORT ON TASK #3 STATISTICAL LEARNING BASED MODELS

    Energy Technology Data Exchange (ETDEWEB)

    Mishra, Srikanta; Schuetter, Jared

    2014-11-01

    We compare two approaches for building a statistical proxy model (metamodel) for CO₂ geologic sequestration from the results of full-physics compositional simulations. The first approach involves a classical Box-Behnken or Augmented Pairs experimental design with a quadratic polynomial response surface. The second approach used a space-filling maxmin Latin Hypercube sampling or maximum entropy design with the choice of five different meta-modeling techniques: quadratic polynomial, kriging with constant and quadratic trend terms, multivariate adaptive regression spline (MARS) and additivity and variance stabilization (AVAS). Simulations results for CO₂ injection into a reservoir-caprock system with 9 design variables (and 97 samples) were used to generate the data for developing the proxy models. The fitted models were validated with using an independent data set and a cross-validation approach for three different performance metrics: total storage efficiency, CO₂ plume radius and average reservoir pressure. The Box-Behnken–quadratic polynomial metamodel performed the best, followed closely by the maximin LHS–kriging metamodel.

  17. Diffeomorphic Statistical Deformation Models

    DEFF Research Database (Denmark)

    Hansen, Michael Sass; Hansen, Mads/Fogtman; Larsen, Rasmus

    2007-01-01

    In this paper we present a new method for constructing diffeomorphic statistical deformation models in arbitrary dimensional images with a nonlinear generative model and a linear parameter space. Our deformation model is a modified version of the diffeomorphic model introduced by Cootes et al....... The modifications ensure that no boundary restriction has to be enforced on the parameter space to prevent folds or tears in the deformation field. For straightforward statistical analysis, principal component analysis and sparse methods, we assume that the parameters for a class of deformations lie on a linear...... with ground truth in form of manual expert annotations, and compared to Cootes's model. We anticipate applications in unconstrained diffeomorphic synthesis of images, e.g. for tracking, segmentation, registration or classification purposes....

  18. Wind power prediction models

    Science.gov (United States)

    Levy, R.; Mcginness, H.

    1976-01-01

    Investigations were performed to predict the power available from the wind at the Goldstone, California, antenna site complex. The background for power prediction was derived from a statistical evaluation of available wind speed data records at this location and at nearby locations similarly situated within the Mojave desert. In addition to a model for power prediction over relatively long periods of time, an interim simulation model that produces sample wind speeds is described. The interim model furnishes uncorrelated sample speeds at hourly intervals that reproduce the statistical wind distribution at Goldstone. A stochastic simulation model to provide speed samples representative of both the statistical speed distributions and correlations is also discussed.

  19. Statistical short-term earthquake prediction.

    Science.gov (United States)

    Kagan, Y Y; Knopoff, L

    1987-06-19

    A statistical procedure, derived from a theoretical model of fracture growth, is used to identify a foreshock sequence while it is in progress. As a predictor, the procedure reduces the average uncertainty in the rate of occurrence for a future strong earthquake by a factor of more than 1000 when compared with the Poisson rate of occurrence. About one-third of all main shocks with local magnitude greater than or equal to 4.0 in central California can be predicted in this way, starting from a 7-year database that has a lower magnitude cut off of 1.5. The time scale of such predictions is of the order of a few hours to a few days for foreshocks in the magnitude range from 2.0 to 5.0.

  20. A statistical forecast model using the time-scale decomposition technique to predict rainfall during flood period over the middle and lower reaches of the Yangtze River Valley

    Science.gov (United States)

    Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao

    2018-04-01

    In this paper, a statistical forecast model using the time-scale decomposition method is established to do the seasonal prediction of the rainfall during flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). This method decomposites the rainfall over the MLYRV into three time-scale components, namely, the interannual component with the period less than 8 years, the interdecadal component with the period from 8 to 30 years, and the interdecadal component with the period larger than 30 years. Then, the predictors are selected for the three time-scale components of FPR through the correlation analysis. At last, a statistical forecast model is established using the multiple linear regression technique to predict the three time-scale components of the FPR, respectively. The results show that this forecast model can capture the interannual and interdecadal variation of FPR. The hindcast of FPR during 14 years from 2001 to 2014 shows that the FPR can be predicted successfully in 11 out of the 14 years. This forecast model performs better than the model using traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skills and application value in the operational prediction of FPR over the MLYRV.

  1. Predicting radiotherapy outcomes using statistical learning techniques

    International Nuclear Information System (INIS)

    El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O; Lindsay, Patricia E; Hope, Andrew J

    2009-01-01

    Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data before. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizabilty' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principle components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizabilty beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model

  2. Crop Yield Predictions - High Resolution Statistical Model for Intra-season Forecasts Applied to Corn in the US

    Science.gov (United States)

    Cai, Y.

    2017-12-01

    Accurately forecasting crop yields has broad implications for economic trading, food production monitoring, and global food security. However, the variation of environmental variables presents challenges to model yields accurately, especially when the lack of highly accurate measurements creates difficulties in creating models that can succeed across space and time. In 2016, we developed a sequence of machine-learning based models forecasting end-of-season corn yields for the US at both the county and national levels. We combined machine learning algorithms in a hierarchical way, and used an understanding of physiological processes in temporal feature selection, to achieve high precision in our intra-season forecasts, including in very anomalous seasons. During the live run, we predicted the national corn yield within 1.40% of the final USDA number as early as August. In the backtesting of the 2000-2015 period, our model predicts national yield within 2.69% of the actual yield on average already by mid-August. At the county level, our model predicts 77% of the variation in final yield using data through the beginning of August and improves to 80% by the beginning of October, with the percentage of counties predicted within 10% of the average yield increasing from 68% to 73%. Further, the lowest errors are in the most significant producing regions, resulting in very high precision national-level forecasts. In addition, we identify the changes of important variables throughout the season, specifically early-season land surface temperature, and mid-season land surface temperature and vegetation index. For the 2017 season, we feed 2016 data to the training set, together with additional geospatial data sources, aiming to make the current model even more precise. We will show how our 2017 US corn yield forecasts converges in time, which factors affect the yield the most, as well as present our plans for 2018 model adjustments.

  3. Use of statistical models based on radiographic measurements to predict oviposition date and clutch size in rock iguanas (Cyclura nubila)

    International Nuclear Information System (INIS)

    Alberts, A.C.

    1995-01-01

    The ability to noninvasively estimate clutch size and predict oviposition date in reptiles can be useful not only to veterinary clinicians but also to managers of captive collections and field researchers. Measurements of egg size and shape, as well as position of the clutch within the coelomic cavity, were taken from diagnostic radiographs of 20 female Cuban rock iguanas, Cyclura nubila, 81 to 18 days prior to laying. Combined with data on maternal body size, these variables were entered into multiple regression models to predict clutch size and timing of egg laying. The model for clutch size was accurate to 0.53 ± 0.08 eggs, while the model for oviposition date was accurate to 6.22 ± 0.81 days. Equations were generated that should be applicable to this and other large Cyclura species. © 1995 Wiley-Liss, Inc

  4. Predictive factors of early moderate/severe ovarian hyperstimulation syndrome in non-polycystic ovarian syndrome patients: a statistical model.

    Science.gov (United States)

    Ashrafi, Mahnaz; Bahmanabadi, Akram; Akhond, Mohammad Reza; Arabipoor, Arezoo

    2015-11-01

    To evaluate demographic, medical history and clinical cycle characteristics of infertile non-polycystic ovary syndrome (NPCOS) women with the purpose of investigating their associations with the prevalence of moderate-to-severe OHSS. In this retrospective study, among 7073 in vitro fertilization and/or intracytoplasmic sperm injection (IVF/ICSI) cycles, 86 cases of NPCO patients who developed moderate-to-severe OHSS while being treated with IVF/ICSI cycles were analyzed during the period of January 2008 to December 2010 at Royan Institute. To review the OHSS risk factors, 172 NPCOS patients without developing OHSS, treated at the same period of time, were selected randomly by computer as control group. We used multiple logistic regression in a backward manner to build a prediction model. The regression analysis revealed that the variables, including age [odds ratio (OR) 0.9, confidence interval (CI) 0.81-0.99], antral follicles count (OR 4.3, CI 2.7-6.9), infertility cause (tubal factor, OR 11.5, CI 1.1-51.3), hypothyroidism (OR 3.8, CI 1.5-9.4) and positive history of ovarian surgery (OR 0.2, CI 0.05-0.9) were the most important predictors of OHSS. The regression model had an area under curve of 0.94, presenting an allowable discriminative performance that was equal with two strong predictive variables, including the number of follicles and serum estradiol level on human chorionic gonadotropin day. The predictive regression model based on primary characteristics of NPCOS patients had equal specificity in comparison with two mentioned strong predictive variables. Therefore, it may be beneficial to apply this model before the beginning of ovarian stimulation protocol.

  5. Falling in the elderly: Do statistical models matter for performance criteria of fall prediction? Results from two large population-based studies.

    Science.gov (United States)

    Kabeshova, Anastasiia; Launay, Cyrille P; Gromov, Vasilii A; Fantino, Bruno; Levinoff, Elise J; Allali, Gilles; Beauchet, Olivier

    2016-01-01

    To compare performance criteria (i.e., sensitivity, specificity, positive predictive value, negative predictive value, area under receiver operating characteristic curve and accuracy) of linear and non-linear statistical models for fall risk in older community-dwellers. Participants were recruited in two large population-based studies, "Prévention des Chutes, Réseau 4" (PCR4, n=1760, cross-sectional design, retrospective collection of falls) and "Prévention des Chutes Personnes Agées" (PCPA, n=1765, cohort design, prospective collection of falls). Six linear statistical models (i.e., logistic regression, discriminant analysis, Bayes network algorithm, decision tree, random forest, boosted trees), three non-linear statistical models corresponding to artificial neural networks (multilayer perceptron, genetic algorithm and neuroevolution of augmenting topologies [NEAT]) and the adaptive neuro fuzzy interference system (ANFIS) were used. Falls ≥1 characterizing fallers and falls ≥2 characterizing recurrent fallers were used as outcomes. Data of studies were analyzed separately and together. NEAT and ANFIS had better performance criteria compared to other models. The highest performance criteria were reported with NEAT when using PCR4 database and falls ≥1, and with both NEAT and ANFIS when pooling data together and using falls ≥2. However, sensitivity and specificity were unbalanced. Sensitivity was higher than specificity when identifying fallers, whereas the converse was found when predicting recurrent fallers. Our results showed that NEAT and ANFIS were non-linear statistical models with the best performance criteria for the prediction of falls but their sensitivity and specificity were unbalanced, underscoring that models should be used respectively for the screening of fallers and the diagnosis of recurrent fallers. Copyright © 2015 European Federation of Internal Medicine. Published by Elsevier B.V. All rights reserved.

  6. Sugar and acid content of Citrus prediction modeling using FT-IR fingerprinting in combination with multivariate statistical analysis.

    Science.gov (United States)

    Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung

    2016-01-01

    A high-throughput screening system for Citrus lines were established with higher sugar and acid contents using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. The quantitative predictive modeling of sugar and acid contents from Citrus fruits was established using partial least square regression algorithms from FT-IR spectra. The regression coefficients (R(2)) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be early detected with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features

    Energy Technology Data Exchange (ETDEWEB)

    Grimm, Lars J., E-mail: Lars.grimm@duke.edu; Ghate, Sujata V.; Yoon, Sora C.; Kim, Connie [Department of Radiology, Duke University Medical Center, Box 3808, Durham, North Carolina 27710 (United States); Kuzmiak, Cherie M. [Department of Radiology, University of North Carolina School of Medicine, 2006 Old Clinic, CB No. 7510, Chapel Hill, North Carolina 27599 (United States); Mazurowski, Maciej A. [Duke University Medical Center, Box 2731 Medical Center, Durham, North Carolina 27710 (United States)

    2014-03-15

    Purpose: The purpose of this study is to explore Breast Imaging-Reporting and Data System (BI-RADS) features as predictors of individual errors made by trainees when detecting masses in mammograms. Methods: Ten radiology trainees and three expert breast imagers reviewed 100 mammograms comprised of bilateral medial lateral oblique and craniocaudal views on a research workstation. The cases consisted of normal and biopsy proven benign and malignant masses. For cases with actionable abnormalities, the experts recorded breast (density and axillary lymph nodes) and mass (shape, margin, and density) features according to the BI-RADS lexicon, as well as the abnormality location (depth and clock face). For each trainee, a user-specific multivariate model was constructed to predict the trainee's likelihood of error based on BI-RADS features. The performance of the models was assessed using area under the receive operating characteristic curves (AUC). Results: Despite the variability in errors between different trainees, the individual models were able to predict the likelihood of error for the trainees with a mean AUC of 0.611 (range: 0.502–0.739, 95% Confidence Interval: 0.543–0.680,p < 0.002). Conclusions: Patterns in detection errors for mammographic masses made by radiology trainees can be modeled using BI-RADS features. These findings may have potential implications for the development of future educational materials that are personalized to individual trainees.

  8. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features.

    Science.gov (United States)

    Grimm, Lars J; Ghate, Sujata V; Yoon, Sora C; Kuzmiak, Cherie M; Kim, Connie; Mazurowski, Maciej A

    2014-03-01

    The purpose of this study is to explore Breast Imaging-Reporting and Data System (BI-RADS) features as predictors of individual errors made by trainees when detecting masses in mammograms. Ten radiology trainees and three expert breast imagers reviewed 100 mammograms comprised of bilateral medial lateral oblique and craniocaudal views on a research workstation. The cases consisted of normal and biopsy proven benign and malignant masses. For cases with actionable abnormalities, the experts recorded breast (density and axillary lymph nodes) and mass (shape, margin, and density) features according to the BI-RADS lexicon, as well as the abnormality location (depth and clock face). For each trainee, a user-specific multivariate model was constructed to predict the trainee's likelihood of error based on BI-RADS features. The performance of the models was assessed using area under the receive operating characteristic curves (AUC). Despite the variability in errors between different trainees, the individual models were able to predict the likelihood of error for the trainees with a mean AUC of 0.611 (range: 0.502-0.739, 95% Confidence Interval: 0.543-0.680,p errors for mammographic masses made by radiology trainees can be modeled using BI-RADS features. These findings may have potential implications for the development of future educational materials that are personalized to individual trainees.

  9. Predicting error in detecting mammographic masses among radiology trainees using statistical models based on BI-RADS features

    International Nuclear Information System (INIS)

    Grimm, Lars J.; Ghate, Sujata V.; Yoon, Sora C.; Kim, Connie; Kuzmiak, Cherie M.; Mazurowski, Maciej A.

    2014-01-01

    Purpose: The purpose of this study is to explore Breast Imaging-Reporting and Data System (BI-RADS) features as predictors of individual errors made by trainees when detecting masses in mammograms. Methods: Ten radiology trainees and three expert breast imagers reviewed 100 mammograms comprised of bilateral medial lateral oblique and craniocaudal views on a research workstation. The cases consisted of normal and biopsy proven benign and malignant masses. For cases with actionable abnormalities, the experts recorded breast (density and axillary lymph nodes) and mass (shape, margin, and density) features according to the BI-RADS lexicon, as well as the abnormality location (depth and clock face). For each trainee, a user-specific multivariate model was constructed to predict the trainee's likelihood of error based on BI-RADS features. The performance of the models was assessed using area under the receive operating characteristic curves (AUC). Results: Despite the variability in errors between different trainees, the individual models were able to predict the likelihood of error for the trainees with a mean AUC of 0.611 (range: 0.502–0.739, 95% Confidence Interval: 0.543–0.680,p < 0.002). Conclusions: Patterns in detection errors for mammographic masses made by radiology trainees can be modeled using BI-RADS features. These findings may have potential implications for the development of future educational materials that are personalized to individual trainees

  10. QSPR Models for Predicting Log Pliver Values for Volatile Organic Compounds Combining Statistical Methods and Domain Knowledge

    Directory of Open Access Journals (Sweden)

    Mónica F. Díaz

    2012-12-01

    Full Text Available Volatile organic compounds (VOCs are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Thereby, it is important to model blood-to-liver partition coefficients (log Pliver for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR models for the prediction of log Pliver, where we also propose a hybrid approach for the selection of the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge. This allows obtaining a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. Results show high prediction accuracy compared to previous log Pliver models, and the descriptor selection approach provides a means to get a small set of descriptors that is in agreement with theoretical understanding of the target property.

  11. A Robust Statistical Model to Predict the Future Value of the Milk Production of Dairy Cows Using Herd Recording Data

    DEFF Research Database (Denmark)

    Græsbøll, Kaare; Kirkeby, Carsten Thure; Nielsen, Søren Saxmose

    2017-01-01

    of the future value of a dairy cow requires further detailed knowledge of the costs associated with feed, management practices, production systems, and disease. Here, we present a method to predict the future value of the milk production of a dairy cow based on herd recording data only. The method consists......The future value of an individual dairy cow depends greatly on its projected milk yield. In developed countries with developed dairy industry infrastructures, facilities exist to record individual cow production and reproduction outcomes consistently and accurately. Accurate prediction...... of somatic cell count. We conclude that estimates of future average production can be used on a day-to-day basis to rank cows for culling, or can be implemented in simulation models of within-herd disease spread to make operational decisions, such as culling versus treatment. An advantage of the approach...

  12. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    Science.gov (United States)

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.

  13. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    Science.gov (United States)

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to

  14. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

    Science.gov (United States)

    Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

    2017-01-01

    If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.

  15. Statistical Basis for Predicting Technological Progress

    Science.gov (United States)

    Nagy, Béla; Farmer, J. Doyne; Bui, Quan M.; Trancik, Jessika E.

    2013-01-01

    Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation. PMID:23468837

  16. Statistical basis for predicting technological progress.

    Directory of Open Access Journals (Sweden)

    Béla Nagy

    Full Text Available Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation.

  17. Prediction of velocity distributions in rod bundle axial flow, with a statistical model (K-epsilon) of turbulence

    International Nuclear Information System (INIS)

    Silva Junior, H.C. da.

    1978-12-01

    Reactor fuel elements generally consist of rod bundles with the coolant flowing axially through the region between the rods. The confiability of the thermohydraulic design of such elements is related to a detailed description of the velocity field. A two-equation statistical model (K-epsilon) of turbulence is applied to compute main and secondary flow fields, wall shear stress distributions and friction factors of steady, fully developed turbulent flows, with incompressible, temperature independent fluid flowing axially through triangular or square arrays of rod bundles. The numerical procedure uses the vorticity and the stream function to describe the velocity field. Comparison with experimental and analytical data of several investigators is presented. Results are in good agreement. (Author) [pt

  18. The statistical evaluation and comparison of ADMS-Urban model for the prediction of nitrogen dioxide with air quality monitoring network.

    Science.gov (United States)

    Dėdelė, Audrius; Miškinytė, Auksė

    2015-09-01

    In many countries, road traffic is one of the main sources of air pollution associated with adverse effects on human health and environment. Nitrogen dioxide (NO2) is considered to be a measure of traffic-related air pollution, with concentrations tending to be higher near highways, along busy roads, and in the city centers, and the exceedances are mainly observed at measurement stations located close to traffic. In order to assess the air quality in the city and the air pollution impact on public health, air quality models are used. However, firstly, before the model can be used for these purposes, it is important to evaluate the accuracy of the dispersion modelling as one of the most widely used method. The monitoring and dispersion modelling are two components of air quality monitoring system (AQMS), in which statistical comparison was made in this research. The evaluation of the Atmospheric Dispersion Modelling System (ADMS-Urban) was made by comparing monthly modelled NO2 concentrations with the data of continuous air quality monitoring stations in Kaunas city. The statistical measures of model performance were calculated for annual and monthly concentrations of NO2 for each monitoring station site. The spatial analysis was made using geographic information systems (GIS). The calculation of statistical parameters indicated a good ADMS-Urban model performance for the prediction of NO2. The results of this study showed that the agreement of modelled values and observations was better for traffic monitoring stations compared to the background and residential stations.

  19. A statistical model for prediction of fuel element failure using the Markov process and entropy minimax principles

    International Nuclear Information System (INIS)

    Choi, K.Y.; Yoon, Y.K.; Chang, S.H.

    1991-01-01

    This paper reports on a new statistical fuel failure model developed to take into account the effects of damaging environmental conditions and the overall operating history of the fuel elements. The degradation of material properties and damage resistance of the fuel cladding is mainly caused by the combined effects of accumulated dynamic stresses, neutron irradiation, and chemical and stress corrosion at operating temperature. Since the degradation of material properties due to these effects can be considered as a stochastic process, a dynamic reliability function is derived based on the Markov process. Four damage parameters, namely, dynamic stresses, magnitude of power increase from the preceding power level and with ramp rate, and fatigue cycles, are used to build this model. The dynamic reliability function and damage parameters are used to obtain effective damage parameters. The entropy maximization principle is used to generate a probability density function of the effective damage parameters. The entropy minimization principle is applied to determine weighting factors for amalgamation of the failure probabilities due to the respective failure modes. In this way, the effects of operating history, damaging environmental conditions, and damage sequence are taken into account

  20. Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

    Energy Technology Data Exchange (ETDEWEB)

    Marzouk, Youssef [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)

    2016-08-31

    Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesian inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.

  1. Nonparametric predictive inference in statistical process control

    NARCIS (Netherlands)

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2000-01-01

    New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities hased on

  2. Predicting Statistical Distributions of Footbridge Vibrations

    DEFF Research Database (Denmark)

    Pedersen, Lars; Frier, Christian

    2009-01-01

    The paper considers vibration response of footbridges to pedestrian loading. Employing Newmark and Monte Carlo simulation methods, a statistical distribution of bridge vibration levels is calculated modelling walking parameters such as step frequency and stride length as random variables...

  3. Exclusion statistics and integrable models

    International Nuclear Information System (INIS)

    Mashkevich, S.

    1998-01-01

    The definition of exclusion statistics, as given by Haldane, allows for a statistical interaction between distinguishable particles (multi-species statistics). The thermodynamic quantities for such statistics ca be evaluated exactly. The explicit expressions for the cluster coefficients are presented. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models. The interesting questions of generalizing this correspondence onto the higher-dimensional and the multi-species cases remain essentially open

  4. A statistical model of the international spread of wild poliovirus in Africa used to predict and prevent outbreaks.

    Directory of Open Access Journals (Sweden)

    Kathleen M O'Reilly

    2011-10-01

    Full Text Available Outbreaks of poliomyelitis in African countries that were previously free of wild-type poliovirus cost the Global Polio Eradication Initiative US$850 million during 2003-2009, and have limited the ability of the program to focus on endemic countries. A quantitative understanding of the factors that predict the distribution and timing of outbreaks will enable their prevention and facilitate the completion of global eradication.Children with poliomyelitis in Africa from 1 January 2003 to 31 December 2010 were identified through routine surveillance of cases of acute flaccid paralysis, and separate outbreaks associated with importation of wild-type poliovirus were defined using the genetic relatedness of these viruses in the VP1/2A region. Potential explanatory variables were examined for their association with the number, size, and duration of poliomyelitis outbreaks in 6-mo periods using multivariable regression analysis. The predictive ability of 6-mo-ahead forecasts of poliomyelitis outbreaks in each country based on the regression model was assessed. A total of 142 genetically distinct outbreaks of poliomyelitis were recorded in 25 African countries, resulting in 1-228 cases (median of two cases. The estimated number of people arriving from infected countries and <5-y childhood mortality were independently associated with the number of outbreaks. Immunisation coverage based on the reported vaccination history of children with non-polio acute flaccid paralysis was associated with the duration and size of each outbreak, as well as the number of outbreaks. Six-month-ahead forecasts of the number of outbreaks in a country or region changed over time and had a predictive ability of 82%.Outbreaks of poliomyelitis resulted primarily from continued transmission in Nigeria and the poor immunisation status of populations in neighbouring countries. From 1 January 2010 to 30 June 2011, reduced transmission in Nigeria and increased incidence in reinfected

  5. Statistical Modelling of Wind Proles - Data Analysis and Modelling

    DEFF Research Database (Denmark)

    Jónsson, Tryggvi; Pinson, Pierre

    The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles.......The aim of the analysis presented in this document is to investigate whether statistical models can be used to make very short-term predictions of wind profiles....

  6. Modelling short- and long-term statistical learning of music as a process of predictive entropy reduction

    DEFF Research Database (Denmark)

    Hansen, Niels Christian; Loui, Psyche; Vuust, Peter

    Statistical learning underlies the generation of expectations with different degrees of uncertainty. In music, uncertainty applies to expectations for pitches in a melody. This uncertainty can be quantified by Shannon entropy from distributions of expectedness ratings for multiple continuations o...

  7. Comparative study on the predictability of statistical models (RSM and ANN) on the behavior of optimized buccoadhesive wafers containing Loratadine and their in vivo assessment.

    Science.gov (United States)

    Chakraborty, Prithviraj; Parcha, Versha; Chakraborty, Debarupa D; Ghosh, Amitava

    2016-01-01

    Buccoadhesive wafer dosage form containing Loratadine is formulated utilizing Formulation by Design (FbD) approach incorporating sodium alginate and lactose monohydrate as independent variable employing solvent casting method. The wafers were statistically optimized using Response Surface Methodology (RSM) and Artificial Neural Network algorithm (ANN) for predicting physicochemical and physico-mechanical properties of the wafers as responses. Morphologically wafers were tested using SEM. Quick disintegration of the samples was examined employing Optical Contact Angle (OCA). The comparison of the predictability of RSM and ANN showed a high prognostic capacity of RSM model over ANN model in forecasting mechanical and physicochemical properties of the wafers. The in vivo assessment of the optimized buccoadhesive wafer exhibits marked increase in bioavailability justifying the administration of Loratadine through buccal route, bypassing hepatic first pass metabolism.

  8. Exclusion statistics and integrable models

    International Nuclear Information System (INIS)

    Mashkevich, S.

    1998-01-01

    The definition of exclusion statistics that was given by Haldane admits a 'statistical interaction' between distinguishable particles (multispecies statistics). For such statistics, thermodynamic quantities can be evaluated exactly; explicit expressions are presented here for cluster coefficients. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models of the Calogero-Sutherland type. The interesting questions of generalizing this correspondence to the higher-dimensional and the multispecies cases remain essentially open; however, our results provide some hints as to searches for the models in question

  9. Statistical model to predict dry sliding wear behaviour of Aluminium-Jute bast ash particulate composite produced by stir-casting

    Directory of Open Access Journals (Sweden)

    Gambo Anthony VICTOR

    2017-06-01

    Full Text Available A model to predict the dry sliding wear behaviour of Aluminium-Jute bast ash particulate composites produced by double stir-casting method was developed in terms of weight fraction of jute bast ash (JBA. Experiments were designed on the basis of the Design of Experiments (DOE technique. A 2k factorial, where k is the number of variables, with central composite second-order rotatable design was used to improve the reliability of results and to reduce the size of experimentation without loss of accuracy. The factors considered in this study were sliding velocity, sliding distance, normal load and mass fraction of JBA reinforcement in the matrix. The developed regression model was validated by statistical software MINITAB-R14 and statistical tool such as analysis of variance (ANOVA. It was found that the developed regression model could be effectively used to predict the wear rate at 95% confidence level. The wear rate of cast Al-JBAp composite decreased with an increase in the mass fraction of JBA and increased with an increase of the sliding velocity, sliding distance and normal load acting on the composite specimen.

  10. Evaluation of statistical and rainfall-runoff models for predicting historical daily streamflow time series in the Des Moines and Iowa River watersheds

    Science.gov (United States)

    Farmer, William H.; Knight, Rodney R.; Eash, David A.; Kasey J. Hutchinson,; Linhart, S. Mike; Christiansen, Daniel E.; Archfield, Stacey A.; Over, Thomas M.; Kiang, Julie E.

    2015-08-24

    Daily records of streamflow are essential to understanding hydrologic systems and managing the interactions between human and natural systems. Many watersheds and locations lack streamgages to provide accurate and reliable records of daily streamflow. In such ungaged watersheds, statistical tools and rainfall-runoff models are used to estimate daily streamflow. Previous work compared 19 different techniques for predicting daily streamflow records in the southeastern United States. Here, five of the better-performing methods are compared in a different hydroclimatic region of the United States, in Iowa. The methods fall into three classes: (1) drainage-area ratio methods, (2) nonlinear spatial interpolations using flow duration curves, and (3) mechanistic rainfall-runoff models. The first two classes are each applied with nearest-neighbor and map-correlated index streamgages. Using a threefold validation and robust rank-based evaluation, the methods are assessed for overall goodness of fit of the hydrograph of daily streamflow, the ability to reproduce a daily, no-fail storage-yield curve, and the ability to reproduce key streamflow statistics. As in the Southeast study, a nonlinear spatial interpolation of daily streamflow using flow duration curves is found to be a method with the best predictive accuracy. Comparisons with previous work in Iowa show that the accuracy of mechanistic models with at-site calibration is substantially degraded in the ungaged framework.

  11. A Robust Statistical Model to Predict the Future Value of the Milk Production of Dairy Cows Using Herd Recording Data

    DEFF Research Database (Denmark)

    Græsbøll, Kaare; Kirkeby, Carsten Thure; Nielsen, Søren Saxmose

    2017-01-01

    The future value of an individual dairy cow depends greatly on its projected milk yield. In developed countries with developed dairy industry infrastructures, facilities exist to record individual cow production and reproduction outcomes consistently and accurately. Accurate prediction of the fut...

  12. SU-F-BRB-10: A Statistical Voxel Based Normal Organ Dose Prediction Model for Coplanar and Non-Coplanar Prostate Radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Tran, A; Yu, V; Nguyen, D; Woods, K; Low, D; Sheng, K [UCLA, Los Angeles, CA (United States)

    2015-06-15

    Purpose: Knowledge learned from previous plans can be used to guide future treatment planning. Existing knowledge-based treatment planning methods study the correlation between organ geometry and dose volume histogram (DVH), which is a lossy representation of the complete dose distribution. A statistical voxel dose learning (SVDL) model was developed that includes the complete dose volume information. Its accuracy of predicting volumetric-modulated arc therapy (VMAT) and non-coplanar 4π radiotherapy was quantified. SVDL provided more isotropic dose gradients and may improve knowledge-based planning. Methods: 12 prostate SBRT patients originally treated using two full-arc VMAT techniques were re-planned with 4π using 20 intensity-modulated non-coplanar fields to a prescription dose of 40 Gy. The bladder and rectum voxels were binned based on their distances to the PTV. The dose distribution in each bin was resampled by convolving to a Gaussian kernel, resulting in 1000 data points in each bin that predicted the statistical dose information of a voxel with unknown dose in a new patient without triaging information that may be collectively important to a particular patient. We used this method to predict the DVHs, mean and max doses in a leave-one-out cross validation (LOOCV) test and compared its performance against lossy estimators including mean, median, mode, Poisson and Rayleigh of the voxelized dose distributions. Results: SVDL predicted the bladder and rectum doses more accurately than other estimators, giving mean percentile errors ranging from 13.35–19.46%, 4.81–19.47%, 22.49–28.69%, 23.35–30.5%, 21.05–53.93% for predicting mean, max dose, V20, V35, and V40 respectively, to OARs in both planning techniques. The prediction errors were generally lower for 4π than VMAT. Conclusion: By employing all dose volume information in the SVDL model, the OAR doses were more accurately predicted. 4π plans are better suited for knowledge-based planning than

  13. Statistical modeling for degradation data

    CERN Document Server

    Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru

    2017-01-01

    This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.

  14. Statistical predictions from anarchic field theory landscapes

    International Nuclear Information System (INIS)

    Balasubramanian, Vijay; Boer, Jan de; Naqvi, Asad

    2010-01-01

    Consistent coupling of effective field theories with a quantum theory of gravity appears to require bounds on the rank of the gauge group and the amount of matter. We consider landscapes of field theories subject to such to boundedness constraints. We argue that appropriately 'coarse-grained' aspects of the randomly chosen field theory in such landscapes, such as the fraction of gauge groups with ranks in a given range, can be statistically predictable. To illustrate our point we show how the uniform measures on simple classes of N=1 quiver gauge theories localize in the vicinity of theories with certain typical structures. Generically, this approach would predict a high energy theory with very many gauge factors, with the high rank factors largely decoupled from the low rank factors if we require asymptotic freedom for the latter.

  15. Statistical modelling with quantile functions

    CERN Document Server

    Gilchrist, Warren

    2000-01-01

    Galton used quantiles more than a hundred years ago in describing data. Tukey and Parzen used them in the 60s and 70s in describing populations. Since then, the authors of many papers, both theoretical and practical, have used various aspects of quantiles in their work. Until now, however, no one put all the ideas together to form what turns out to be a general approach to statistics.Statistical Modelling with Quantile Functions does just that. It systematically examines the entire process of statistical modelling, starting with using the quantile function to define continuous distributions. The author shows that by using this approach, it becomes possible to develop complex distributional models from simple components. A modelling kit can be developed that applies to the whole model - deterministic and stochastic components - and this kit operates by adding, multiplying, and transforming distributions rather than data.Statistical Modelling with Quantile Functions adds a new dimension to the practice of stati...

  16. Comparison on batch anaerobic digestion of five different livestock manures and prediction of biochemical methane potential (BMP) using different statistical models.

    Science.gov (United States)

    Kafle, Gopi Krishna; Chen, Lide

    2016-02-01

    There is a lack of literature reporting the methane potential of several livestock manures under the same anaerobic digestion conditions (same inoculum, temperature, time, and size of the digester). To the best of our knowledge, no previous study has reported biochemical methane potential (BMP) predicting models developed and evaluated by solely using at least five different livestock manure tests results. The goal of this study was to evaluate the BMP of five different livestock manures (dairy manure (DM), horse manure (HM), goat manure (GM), chicken manure (CM) and swine manure (SM)) and to predict the BMP using different statistical models. Nutrients of the digested different manures were also monitored. The BMP tests were conducted under mesophilic temperatures with a manure loading factor of 3.5g volatile solids (VS)/L and a feed to inoculum ratio (F/I) of 0.5. Single variable and multiple variable regression models were developed using manure total carbohydrate (TC), crude protein (CP), total fat (TF), lignin (LIG) and acid detergent fiber (ADF), and measured BMP data. Three different kinetic models (first order kinetic model, modified Gompertz model and Chen and Hashimoto model) were evaluated for BMP predictions. The BMPs of DM, HM, GM, CM and SM were measured to be 204, 155, 159, 259, and 323mL/g VS, respectively and the VS removals were calculated to be 58.6%, 52.9%, 46.4%, 81.4%, 81.4%, respectively. The technical digestion time (T80-90, time required to produce 80-90% of total biogas production) for DM, HM, GM, CM and SM was calculated to be in the ranges of 19-28, 27-37, 31-44, 13-18, 12-17days, respectively. The effluents from the HM showed the lowest nitrogen, phosphorus and potassium concentrations. The effluents from the CM digesters showed highest nitrogen and phosphorus concentrations and digested SM showed highest potassium concentration. Based on the results of the regression analysis, the model using the variable of LIG showed the best (R(2

  17. Statistical Real-time Model for Performance Prediction of Ship Detection from Microsatellite Electro-Optical Imagers

    Directory of Open Access Journals (Sweden)

    Lapierre FabianD

    2010-01-01

    Full Text Available Abstract For locating maritime vessels longer than 45 meters, such vessels are required to set up an Automatic Identification System (AIS used by vessel traffic services. However, when a boat is shutting down its AIS, there are no means to detect it in open sea. In this paper, we use Electro-Optical (EO imagers for noncooperative vessel detection when the AIS is not operational. As compared to radar sensors, EO sensors have lower cost, lower payload, and better computational processing load. EO sensors are mounted on LEO microsatellites. We propose a real-time statistical methodology to estimate sensor Receiver Operating Characteristic (ROC curves. It does not require the computation of the entire image received at the sensor. We then illustrate the use of this methodology to design a simple simulator that can help sensor manufacturers in optimizing the design of EO sensors for maritime applications.

  18. Confidence scores for prediction models

    DEFF Research Database (Denmark)

    Gerds, Thomas Alexander; van de Wiel, MA

    2011-01-01

    In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for training and validation,...

  19. A Statistical Programme Assignment Model

    DEFF Research Database (Denmark)

    Rosholm, Michael; Staghøj, Jonas; Svarer, Michael

    When treatment effects of active labour market programmes are heterogeneous in an observable way  across the population, the allocation of the unemployed into different programmes becomes a particularly  important issue. In this paper, we present a statistical model designed to improve the present...... duration of unemployment spells may result if a statistical programme assignment model is introduced. We discuss several issues regarding the  plementation of such a system, especially the interplay between the statistical model and  case workers....

  20. Predicting fracture of mortar beams under three-point bending using non-extensive statistical modeling of electric emissions

    Science.gov (United States)

    Stergiopoulos, Ch.; Stavrakas, I.; Triantis, D.; Vallianatos, F.; Stonham, J.

    2015-02-01

    Weak electric signals termed as 'Pressure Stimulated Currents, PSC' are generated and detected while cement based materials are found under mechanical load, related to the creation of cracks and the consequent evolution of cracks' network in the bulk of the specimen. During the experiment a set of cement mortar beams of rectangular cross-section were subjected to Three-Point Bending (3PB). For each one of the specimens an abrupt mechanical load step was applied, increased from the low load level (Lo) to a high final value (Lh) , where Lh was different for each specimen and it was maintained constant for long time. The temporal behavior of the recorded PSC show that during the load increase a spike-like PSC emission was recorded and consequently a relaxation of the PSC, after reaching its final value, follows. The relaxation process of the PSC was studied using non-extensive statistical physics (NESP) based on Tsallis entropy equation. The behavior of the Tsallis q parameter was studied in relaxation PSCs in order to investigate its potential use as an index for monitoring the crack evolution process with a potential use in non-destructive laboratory testing of cement-based specimens of unknown internal damage level. The dependence of the q-parameter on the Lh (when Lh <0.8Lf), where Lf represents the 3PB strength of the specimen, shows an increase on the q value when the specimens are subjected to gradually higher bending loadings and reaches a maximum value close to 1.4 when the applied Lh becomes higher than 0.8Lf. While the applied Lh becomes higher than 0.9Lf the value of the q-parameter gradually decreases. This analysis of the experimental data manifests that the value of the entropic index q obtains a characteristic decrease while reaching the ultimate strength of the specimen, and thus could be used as a forerunner of the expected failure.

  1. Bootstrap prediction and Bayesian prediction under misspecified models

    OpenAIRE

    Fushiki, Tadayoshi

    2005-01-01

    We consider a statistical prediction problem under misspecified models. In a sense, Bayesian prediction is an optimal prediction method when an assumed model is true. Bootstrap prediction is obtained by applying Breiman's `bagging' method to a plug-in prediction. Bootstrap prediction can be considered to be an approximation to the Bayesian prediction under the assumption that the model is true. However, in applications, there are frequently deviations from the assumed model. In this paper, bo...

  2. Tropical geometry of statistical models.

    Science.gov (United States)

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.

  3. Statistical Models for Social Networks

    NARCIS (Netherlands)

    Snijders, Tom A. B.; Cook, KS; Massey, DS

    2011-01-01

    Statistical models for social networks as dependent variables must represent the typical network dependencies between tie variables such as reciprocity, homophily, transitivity, etc. This review first treats models for single (cross-sectionally observed) networks and then for network dynamics. For

  4. Sensometrics: Thurstonian and Statistical Models

    DEFF Research Database (Denmark)

    Christensen, Rune Haubo Bojesen

    . sensR is a package for sensory discrimination testing with Thurstonian models and ordinal supports analysis of ordinal data with cumulative link (mixed) models. While sensR is closely connected to the sensometrics field, the ordinal package has developed into a generic statistical package applicable......This thesis is concerned with the development and bridging of Thurstonian and statistical models for sensory discrimination testing as applied in the scientific discipline of sensometrics. In sensory discrimination testing sensory differences between products are detected and quantified by the use...... and sensory discrimination testing in particular in a series of papers by advancing Thurstonian models for a range of sensory discrimination protocols in addition to facilitating their application by providing software for fitting these models. The main focus is on identifying Thurstonian models...

  5. Statistical analysis and ANN modeling for predicting hydrological extremes under climate change scenarios: the example of a small Mediterranean agro-watershed.

    Science.gov (United States)

    Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P

    2015-05-01

    The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Statistical prediction of seasonal discharge in Central Asia for water resources management: development of a generic (pre-)operational modeling tool

    Science.gov (United States)

    Apel, Heiko; Baimaganbetov, Azamat; Kalashnikova, Olga; Gavrilenko, Nadejda; Abdykerimova, Zharkinay; Agalhanova, Marina; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Gafurov, Abror

    2017-04-01

    The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien-Shan and Pamirs. During the summer months the snow and glacier melt dominated river discharge originating in the mountains provides the main water resource available for agricultural production, but also for storage in reservoirs for energy generation during the winter months. Thus a reliable seasonal forecast of the water resources is crucial for a sustainable management and planning of water resources. In fact, seasonal forecasts are mandatory tasks of all national hydro-meteorological services in the region. In order to support the operational seasonal forecast procedures of hydromet services, this study aims at the development of a generic tool for deriving statistical forecast models of seasonal river discharge. The generic model is kept as simple as possible in order to be driven by available hydrological and meteorological data, and be applicable for all catchments with their often limited data availability in the region. As snowmelt dominates summer runoff, the main meteorological predictors for the forecast models are monthly values of winter precipitation and temperature as recorded by climatological stations in the catchments. These data sets are accompanied by snow cover predictors derived from the operational ModSnow tool, which provides cloud free snow cover data for the selected catchments based on MODIS satellite images. In addition to the meteorological data antecedent streamflow is used as a predictor variable. This basic predictor set was further extended by multi-monthly means of the individual predictors, as well as composites of the predictors. Forecast models are derived based on these predictors as linear combinations of up to 3 or 4 predictors. A user selectable number of best models according to pre-defined performance criteria is extracted automatically by the developed model fitting algorithm, which includes a test

  7. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. Part 2: Theoretical development of a dynamic model and application to rain fade durations and tolerable control delays for fade countermeasures

    Science.gov (United States)

    Manning, Robert M.

    1987-01-01

    A dynamic rain attenuation prediction model is developed for use in obtaining the temporal characteristics, on time scales of minutes or hours, of satellite communication link availability. Analagous to the associated static rain attenuation model, which yields yearly attenuation predictions, this dynamic model is applicable at any location in the world that is characterized by the static rain attenuation statistics peculiar to the geometry of the satellite link and the rain statistics of the location. Such statistics are calculated by employing the formalism of Part I of this report. In fact, the dynamic model presented here is an extension of the static model and reduces to the static model in the appropriate limit. By assuming that rain attenuation is dynamically described by a first-order stochastic differential equation in time and that this random attenuation process is a Markov process, an expression for the associated transition probability is obtained by solving the related forward Kolmogorov equation. This transition probability is then used to obtain such temporal rain attenuation statistics as attenuation durations and allowable attenuation margins versus control system delay.

  8. Classical model of intermediate statistics

    International Nuclear Information System (INIS)

    Kaniadakis, G.

    1994-01-01

    In this work we present a classical kinetic model of intermediate statistics. In the case of Brownian particles we show that the Fermi-Dirac (FD) and Bose-Einstein (BE) distributions can be obtained, just as the Maxwell-Boltzmann (MD) distribution, as steady states of a classical kinetic equation that intrinsically takes into account an exclusion-inclusion principle. In our model the intermediate statistics are obtained as steady states of a system of coupled nonlinear kinetic equations, where the coupling constants are the transmutational potentials η κκ' . We show that, besides the FD-BE intermediate statistics extensively studied from the quantum point of view, we can also study the MB-FD and MB-BE ones. Moreover, our model allows us to treat the three-state mixing FD-MB-BE intermediate statistics. For boson and fermion mixing in a D-dimensional space, we obtain a family of FD-BE intermediate statistics by varying the transmutational potential η BF . This family contains, as a particular case when η BF =0, the quantum statistics recently proposed by L. Wu, Z. Wu, and J. Sun [Phys. Lett. A 170, 280 (1992)]. When we consider the two-dimensional FD-BE statistics, we derive an analytic expression of the fraction of fermions. When the temperature T→∞, the system is composed by an equal number of bosons and fermions, regardless of the value of η BF . On the contrary, when T=0, η BF becomes important and, according to its value, the system can be completely bosonic or fermionic, or composed both by bosons and fermions

  9. Simple statistical model for branched aggregates

    DEFF Research Database (Denmark)

    Lemarchand, Claire; Hansen, Jesper Schmidt

    2015-01-01

    , given that it already has bonds with others. The model is applied here to asphaltene nanoaggregates observed in molecular dynamics simulations of Cooee bitumen. The variation with temperature of the probabilities deduced from this model is discussed in terms of statistical mechanics arguments....... The relevance of the statistical model in the case of asphaltene nanoaggregates is checked by comparing the predicted value of the probability for one molecule to have exactly i bonds with the same probability directly measured in the molecular dynamics simulations. The agreement is satisfactory......We propose a statistical model that can reproduce the size distribution of any branched aggregate, including amylopectin, dendrimers, molecular clusters of monoalcohols, and asphaltene nanoaggregates. It is based on the conditional probability for one molecule to form a new bond with a molecule...

  10. Textual information access statistical models

    CERN Document Server

    Gaussier, Eric

    2013-01-01

    This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications aiming at facilitating information access:- information extraction and retrieval;- text classification and clustering;- opinion mining;- comprehension aids (automatic summarization, machine translation, visualization).In order to give the reader as complete a description as possible, the focus is placed on the probability models used in the applications

  11. Statistical model for predicting correct amount of deoxidizer of Al-killed grade casted at slab continuous caster of Pakistan steel

    International Nuclear Information System (INIS)

    Siddiqui, A.R.; Khan, M.M.A.; Ismail, B.M.

    1999-01-01

    Oxygen is blown in Converter process to oxidize hot metal. This introduces dissolved oxygen in the metal, which may cause embrittlement, voids, inclusion and other undesirable properties in steel. The steel bath at the time of tapping contains 400 to 800 ppm oxygen. Deoxidation is carried out during tapping by adding into the tap ladle appropriate amounts of ferromanganese, ferrosilicon and/or aluminum or other special deoxidizers. In the research aluminum killed grade steel which are casted at the slab caster of Pakistan Steel were investigated. Amount of aluminum added is very critical because if we add lesser amount of aluminum then the required quantity then there will be an incomplete killing of oxygen which results uncleanness in steel. Addition of larger amount of aluminum not only increases the cost of the production but also results as higher amount of alumina, which results in nozzle clogging and increase, loses. The purpose of the research is to develop a statistical model which would predict correct amount of aluminum addition for complete deoxidation of aluminum killed grade casted at slab continuous caster of Pakistan Steel. In the model aluminum added is taken as dependent variable while tapping temperature, turn down carbon composition, turndown manganese composition and oxygen content in steel would be the independent variable. This work is based on operational practice on 130 tons Basic Oxygen furnace. (author)

  12. Improved model for statistical alignment

    Energy Technology Data Exchange (ETDEWEB)

    Miklos, I.; Toroczkai, Z. (Zoltan)

    2001-01-01

    The statistical approach to molecular sequence evolution involves the stochastic modeling of the substitution, insertion and deletion processes. Substitution has been modeled in a reliable way for more than three decades by using finite Markov-processes. Insertion and deletion, however, seem to be more difficult to model, and thc recent approaches cannot acceptably deal with multiple insertions and deletions. A new method based on a generating function approach is introduced to describe the multiple insertion process. The presented algorithm computes the approximate joint probability of two sequences in 0(13) running time where 1 is the geometric mean of the sequence lengths.

  13. A formal statistical approach to representing uncertainty in rainfall-runoff modelling with focus on residual analysis and probabilistic output evaluation - Distinguishing simulation and prediction

    DEFF Research Database (Denmark)

    Breinholt, Anders; Møller, Jan Kloppenborg; Madsen, Henrik

    2012-01-01

    While there seems to be consensus that hydrological model outputs should be accompanied with an uncertainty estimate the appropriate method for uncertainty estimation is not agreed upon and a debate is ongoing between advocators of formal statistical methods who consider errors as stochastic...... and GLUE advocators who consider errors as epistemic, arguing that the basis of formal statistical approaches that requires the residuals to be stationary and conform to a statistical distribution is unrealistic. In this paper we take a formal frequentist approach to parameter estimation and uncertainty...... necessary but the statistical assumptions were nevertheless not 100% justified. The residual analysis showed that significant autocorrelation was present for all simulation models. We believe users of formal approaches to uncertainty evaluation within hydrology and within environmental modelling in general...

  14. Active Learning with Statistical Models.

    Science.gov (United States)

    1995-01-01

    Active Learning with Statistical Models ASC-9217041, NSF CDA-9309300 6. AUTHOR(S) David A. Cohn, Zoubin Ghahramani, and Michael I. Jordan 7. PERFORMING...TERMS 15. NUMBER OF PAGES Al, MIT, Artificial Intelligence, active learning , queries, locally weighted 6 regression, LOESS, mixtures of gaussians...COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1522 January 9. 1995 C.B.C.L. Paper No. 110 Active Learning with

  15. Statistical modeling of geopressured geothermal reservoirs

    Science.gov (United States)

    Ansari, Esmail; Hughes, Richard; White, Christopher D.

    2017-06-01

    Identifying attractive candidate reservoirs for producing geothermal energy requires predictive models. In this work, inspectional analysis and statistical modeling are used to create simple predictive models for a line drive design. Inspectional analysis on the partial differential equations governing this design yields a minimum number of fifteen dimensionless groups required to describe the physics of the system. These dimensionless groups are explained and confirmed using models with similar dimensionless groups but different dimensional parameters. This study models dimensionless production temperature and thermal recovery factor as the responses of a numerical model. These responses are obtained by a Box-Behnken experimental design. An uncertainty plot is used to segment the dimensionless time and develop a model for each segment. The important dimensionless numbers for each segment of the dimensionless time are identified using the Boosting method. These selected numbers are used in the regression models. The developed models are reduced to have a minimum number of predictors and interactions. The reduced final models are then presented and assessed using testing runs. Finally, applications of these models are offered. The presented workflow is generic and can be used to translate the output of a numerical simulator into simple predictive models in other research areas involving numerical simulation.

  16. ARSENIC CONTAMINATION IN GROUNDWATER: A STATISTICAL MODELING

    OpenAIRE

    Palas Roy; Naba Kumar Mondal; Biswajit Das; Kousik Das

    2013-01-01

    High arsenic in natural groundwater in most of the tubewells of the Purbasthali- Block II area of Burdwan district (W.B, India) has recently been focused as a serious environmental concern. This paper is intending to illustrate the statistical modeling of the arsenic contaminated groundwater to identify the interrelation of that arsenic contain with other participating groundwater parameters so that the arsenic contamination level can easily be predicted by analyzing only such parameters. Mul...

  17. Predicting statistical properties of open reading frames in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Katharina Mir

    Full Text Available An analytical model based on the statistical properties of Open Reading Frames (ORFs of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.

  18. Statistical Approaches for Spatiotemporal Prediction of Low Flows

    Science.gov (United States)

    Fangmann, A.; Haberlandt, U.

    2017-12-01

    An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchments areas. Four different modeling approaches are analyzed. Basis for all pose multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be

  19. Statistical prediction of parametric roll using FORM

    DEFF Research Database (Denmark)

    Jensen, Jørgen Juncher; Choi, Ju-hyuck; Nielsen, Ulrik Dam

    2017-01-01

    Previous research has shown that the First Order Reliability Method (FORM) can be an efficient method for estimation of outcrossing rates and extreme value statistics for stationary stochastic processes. This is so also for bifurcation type of processes like parametric roll of ships. The present...

  20. Spatial statistics for predicting flow through a rock fracture

    International Nuclear Information System (INIS)

    Coakley, K.J.

    1989-03-01

    Fluid flow through a single rock fracture depends on the shape of the space between the upper and lower pieces of rock which define the fracture. In this thesis, the normalized flow through a fracture, i.e. the equivalent permeability of a fracture, is predicted in terms of spatial statistics computed from the arrangement of voids, i.e. open spaces, and contact areas within the fracture. Patterns of voids and contact areas, with complexity typical of experimental data, are simulated by clipping a correlated Gaussian process defined on a N by N pixel square region. The voids have constant aperture; the distance between the upper and lower surfaces which define the fracture is either zero or a constant. Local flow is assumed to be proportional to local aperture cubed times local pressure gradient. The flow through a pattern of voids and contact areas is solved using a finite-difference method. After solving for the flow through simulated 10 by 10 by 30 pixel patterns of voids and contact areas, a model to predict equivalent permeability is developed. The first model is for patterns with 80% voids where all voids have the same aperture. The equivalent permeability of a pattern is predicted in terms of spatial statistics computed from the arrangement of voids and contact areas within the pattern. Four spatial statistics are examined. The change point statistic measures how often adjacent pixel alternate from void to contact area (or vice versa ) in the rows of the patterns which are parallel to the overall flow direction. 37 refs., 66 figs., 41 tabs

  1. Time series prediction: statistical and neural techniques

    Science.gov (United States)

    Zahirniak, Daniel R.; DeSimio, Martin P.

    1996-03-01

    In this paper we compare the performance of nonlinear neural network techniques to those of linear filtering techniques in the prediction of time series. Specifically, we compare the results of using the nonlinear systems, known as multilayer perceptron and radial basis function neural networks, with the results obtained using the conventional linear Wiener filter, Kalman filter and Widrow-Hoff adaptive filter in predicting future values of stationary and non- stationary time series. Our results indicate the performance of each type of system is heavily dependent upon the form of the time series being predicted and the size of the system used. In particular, the linear filters perform adequately for linear or near linear processes while the nonlinear systems perform better for nonlinear processes. Since the linear systems take much less time to be developed, they should be tried prior to using the nonlinear systems when the linearity properties of the time series process are unknown.

  2. Statistical models of petrol engines vehicles dynamics

    Science.gov (United States)

    Ilie, C. O.; Marinescu, M.; Alexa, O.; Vilău, R.; Grosu, D.

    2017-10-01

    This paper focuses on studying statistical models of vehicles dynamics. It was design and perform a one year testing program. There were used many same type cars with gasoline engines and different mileage. Experimental data were collected of onboard sensors and those on the engine test stand. A database containing data of 64th tests was created. Several mathematical modelling were developed using database and the system identification method. Each modelling is a SISO or a MISO linear predictive ARMAX (AutoRegressive-Moving-Average with eXogenous inputs) model. It represents a differential equation with constant coefficients. It were made 64th equations for each dependency like engine torque as output and engine’s load and intake manifold pressure, as inputs. There were obtained strings with 64 values for each type of model. The final models were obtained using average values of the coefficients. The accuracy of models was assessed.

  3. Statistical shape and appearance models of bones.

    Science.gov (United States)

    Sarkalkan, Nazli; Weinans, Harrie; Zadpoor, Amir A

    2014-03-01

    When applied to bones, statistical shape models (SSM) and statistical appearance models (SAM) respectively describe the mean shape and mean density distribution of bones within a certain population as well as the main modes of variations of shape and density distribution from their mean values. The availability of this quantitative information regarding the detailed anatomy of bones provides new opportunities for diagnosis, evaluation, and treatment of skeletal diseases. The potential of SSM and SAM has been recently recognized within the bone research community. For example, these models have been applied for studying the effects of bone shape on the etiology of osteoarthritis, improving the accuracy of clinical osteoporotic fracture prediction techniques, design of orthopedic implants, and surgery planning. This paper reviews the main concepts, methods, and applications of SSM and SAM as applied to bone. Copyright © 2013 Elsevier Inc. All rights reserved.

  4. Statistical lung model for microdosimetry

    International Nuclear Information System (INIS)

    Fisher, D.R.; Hadley, R.T.

    1984-03-01

    To calculate the microdosimetry of plutonium in the lung, a mathematical description is needed of lung tissue microstructure that defines source-site parameters. Beagle lungs were expanded using a glutaraldehyde fixative at 30 cm water pressure. Tissue specimens, five microns thick, were stained with hematoxylin and eosin then studied using an image analyzer. Measurements were made along horizontal lines through the magnified tissue image. The distribution of air space and tissue chord lengths and locations of epithelial cell nuclei were recorded from about 10,000 line scans. The distribution parameters constituted a model of lung microstructure for predicting the paths of random alpha particle tracks in the lung and the probability of traversing biologically sensitive sites. This lung model may be used in conjunction with established deposition and retention models for determining the microdosimetry in the pulmonary lung for a wide variety of inhaled radioactive materials

  5. Statistical Prediction of Laminar-turbulent Transition

    National Research Council Canada - National Science Library

    Rubinstein, Robert

    2000-01-01

    ... on representative stability theories including the resonant triad model and the parabolized stability equations. The first type of model can describe the effect of initial phase differences among disturbance modes on transition location...

  6. Predicting binding poses and affinities for protein - ligand complexes in the 2015 D3R Grand Challenge using a physical model with a statistical parameter estimation

    Science.gov (United States)

    Grudinin, Sergei; Kadukova, Maria; Eisenbarth, Andreas; Marillet, Simon; Cazals, Frédéric

    2016-09-01

    The 2015 D3R Grand Challenge provided an opportunity to test our new model for the binding free energy of small molecules, as well as to assess our protocol to predict binding poses for protein-ligand complexes. Our pose predictions were ranked 3-9 for the HSP90 dataset, depending on the assessment metric. For the MAP4K dataset the ranks are very dispersed and equal to 2-35, depending on the assessment metric, which does not provide any insight into the accuracy of the method. The main success of our pose prediction protocol was the re-scoring stage using the recently developed Convex-PL potential. We make a thorough analysis of our docking predictions made with AutoDock Vina and discuss the effect of the choice of rigid receptor templates, the number of flexible residues in the binding pocket, the binding pocket size, and the benefits of re-scoring. However, the main challenge was to predict experimentally determined binding affinities for two blind test sets. Our affinity prediction model consisted of two terms, a pairwise-additive enthalpy, and a non pairwise-additive entropy. We trained the free parameters of the model with a regularized regression using affinity and structural data from the PDBBind database. Our model performed very well on the training set, however, failed on the two test sets. We explain the drawback and pitfalls of our model, in particular in terms of relative coverage of the test set by the training set and missed dynamical properties from crystal structures, and discuss different routes to improve it.

  7. GIGMF - A statistical model program

    International Nuclear Information System (INIS)

    Vladuca, G.; Deberth, C.

    1978-01-01

    The program GIGMF computes the differential and integrated statistical model cross sections for the reactions proceeding through a compound nuclear stage. The computational method is based on the Hauser-Feshbach-Wolfenstein theory, modified to include the modern version of Tepel et al. Although the program was written for a PDP-15 computer, with 16K high speed memory, many reaction channels can be taken into account with the following restrictions: the pro ectile spin must be less than 2, the maximum spin momenta of the compound nucleus can not be greater than 10. These restrictions are due solely to the storage allotments and may be easily relaxed. The energy of the impinging particle, the target and projectile masses, the spin and paritjes of the projectile, target, emergent and residual nuclei the maximum orbital momentum and transmission coefficients for each reaction channel are the input parameters of the program. (author)

  8. Uncertainty propagation for statistical impact prediction of space debris

    Science.gov (United States)

    Hoogendoorn, R.; Mooij, E.; Geul, J.

    2018-01-01

    Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.

  9. Predicting hydrological response to forest changes by simple statistical models: the selection of the best indicator of forest changes with a hydrological perspective

    Science.gov (United States)

    Ning, D.; Zhang, M.; Ren, S.; Hou, Y.; Yu, L.; Meng, Z.

    2017-01-01

    Forest plays an important role in hydrological cycle, and forest changes will inevitably affect runoff across multiple spatial scales. The selection of a suitable indicator for forest changes is essential for predicting forest-related hydrological response. This study used the Meijiang River, one of the headwaters of the Poyang Lake as an example to identify the best indicator of forest changes for predicting forest change-induced hydrological responses. Correlation analysis was conducted first to detect the relationships between monthly runoff and its predictive variables including antecedent monthly precipitation and indicators for forest changes (forest coverage, vegetation indices including EVI, NDVI, and NDWI), and by use of the identified predictive variables that were most correlated with monthly runoff, multiple linear regression models were then developed. The model with best performance identified in this study included two independent variables -antecedent monthly precipitation and NDWI. It indicates that NDWI is the best indicator of forest change in hydrological prediction while forest coverage, the most commonly used indicator of forest change is insignificantly related to monthly runoff. This highlights the use of vegetation index such as NDWI to indicate forest changes in hydrological studies. This study will provide us with an efficient way to quantify the hydrological impact of large-scale forest changes in the Meijiang River watershed, which is crucial for downstream water resource management and ecological protection in the Poyang Lake basin.

  10. Statistical models for prediction of dry weight and nitrogen accumulation based on visible and near-infrared hyper-spectral reflectance of rice canopies

    International Nuclear Information System (INIS)

    Takahashi, W.; Nguyen-Cong, V.; Kawaguchi, S.; Minamiyama, M.; Ninomiya, S.

    2000-01-01

    Various multivariate regression models were examined with ten-fold cross-validation to develop efficient, accurate models to predict dry weight and nitrogen accumulation of rice crops (cultivars Koshihikari, Hanaechizen, Nipponbare, and IR-36) from the maximum tiller number stage to the meiosis stage, using plant-canopy reflectance of hyper-spectra within the 400-1100 nm domain without any variable selection. The results showed that the principal component regression using hyper-spectra gave better fits and predictability than that using specific wavelengths. On the other hand, partial least squares regression was the most useful among the models tested; this method avoided overfitting and multicollinearity by using all wavelength information without variable selection and by inclusion of both x and y variations in its latent variables. (author)

  11. Predictive modeling of complications.

    Science.gov (United States)

    Osorio, Joseph A; Scheer, Justin K; Ames, Christopher P

    2016-09-01

    Predictive analytic algorithms are designed to identify patterns in the data that allow for accurate predictions without the need for a hypothesis. Therefore, predictive modeling can provide detailed and patient-specific information that can be readily applied when discussing the risks of surgery with a patient. There are few studies using predictive modeling techniques in the adult spine surgery literature. These types of studies represent the beginning of the use of predictive analytics in spine surgery outcomes. We will discuss the advancements in the field of spine surgery with respect to predictive analytics, the controversies surrounding the technique, and the future directions.

  12. Prediction and analysis of infra and low-frequency noise of upwind horizontal axis wind turbine using statistical wind speed model

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Gwang-Se; Cheong, Cheolung, E-mail: ccheong@pusan.ac.kr [School of Mechanical Engineering, Pusan National University, Busan, 609-745, Rep. of Korea (Korea, Republic of)

    2014-12-15

    Despite increasing concern about low-frequency noise of modern large horizontal-axis wind turbines (HAWTs), few studies have focused on its origin or its prediction methods. In this paper, infra- and low-frequency (the ILF) wind turbine noise are closely examined and an efficient method is developed for its prediction. Although most previous studies have assumed that the ILF noise consists primarily of blade passing frequency (BPF) noise components, these tonal noise components are seldom identified in the measured noise spectrum, except for the case of downwind wind turbines. In reality, since modern HAWTs are very large, during rotation, a single blade of the turbine experiences inflow with variation in wind speed in time as well as in space, breaking periodic perturbations of the BPF. Consequently, this transforms acoustic contributions at the BPF harmonics into broadband noise components. In this study, the ILF noise of wind turbines is predicted by combining Lowson’s acoustic analogy with the stochastic wind model, which is employed to reproduce realistic wind speed conditions. In order to predict the effects of these wind conditions on pressure variation on the blade surface, unsteadiness in the incident wind speed is incorporated into the XFOIL code by varying incident flow velocities on each blade section, which depend on the azimuthal locations of the rotating blade. The calculated surface pressure distribution is subsequently used to predict acoustic pressure at an observing location by using Lowson’s analogy. These predictions are compared with measured data, which ensures that the present method can reproduce the broadband characteristics of the measured low-frequency noise spectrum. Further investigations are carried out to characterize the IFL noise in terms of pressure loading on blade surface, narrow-band noise spectrum and noise maps around the turbine.

  13. Prediction and analysis of infra and low-frequency noise of upwind horizontal axis wind turbine using statistical wind speed model

    Directory of Open Access Journals (Sweden)

    Gwang-Se Lee

    2014-12-01

    Full Text Available Despite increasing concern about low-frequency noise of modern large horizontal-axis wind turbines (HAWTs, few studies have focused on its origin or its prediction methods. In this paper, infra- and low-frequency (the ILF wind turbine noise are closely examined and an efficient method is developed for its prediction. Although most previous studies have assumed that the ILF noise consists primarily of blade passing frequency (BPF noise components, these tonal noise components are seldom identified in the measured noise spectrum, except for the case of downwind wind turbines. In reality, since modern HAWTs are very large, during rotation, a single blade of the turbine experiences inflow with variation in wind speed in time as well as in space, breaking periodic perturbations of the BPF. Consequently, this transforms acoustic contributions at the BPF harmonics into broadband noise components. In this study, the ILF noise of wind turbines is predicted by combining Lowson’s acoustic analogy with the stochastic wind model, which is employed to reproduce realistic wind speed conditions. In order to predict the effects of these wind conditions on pressure variation on the blade surface, unsteadiness in the incident wind speed is incorporated into the XFOIL code by varying incident flow velocities on each blade section, which depend on the azimuthal locations of the rotating blade. The calculated surface pressure distribution is subsequently used to predict acoustic pressure at an observing location by using Lowson’s analogy. These predictions are compared with measured data, which ensures that the present method can reproduce the broadband characteristics of the measured low-frequency noise spectrum. Further investigations are carried out to characterize the IFL noise in terms of pressure loading on blade surface, narrow-band noise spectrum and noise maps around the turbine.

  14. Statistical modeling of Earth's plasmasphere

    Science.gov (United States)

    Veibell, Victoir

    The behavior of plasma near Earth's geosynchronous orbit is of vital importance to both satellite operators and magnetosphere modelers because it also has a significant influence on energy transport, ion composition, and induced currents. The system is highly complex in both time and space, making the forecasting of extreme space weather events difficult. This dissertation examines the behavior and statistical properties of plasma mass density near geosynchronous orbit by using both linear and nonlinear models, as well as epoch analyses, in an attempt to better understand the physical processes that precipitates and drives its variations. It is shown that while equatorial mass density does vary significantly on an hourly timescale when a drop in the disturbance time scale index ( Dst) was observed, it does not vary significantly between the day of a Dst event onset and the day immediately following. It is also shown that increases in equatorial mass density were not, on average, preceded or followed by any significant change in the examined solar wind or geomagnetic variables, including Dst, despite prior results that considered a few selected events and found a notable influence. It is verified that equatorial mass density and and solar activity via the F10.7 index have a strong correlation, which is stronger over longer timescales such as 27 days than it is over an hourly timescale. It is then shown that this connection seems to affect the behavior of equatorial mass density most during periods of strong solar activity leading to large mass density reactions to Dst drops for high values of F10.7. It is also shown that equatorial mass density behaves differently before and after events based on the value of F10.7 at the onset of an equatorial mass density event or a Dst event, and that a southward interplanetary magnetic field at onset leads to slowed mass density growth after event onset. These behavioral differences provide insight into how solar and geomagnetic

  15. Using machine learning, neural networks and statistics to predict bankruptcy

    NARCIS (Netherlands)

    Pompe, P.P.M.; Feelders, A.J.; Feelders, A.J.

    1997-01-01

    Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear

  16. ARSENIC CONTAMINATION IN GROUNDWATER: A STATISTICAL MODELING

    Directory of Open Access Journals (Sweden)

    Palas Roy

    2013-01-01

    Full Text Available High arsenic in natural groundwater in most of the tubewells of the Purbasthali- Block II area of Burdwan district (W.B, India has recently been focused as a serious environmental concern. This paper is intending to illustrate the statistical modeling of the arsenic contaminated groundwater to identify the interrelation of that arsenic contain with other participating groundwater parameters so that the arsenic contamination level can easily be predicted by analyzing only such parameters. Multivariate data analysis was done with the collected groundwater samples from the 132 tubewells of this contaminated region shows that three variable parameters are significantly related with the arsenic. Based on these relationships, a multiple linear regression model has been developed that estimated the arsenic contamination by measuring such three predictor parameters of the groundwater variables in the contaminated aquifer. This model could also be a suggestive tool while designing the arsenic removal scheme for any affected groundwater.

  17. Probing NWP model deficiencies by statistical postprocessing

    DEFF Research Database (Denmark)

    Rosgaard, Martin Haubjerg; Nielsen, Henrik Aalborg; Nielsen, Torben S.

    2016-01-01

    The objective in this article is twofold. On one hand, a Model Output Statistics (MOS) framework for improved wind speed forecast accuracy is described and evaluated. On the other hand, the approach explored identifies unintuitive explanatory value from a diagnostic variable in an operational....... Based on the statistical model candidates inferred from the data, the lifted index NWP model diagnostic is consistently found among the NWP model predictors of the best performing statistical models across sites....

  18. A New Structure-Activity Relationship (SAR) Model for Predicting Drug-Induced Liver Injury, Based on Statistical and Expert-Based Structural Alerts

    Science.gov (United States)

    Pizzo, Fabiola; Lombardo, Anna; Manganaro, Alberto; Benfenati, Emilio

    2016-01-01

    The prompt identification of chemical molecules with potential effects on liver may help in drug discovery and in raising the levels of protection for human health. Besides in vitro approaches, computational methods in toxicology are drawing attention. We built a structure-activity relationship (SAR) model for evaluating hepatotoxicity. After compiling a data set of 950 compounds using data from the literature, we randomly split it into training (80%) and test sets (20%). We also compiled an external validation set (101 compounds) for evaluating the performance of the model. To extract structural alerts (SAs) related to hepatotoxicity and non-hepatotoxicity we used SARpy, a statistical application that automatically identifies and extracts chemical fragments related to a specific activity. We also applied the chemical grouping approach for manually identifying other SAs. We calculated accuracy, specificity, sensitivity and Matthews correlation coefficient (MCC) on the training, test and external validation sets. Considering the complexity of the endpoint, the model performed well. In the training, test and external validation sets the accuracy was respectively 81, 63, and 68%, specificity 89, 33, and 33%, sensitivity 93, 88, and 80% and MCC 0.63, 0.27, and 0.13. Since it is preferable to overestimate hepatotoxicity rather than not to recognize unsafe compounds, the model's architecture followed a conservative approach. As it was built using human data, it might be applied without any need for extrapolation from other species. This model will be freely available in the VEGA platform. PMID:27920722

  19. Modelling bankruptcy prediction models in Slovak companies

    Directory of Open Access Journals (Sweden)

    Kovacova Maria

    2017-01-01

    Full Text Available An intensive research from academics and practitioners has been provided regarding models for bankruptcy prediction and credit risk management. In spite of numerous researches focusing on forecasting bankruptcy using traditional statistics techniques (e.g. discriminant analysis and logistic regression and early artificial intelligence models (e.g. artificial neural networks, there is a trend for transition to machine learning models (support vector machines, bagging, boosting, and random forest to predict bankruptcy one year prior to the event. Comparing the performance of this with unconventional approach with results obtained by discriminant analysis, logistic regression, and neural networks application, it has been found that bagging, boosting, and random forest models outperform the others techniques, and that all prediction accuracy in the testing sample improves when the additional variables are included. On the other side the prediction accuracy of old and well known bankruptcy prediction models is quiet high. Therefore, we aim to analyse these in some way old models on the dataset of Slovak companies to validate their prediction ability in specific conditions. Furthermore, these models will be modelled according to new trends by calculating the influence of elimination of selected variables on the overall prediction ability of these models.

  20. Wind speed prediction using statistical regression and neural network

    Indian Academy of Sciences (India)

    Prediction of wind speed in the atmospheric boundary layer is important for wind energy assess- ment,satellite launching and aviation,etc.There are a few techniques available for wind speed prediction,which require a minimum number of input parameters.Four different statistical techniques,viz.,curve fitting,Auto Regressive ...

  1. Statistical characterization of pitting corrosion process and life prediction

    International Nuclear Information System (INIS)

    Sheikh, A.K.; Younas, M.

    1995-01-01

    In order to prevent corrosion failures of machines and structures, it is desirable to know in advance when the corrosion damage will take place, and appropriate measures are needed to mitigate the damage. The corrosion predictions are needed both at development as well as operational stage of machines and structures. There are several forms of corrosion process through which varying degrees of damage can occur. Under certain conditions these corrosion processes at alone and in other set of conditions, several of these processes may occur simultaneously. For a certain type of machine elements and structures, such as gears, bearing, tubes, pipelines, containers, storage tanks etc., are particularly prone to pitting corrosion which is an insidious form of corrosion. The corrosion predictions are usually based on experimental results obtained from test coupons and/or field experiences of similar machines or parts of a structure. Considerable scatter is observed in corrosion processes. The probabilities nature and kinetics of pitting process makes in necessary to use statistical method to forecast the residual life of machine of structures. The focus of this paper is to characterization pitting as a time-dependent random process, and using this characterization the prediction of life to reach a critical level of pitting damage can be made. Using several data sets from literature on pitting corrosion, the extreme value modeling of pitting corrosion process, the evolution of the extreme value distribution in time, and their relationship to the reliability of machines and structure are explained. (author)

  2. Archaeological predictive model set.

    Science.gov (United States)

    2015-03-01

    This report is the documentation for Task 7 of the Statewide Archaeological Predictive Model Set. The goal of this project is to : develop a set of statewide predictive models to assist the planning of transportation projects. PennDOT is developing t...

  3. A new structure-activity relationship (SAR model for predicting drug-induced liver injury, based on statistical and expert-basedstructural alerts.

    Directory of Open Access Journals (Sweden)

    Fabiola Pizzo

    2016-11-01

    Full Text Available The prompt identification of chemical molecules with potential effects on liver may help in drug discovery and in raising the levels of protection for human health. Besides in vitro approaches, computational methods in toxicology are drawing attention. We built a structure-activity relationship (SAR model for evaluating hepatotoxicity. After compiling a data set of 950 compounds using data from the literature, we randomly split it into training (80% and test sets (20%. We also compiled an external validation set (101 compounds for evaluating the performance of the model. To extract structural alerts (SAs related to hepatotoxicity and non-hepatotoxicity we used SARpy, a statistical application that automatically identifies and extracts chemical fragments related to a specific activity. We also applied the chemical grouping approach for manually identifying other SAs. We calculated accuracy, specificity, sensitivity and Matthews correlation coefficient (MCC on the training, test and external validation sets. Considering the complexity of the endpoint, the model performed well. In the training, test and external validation sets the accuracy was respectively 81%, 63% and 68%, specificity 89%, 33% and 33%, sensitivity 93%, 88% and 80% and MCC 0.63, 0.27 and 0.13. Since it is preferable to overestimate hepatotoxicity rather than not to recognize unsafe compounds, the model’s architecture followed a conservative approach. As it was built using human data, it might be applied without any need for extrapolation from other species. This model will be freely available in the VEGA platform.

  4. Surface drift prediction in the Adriatic Sea using hyper-ensemble statistics on atmospheric, ocean and wave models: Uncertainties and probability distribution areas

    Science.gov (United States)

    Rixen, M.; Ferreira-Coelho, E.; Signell, R.

    2008-01-01

    Despite numerous and regular improvements in underlying models, surface drift prediction in the ocean remains a challenging task because of our yet limited understanding of all processes involved. Hence, deterministic approaches to the problem are often limited by empirical assumptions on underlying physics. Multi-model hyper-ensemble forecasts, which exploit the power of an optimal local combination of available information including ocean, atmospheric and wave models, may show superior forecasting skills when compared to individual models because they allow for local correction and/or bias removal. In this work, we explore in greater detail the potential and limitations of the hyper-ensemble method in the Adriatic Sea, using a comprehensive surface drifter database. The performance of the hyper-ensembles and the individual models are discussed by analyzing associated uncertainties and probability distribution maps. Results suggest that the stochastic method may reduce position errors significantly for 12 to 72??h forecasts and hence compete with pure deterministic approaches. ?? 2007 NATO Undersea Research Centre (NURC).

  5. Statistical prediction of nanoparticle delivery: from culture media to cell

    Science.gov (United States)

    Rowan Brown, M.; Hondow, Nicole; Brydson, Rik; Rees, Paul; Brown, Andrew P.; Summers, Huw D.

    2015-04-01

    The application of nanoparticles (NPs) within medicine is of great interest; their innate physicochemical characteristics provide the potential to enhance current technology, diagnostics and therapeutics. Recently a number of NP-based diagnostic and therapeutic agents have been developed for treatment of various diseases, where judicious surface functionalization is exploited to increase efficacy of administered therapeutic dose. However, quantification of heterogeneity associated with absolute dose of a nanotherapeutic (NP number), how this is trafficked across biological barriers has proven difficult to achieve. The main issue being the quantitative assessment of NP number at the spatial scale of the individual NP, data which is essential for the continued growth and development of the next generation of nanotherapeutics. Recent advances in sample preparation and the imaging fidelity of transmission electron microscopy (TEM) platforms provide information at the required spatial scale, where individual NPs can be individually identified. High spatial resolution however reduces the sample frequency and as a result dynamic biological features or processes become opaque. However, the combination of TEM data with appropriate probabilistic models provide a means to extract biophysical information that imaging alone cannot. Previously, we demonstrated that limited cell sampling via TEM can be statistically coupled to large population flow cytometry measurements to quantify exact NP dose. Here we extended this concept to link TEM measurements of NP agglomerates in cell culture media to that encapsulated within vesicles in human osteosarcoma cells. By construction and validation of a data-driven transfer function, we are able to investigate the dynamic properties of NP agglomeration through endocytosis. In particular, we statistically predict how NP agglomerates may traverse a biological barrier, detailing inter-agglomerate merging events providing the basis for

  6. Statistical prediction of nanoparticle delivery: from culture media to cell

    International Nuclear Information System (INIS)

    Brown, M Rowan; Rees, Paul; Summers, Huw D; Hondow, Nicole; Brydson, Rik; Brown, Andrew P

    2015-01-01

    The application of nanoparticles (NPs) within medicine is of great interest; their innate physicochemical characteristics provide the potential to enhance current technology, diagnostics and therapeutics. Recently a number of NP-based diagnostic and therapeutic agents have been developed for treatment of various diseases, where judicious surface functionalization is exploited to increase efficacy of administered therapeutic dose. However, quantification of heterogeneity associated with absolute dose of a nanotherapeutic (NP number), how this is trafficked across biological barriers has proven difficult to achieve. The main issue being the quantitative assessment of NP number at the spatial scale of the individual NP, data which is essential for the continued growth and development of the next generation of nanotherapeutics. Recent advances in sample preparation and the imaging fidelity of transmission electron microscopy (TEM) platforms provide information at the required spatial scale, where individual NPs can be individually identified. High spatial resolution however reduces the sample frequency and as a result dynamic biological features or processes become opaque. However, the combination of TEM data with appropriate probabilistic models provide a means to extract biophysical information that imaging alone cannot. Previously, we demonstrated that limited cell sampling via TEM can be statistically coupled to large population flow cytometry measurements to quantify exact NP dose. Here we extended this concept to link TEM measurements of NP agglomerates in cell culture media to that encapsulated within vesicles in human osteosarcoma cells. By construction and validation of a data-driven transfer function, we are able to investigate the dynamic properties of NP agglomeration through endocytosis. In particular, we statistically predict how NP agglomerates may traverse a biological barrier, detailing inter-agglomerate merging events providing the basis for

  7. The accuracy of the SONOBREAST statistical model in comparison to BI-RADS for the prediction of malignancy in solid breast nodules detected at ultrasonography.

    Science.gov (United States)

    Paulinelli, Regis R; Oliveira, Luis-Fernando P; Freitas-Junior, Ruffo; Soares, Leonardo R

    2016-01-01

    The objective of the present study was to compare the accuracy of SONOBREAST for the prediction of malignancy in solid breast nodules detected at ultrasonography with that of the BI-RADS system and to assess the agreement between these two methods. This prospective study included 274 women and evaluated 500 breast nodules detected at ultrasonography. The probability of malignancy was calculated based on the SONOBREAST model, available at www.sonobreast.com.br, and on the BI-RADS system, with results being compared with the anatomopathology report. The lesions were considered suspect in 171 cases (34.20%), according to both SONOBREAST and BI-RADS. Agreement between the methods was perfect, as shown by a Kappa coefficient of 1 (pBI-RADS proved identical insofar as sensitivity (95.40%), specificity (78.69%), positive predictive value (48.54%), negative predictive value (98.78%) and accuracy (81.60%) are concerned. With respect to the categorical variables (BI-RADS categories 3, 4 and 5), the area under the receiver operating characteristic (ROC) curve was 94.41 for SONOBREAST (range 92.20-96.62) and 89.99 for BI-RADS (range 86.60-93.37). The accuracy of the SONOBREAST model is identical to that found with BI-RADS when the same parameters are used with respect to the cut-off point at which malignancy is suspected. Regarding the continuous probability of malignancy with BI-RADS categories 3, 4 and 5, SONOBREAST permits a more precise and individualized evaluation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  8. The prediction of epidemics through mathematical modeling.

    Science.gov (United States)

    Schaus, Catherine

    2014-01-01

    Mathematical models may be resorted to in an endeavor to predict the development of epidemics. The SIR model is one of the applications. Still too approximate, the use of statistics awaits more data in order to come closer to reality.

  9. Statistical modeling to support power system planning

    Science.gov (United States)

    Staid, Andrea

    This dissertation focuses on data-analytic approaches that improve our understanding of power system applications to promote better decision-making. It tackles issues of risk analysis, uncertainty management, resource estimation, and the impacts of climate change. Tools of data mining and statistical modeling are used to bring new insight to a variety of complex problems facing today's power system. The overarching goal of this research is to improve the understanding of the power system risk environment for improved operation, investment, and planning decisions. The first chapter introduces some challenges faced in planning for a sustainable power system. Chapter 2 analyzes the driving factors behind the disparity in wind energy investments among states with a goal of determining the impact that state-level policies have on incentivizing wind energy. Findings show that policy differences do not explain the disparities; physical and geographical factors are more important. Chapter 3 extends conventional wind forecasting to a risk-based focus of predicting maximum wind speeds, which are dangerous for offshore operations. Statistical models are presented that issue probabilistic predictions for the highest wind speed expected in a three-hour interval. These models achieve a high degree of accuracy and their use can improve safety and reliability in practice. Chapter 4 examines the challenges of wind power estimation for onshore wind farms. Several methods for wind power resource assessment are compared, and the weaknesses of the Jensen model are demonstrated. For two onshore farms, statistical models outperform other methods, even when very little information is known about the wind farm. Lastly, chapter 5 focuses on the power system more broadly in the context of the risks expected from tropical cyclones in a changing climate. Risks to U.S. power system infrastructure are simulated under different scenarios of tropical cyclone behavior that may result from climate

  10. Fatigue crack initiation and growth life prediction with statistical consideration

    International Nuclear Information System (INIS)

    Kwon, J.D.; Choi, S.H.; Kwak, S.G.; Chun, K.O.

    1991-01-01

    Life prediction or residual life prediction of structures or machines is one of the most strongly world wide needed problems as requirement in the stage of slowly developing economy which comes after rapidly and highly developing stage. For the purpose of statistical life prediction, fatigue test was conducted under the 3 stress levels, and for each stress level, 20 specimens are used. The statistical properties of the crack growth parameter m and C in the fatigue crack growth law of da/dN = C(ΔK) m , and the relationship between m and C, and the statistical distribution pattern of fatigue crack initiation, growth and fracture lives can be obtained by experimental results

  11. Atmospheric corrosion: statistical validation of models

    International Nuclear Information System (INIS)

    Diaz, V.; Martinez-Luaces, V.; Guineo-Cobs, G.

    2003-01-01

    In this paper we discuss two different methods for validation of regression models, applied to corrosion data. One of them is based on the correlation coefficient and the other one is the statistical test of lack of fit. Both methods are used here to analyse fitting of bi logarithmic model in order to predict corrosion for very low carbon steel substrates in rural and urban-industrial atmospheres in Uruguay. Results for parameters A and n of the bi logarithmic model are reported here. For this purpose, all repeated values were used instead of using average values as usual. Modelling is carried out using experimental data corresponding to steel substrates under the same initial meteorological conditions ( in fact, they are put in the rack at the same time). Results of correlation coefficient are compared with the lack of it tested at two different signification levels (α=0.01 and α=0.05). Unexpected differences between them are explained and finally, it is possible to conclude, at least in the studied atmospheres, that the bi logarithmic model does not fit properly the experimental data. (Author) 18 refs

  12. Learning predictive statistics from temporal sequences: Dynamics and strategies.

    Science.gov (United States)

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-10-01

    Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics-that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.

  13. Inverse and Predictive Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Syracuse, Ellen Marie [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-09-27

    The LANL Seismo-Acoustic team has a strong capability in developing data-driven models that accurately predict a variety of observations. These models range from the simple – one-dimensional models that are constrained by a single dataset and can be used for quick and efficient predictions – to the complex – multidimensional models that are constrained by several types of data and result in more accurate predictions. Team members typically build models of geophysical characteristics of Earth and source distributions at scales of 1 to 1000s of km, the techniques used are applicable for other types of physical characteristics at an even greater range of scales. The following cases provide a snapshot of some of the modeling work done by the Seismo- Acoustic team at LANL.

  14. Statistical modelling of fish stocks

    DEFF Research Database (Denmark)

    Kvist, Trine

    1999-01-01

    for modelling the dynamics of a fish population is suggested. A new approach is introduced to analyse the sources of variation in age composition data, which is one of the most important sources of information in the cohort based models for estimation of stock abundancies and mortalities. The approach combines...... and it is argued that an approach utilising stochastic differential equations might be advantagous in fish stoch assessments....

  15. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    International Nuclear Information System (INIS)

    Yahya, Noorazrul; Ebert, Martin A.; Bulsara, Max; House, Michael J.; Kennedy, Angel; Joseph, David J.; Denham, James W.

    2016-01-01

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions

  16. Statistical-learning strategies generate only modestly performing predictive models for urinary symptoms following external beam radiotherapy of the prostate: A comparison of conventional and machine-learning methods

    Energy Technology Data Exchange (ETDEWEB)

    Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au [School of Physics, University of Western Australia, Western Australia 6009, Australia and School of Health Sciences, National University of Malaysia, Bangi 43600 (Malaysia); Ebert, Martin A. [School of Physics, University of Western Australia, Western Australia 6009, Australia and Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Bulsara, Max [Institute for Health Research, University of Notre Dame, Fremantle, Western Australia 6959 (Australia); House, Michael J. [School of Physics, University of Western Australia, Western Australia 6009 (Australia); Kennedy, Angel [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008 (Australia); Joseph, David J. [Department of Radiation Oncology, Sir Charles Gairdner Hospital, Western Australia 6008, Australia and School of Surgery, University of Western Australia, Western Australia 6009 (Australia); Denham, James W. [School of Medicine and Public Health, University of Newcastle, New South Wales 2308 (Australia)

    2016-05-15

    Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions

  17. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 3: A stochastic rain fade control algorithm for satellite link power via non linear Markow filtering theory

    Science.gov (United States)

    Manning, Robert M.

    1991-01-01

    The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above 30/20 GHz Ka band, i.e., rain attenuation, cloud and/or clear air scintillation, etc., combined with the need to counter such degradations after the small link margins have been exceeded, necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.

  18. Spherical Process Models for Global Spatial Statistics

    KAUST Repository

    Jeong, Jaehong

    2017-11-28

    Statistical models used in geophysical, environmental, and climate science applications must reflect the curvature of the spatial domain in global data. Over the past few decades, statisticians have developed covariance models that capture the spatial and temporal behavior of these global data sets. Though the geodesic distance is the most natural metric for measuring distance on the surface of a sphere, mathematical limitations have compelled statisticians to use the chordal distance to compute the covariance matrix in many applications instead, which may cause physically unrealistic distortions. Therefore, covariance functions directly defined on a sphere using the geodesic distance are needed. We discuss the issues that arise when dealing with spherical data sets on a global scale and provide references to recent literature. We review the current approaches to building process models on spheres, including the differential operator, the stochastic partial differential equation, the kernel convolution, and the deformation approaches. We illustrate realizations obtained from Gaussian processes with different covariance structures and the use of isotropic and nonstationary covariance models through deformations and geographical indicators for global surface temperature data. To assess the suitability of each method, we compare their log-likelihood values and prediction scores, and we end with a discussion of related research problems.

  19. Statistical modelling for ship propulsion efficiency

    DEFF Research Database (Denmark)

    Petersen, Jóan Petur; Jacobsen, Daniel J.; Winther, Ole

    2012-01-01

    This paper presents a state-of-the-art systems approach to statistical modelling of fuel efficiency in ship propulsion, and also a novel and publicly available data set of high quality sensory data. Two statistical model approaches are investigated and compared: artificial neural networks...

  20. Actuarial statistics with generalized linear mixed models

    NARCIS (Netherlands)

    Antonio, K.; Beirlant, J.

    2007-01-01

    Over the last decade the use of generalized linear models (GLMs) in actuarial statistics has received a lot of attention, starting from the actuarial illustrations in the standard text by McCullagh and Nelder [McCullagh, P., Nelder, J.A., 1989. Generalized linear models. In: Monographs on Statistics

  1. Spherical Process Models for Global Spatial Statistics

    KAUST Repository

    Jeong, Jaehong; Jun, Mikyoung; Genton, Marc G.

    2017-01-01

    Statistical models used in geophysical, environmental, and climate science applications must reflect the curvature of the spatial domain in global data. Over the past few decades, statisticians have developed covariance models that capture

  2. Statistical timing for parametric yield prediction of digital integrated circuits

    NARCIS (Netherlands)

    Jess, J.A.G.; Kalafala, K.; Naidu, S.R.; Otten, R.H.J.M.; Visweswariah, C.

    2006-01-01

    Uncertainty in circuit performance due to manufacturing and environmental variations is increasing with each new generation of technology. It is therefore important to predict the performance of a chip as a probabilistic quantity. This paper proposes three novel path-based algorithms for statistical

  3. Nonparametric Bayesian predictive distributions for future order statistics

    Science.gov (United States)

    Richard A. Johnson; James W. Evans; David W. Green

    1999-01-01

    We derive the predictive distribution for a specified order statistic, determined from a future random sample, under a Dirichlet process prior. Two variants of the approach are treated and some limiting cases studied. A practical application to monitoring the strength of lumber is discussed including choices of prior expectation and comparisons made to a Bayesian...

  4. Statistical Models and Methods for Lifetime Data

    CERN Document Server

    Lawless, Jerald F

    2011-01-01

    Praise for the First Edition"An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ."-Choice"This is an important book, which will appeal to statisticians working on survival analysis problems."-Biometrics"A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook."-Statistics in MedicineThe statistical analysis of lifetime or response time data is a key tool in engineering,

  5. Statistics and the shell model

    International Nuclear Information System (INIS)

    Weidenmueller, H.A.

    1985-01-01

    Starting with N. Bohr's paper on compound-nucleus reactions, we confront regular dynamical features and chaotic motion in nuclei. The shell-model and, more generally, mean-field theories describe average nuclear properties which are thus identified as regular features. The fluctuations about the average show chaotic behaviour of the same type as found in classical chaotic systems upon quantisation. These features are therefore generic and quite independent of the specific dynamics of the nucleus. A novel method to calculate fluctuations is discussed, and the results of this method are described. (orig.)

  6. Melanoma Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing melanoma cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  7. Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling

    Science.gov (United States)

    2016-05-31

    Distribution Unlimited UU UU UU UU 31-05-2016 15-Apr-2014 14-Jan-2015 Final Report: Technical Topic 3.2.2.d Bayesian and Non- parametric Statistics...of Papers published in non peer-reviewed journals: Final Report: Technical Topic 3.2.2.d Bayesian and Non- parametric Statistics: Integration of Neural...Transfer N/A Number of graduating undergraduates who achieved a 3.5 GPA to 4.0 (4.0 max scale ): Number of graduating undergraduates funded by a DoD funded

  8. A statistical study of the performance of the Hakamada-Akasofu-Fry version 2 numerical model in predicting solar shock arrival times at Earth during different phases of solar cycle 23

    Energy Technology Data Exchange (ETDEWEB)

    McKenna-Lawlor, S.M.P. [National Univ. of Ireland, Maynooth, Co. Kildare (Ireland). Space Technology Ireland; Fry, C.D. [Exploration Physics International, Inc., Huntsville, AL (United States); Dryer, M. [Exploration Physics International, Inc., Huntsville, AL (United States); NOAA Space Environment Center, Boulder, CO (United States); Heynderickx, D. [D-H Consultancy, Leuven (Belgium); Kecskemety, K. [KFKI Research Institute for Particle and Nuclear Physics, Budapest (Hungary); Kudela, K. [Institute of Experimental Physics, Kosice (Slovakia); Balaz, J. [National Univ. of Ireland, Maynooth, Co. Kildare (Ireland). Space Technology Ireland; Institute of Experimental Physics, Kosice (Slovakia)

    2012-07-01

    The performance of the Hakamada Akasofu-Fry, version 2 (HAFv.2) numerical model, which provides predictions of solar shock arrival times at Earth, was subjected to a statistical study to investigate those solar/interplanetary circumstances under which the model performed well/poorly during key phases (rise/maximum/decay) of solar cycle 23. In addition to analyzing elements of the overall data set (584 selected events) associated with particular cycle phases, subsets were formed such that those events making up a particular sub-set showed common characteristics. The statistical significance of the results obtained using the various sets/subsets was generally very low and these results were not significant as compared with the hit by chance rate (50 %). This implies a low level of confidence in the predictions of the model with no compelling result encouraging its use. However, the data suggested that the success rates of HAFv.2 were higher when the background solar wind speed at the time of shock initiation was relatively fast. Thus, in scenarios where the background solar wind speed is elevated and the calculated success rate significantly exceeds the rate by chance, the forecasts could provide potential value to the customer. With the composite statistics available for solar cycle 23, the calculated success rate at high solar wind speed, although clearly above 50 %, was indicative rather than conclusive. The RMS error estimated for shock arrival times for every cycle phase and for the composite sample was in each case significantly better than would be expected for a random data set. Also, the parameter ''Probability of Detection, yes'' (PODy) which presents the Proportion of Yes observations that were correctly forecast (i.e. the ratio between the shocks correctly predicted and all the shocks observed), yielded values for the rise/maximum/decay phases of the cycle and using the composite sample of 0.85, 0.64, 0.79 and 0.77, respectively. The

  9. A statistical study of the performance of the Hakamada-Akasofu-Fry version 2 numerical model in predicting solar shock arrival times at Earth during different phases of solar cycle 23

    Directory of Open Access Journals (Sweden)

    S. M. P. McKenna-Lawlor

    2012-02-01

    Full Text Available The performance of the Hakamada Akasofu-Fry, version 2 (HAFv.2 numerical model, which provides predictions of solar shock arrival times at Earth, was subjected to a statistical study to investigate those solar/interplanetary circumstances under which the model performed well/poorly during key phases (rise/maximum/decay of solar cycle 23. In addition to analyzing elements of the overall data set (584 selected events associated with particular cycle phases, subsets were formed such that those events making up a particular sub-set showed common characteristics. The statistical significance of the results obtained using the various sets/subsets was generally very low and these results were not significant as compared with the hit by chance rate (50%. This implies a low level of confidence in the predictions of the model with no compelling result encouraging its use. However, the data suggested that the success rates of HAFv.2 were higher when the background solar wind speed at the time of shock initiation was relatively fast. Thus, in scenarios where the background solar wind speed is elevated and the calculated success rate significantly exceeds the rate by chance, the forecasts could provide potential value to the customer. With the composite statistics available for solar cycle 23, the calculated success rate at high solar wind speed, although clearly above 50%, was indicative rather than conclusive. The RMS error estimated for shock arrival times for every cycle phase and for the composite sample was in each case significantly better than would be expected for a random data set. Also, the parameter "Probability of Detection, yes" (PODy which presents the Proportion of Yes observations that were correctly forecast (i.e. the ratio between the shocks correctly predicted and all the shocks observed, yielded values for the rise/maximum/decay phases of the cycle and using the composite sample of 0.85, 0.64, 0.79 and 0.77, respectively. The statistical

  10. Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes

    Science.gov (United States)

    Williams Colin P.

    1999-01-01

    Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long term behavior of such processes is only tractable for very simple types of stochastic processes such as Markovian processes. However, in real world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance is might be the non-random intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(square root of N) time. That is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.

  11. Bayesian models: A statistical primer for ecologists

    Science.gov (United States)

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticiansCovers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and moreDeemphasizes computer coding in favor of basic principlesExplains how to write out properly factored statistical expressions representing Bayesian models

  12. Modelling diversity in building occupant behaviour: a novel statistical approach

    DEFF Research Database (Denmark)

    Haldi, Frédéric; Calì, Davide; Andersen, Rune Korsholm

    2016-01-01

    We propose an advanced modelling framework to predict the scope and effects of behavioural diversity regarding building occupant actions on window openings, shading devices and lighting. We develop a statistical approach based on generalised linear mixed models to account for the longitudinal nat...

  13. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.

    Science.gov (United States)

    Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita

    2016-10-11

    We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix

  14. Statistical Model-Based Face Pose Estimation

    Institute of Scientific and Technical Information of China (English)

    GE Xinliang; YANG Jie; LI Feng; WANG Huahua

    2007-01-01

    A robust face pose estimation approach is proposed by using face shape statistical model approach and pose parameters are represented by trigonometric functions. The face shape statistical model is firstly built by analyzing the face shapes from different people under varying poses. The shape alignment is vital in the process of building the statistical model. Then, six trigonometric functions are employed to represent the face pose parameters. Lastly, the mapping function is constructed between face image and face pose by linearly relating different parameters. The proposed approach is able to estimate different face poses using a few face training samples. Experimental results are provided to demonstrate its efficiency and accuracy.

  15. Uncertainty the soul of modeling, probability & statistics

    CERN Document Server

    Briggs, William

    2016-01-01

    This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science. The ultimate goal is to call into question many standard tenets and lay the philosophical and probabilistic groundwork and infrastructure for statistical modeling. It is the first book devoted to the philosophy of data aimed at working scientists and calls for a new consideration in the practice of probability and statistics to eliminate what has been referred to as the "Cult of Statistical Significance". The book explains the philosophy of these ideas and not the mathematics, though there are a handful of mathematical examples. The topics are logically laid out, starting with basic philosophy as related to probability, statistics, and science, and stepping through the key probabilistic ideas and concepts, and ending with statistical models. Its jargon-free approach asserts that standard methods, suc...

  16. Automated statistical modeling of analytical measurement systems

    International Nuclear Information System (INIS)

    Jacobson, J.J.

    1992-01-01

    The statistical modeling of analytical measurement systems at the Idaho Chemical Processing Plant (ICPP) has been completely automated through computer software. The statistical modeling of analytical measurement systems is one part of a complete quality control program used by the Remote Analytical Laboratory (RAL) at the ICPP. The quality control program is an integration of automated data input, measurement system calibration, database management, and statistical process control. The quality control program and statistical modeling program meet the guidelines set forth by the American Society for Testing Materials and American National Standards Institute. A statistical model is a set of mathematical equations describing any systematic bias inherent in a measurement system and the precision of a measurement system. A statistical model is developed from data generated from the analysis of control standards. Control standards are samples which are made up at precise known levels by an independent laboratory and submitted to the RAL. The RAL analysts who process control standards do not know the values of those control standards. The object behind statistical modeling is to describe real process samples in terms of their bias and precision and, to verify that a measurement system is operating satisfactorily. The processing of control standards gives us this ability

  17. Statistical learning and probabilistic prediction in music cognition: mechanisms of stylistic enculturation.

    Science.gov (United States)

    Pearce, Marcus T

    2018-05-11

    Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception-expectation, emotion, memory, similarity, segmentation, and meter-can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.

  18. On Extrapolating Past the Range of Observed Data When Making Statistical Predictions in Ecology.

    Directory of Open Access Journals (Sweden)

    Paul B Conn

    Full Text Available Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook's notion of an independent variable hull (IVH, developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models.

  19. Topology for statistical modeling of petascale data.

    Energy Technology Data Exchange (ETDEWEB)

    Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A& M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A& M University, College Station, TX)

    2011-07-01

    This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.

  20. Statistical modelling of citation exchange between statistics journals.

    Science.gov (United States)

    Varin, Cristiano; Cattelan, Manuela; Firth, David

    2016-01-01

    Rankings of scholarly journals based on citation data are often met with scepticism by the scientific community. Part of the scepticism is due to disparity between the common perception of journals' prestige and their ranking based on citation counts. A more serious concern is the inappropriate use of journal rankings to evaluate the scientific influence of researchers. The paper focuses on analysis of the table of cross-citations among a selection of statistics journals. Data are collected from the Web of Science database published by Thomson Reuters. Our results suggest that modelling the exchange of citations between journals is useful to highlight the most prestigious journals, but also that journal citation data are characterized by considerable heterogeneity, which needs to be properly summarized. Inferential conclusions require care to avoid potential overinterpretation of insignificant differences between journal ratings. Comparison with published ratings of institutions from the UK's research assessment exercise shows strong correlation at aggregate level between assessed research quality and journal citation 'export scores' within the discipline of statistics.

  1. Mixed deterministic statistical modelling of regional ozone air pollution

    KAUST Repository

    Kalenderski, Stoitchko

    2011-03-17

    We develop a physically motivated statistical model for regional ozone air pollution by separating the ground-level pollutant concentration field into three components, namely: transport, local production and large-scale mean trend mostly dominated by emission rates. The model is novel in the field of environmental spatial statistics in that it is a combined deterministic-statistical model, which gives a new perspective to the modelling of air pollution. The model is presented in a Bayesian hierarchical formalism, and explicitly accounts for advection of pollutants, using the advection equation. We apply the model to a specific case of regional ozone pollution-the Lower Fraser valley of British Columbia, Canada. As a predictive tool, we demonstrate that the model vastly outperforms existing, simpler modelling approaches. Our study highlights the importance of simultaneously considering different aspects of an air pollution problem as well as taking into account the physical bases that govern the processes of interest. © 2011 John Wiley & Sons, Ltd..

  2. Daily precipitation statistics in regional climate models

    DEFF Research Database (Denmark)

    Frei, Christoph; Christensen, Jens Hesselbjerg; Déqué, Michel

    2003-01-01

    An evaluation is undertaken of the statistics of daily precipitation as simulated by five regional climate models using comprehensive observations in the region of the European Alps. Four limited area models and one variable-resolution global model are considered, all with a grid spacing of 50 km...

  3. Infinite Random Graphs as Statistical Mechanical Models

    DEFF Research Database (Denmark)

    Durhuus, Bergfinnur Jøgvan; Napolitano, George Maria

    2011-01-01

    We discuss two examples of infinite random graphs obtained as limits of finite statistical mechanical systems: a model of two-dimensional dis-cretized quantum gravity defined in terms of causal triangulated surfaces, and the Ising model on generic random trees. For the former model we describe a ...

  4. Statistical modelling of monthly mean sea level at coastal tide gauge stations along the Indian subcontinent

    Digital Repository Service at National Institute of Oceanography (India)

    Srinivas, K.; Das, V.K.; DineshKumar, P.K.

    This study investigates the suitability of statistical models for their predictive potential for the monthly mean sea level at different stations along the west and east coasts of the Indian subcontinent. Statistical modelling of the monthly mean...

  5. Matrix Tricks for Linear Statistical Models

    CERN Document Server

    Puntanen, Simo; Styan, George PH

    2011-01-01

    In teaching linear statistical models to first-year graduate students or to final-year undergraduate students there is no way to proceed smoothly without matrices and related concepts of linear algebra; their use is really essential. Our experience is that making some particular matrix tricks very familiar to students can substantially increase their insight into linear statistical models (and also multivariate statistical analysis). In matrix algebra, there are handy, sometimes even very simple "tricks" which simplify and clarify the treatment of a problem - both for the student and

  6. Statistical physics of pairwise probability models

    DEFF Research Database (Denmark)

    Roudi, Yasser; Aurell, Erik; Hertz, John

    2009-01-01

    (dansk abstrakt findes ikke) Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of  data......: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying...

  7. Distributions with given marginals and statistical modelling

    CERN Document Server

    Fortiana, Josep; Rodriguez-Lallena, José

    2002-01-01

    This book contains a selection of the papers presented at the meeting `Distributions with given marginals and statistical modelling', held in Barcelona (Spain), July 17-20, 2000. In 24 chapters, this book covers topics such as the theory of copulas and quasi-copulas, the theory and compatibility of distributions, models for survival distributions and other well-known distributions, time series, categorical models, definition and estimation of measures of dependence, monotonicity and stochastic ordering, shape and separability of distributions, hidden truncation models, diagonal families, orthogonal expansions, tests of independence, and goodness of fit assessment. These topics share the use and properties of distributions with given marginals, this being the fourth specialised text on this theme. The innovative aspect of the book is the inclusion of statistical aspects such as modelling, Bayesian statistics, estimation, and tests.

  8. Aspects of statistical model for multifragmentation

    International Nuclear Information System (INIS)

    Bhattacharyya, P.; Das Gupta, S.; Mekjian, A. Z.

    1999-01-01

    We deal with two different aspects of an exactly soluble statistical model of fragmentation. First we show, using zero range force and finite temperature Thomas-Fermi theory, that a common link can be found between finite temperature mean field theory and the statistical fragmentation model. We show the latter naturally arises in the spinodal region. Next we show that although the exact statistical model is a canonical model and uses temperature, microcanonical results which use constant energy rather than constant temperature can also be obtained from the canonical model using saddle-point approximation. The methodology is extremely simple to implement and at least in all the examples studied in this work is very accurate. (c) 1999 The American Physical Society

  9. Statistical prediction of biomethane potentials based on the composition of lignocellulosic biomass

    DEFF Research Database (Denmark)

    Thomsen, Sune Tjalfe; Spliid, Henrik; Østergård, Hanne

    2014-01-01

    Mixture models are introduced as a new and stronger methodology for statistical prediction of biomethane potentials (BPM) from lignocellulosic biomass compared to the linear regression models previously used. A large dataset from literature combined with our own data were analysed using canonical...

  10. Statistical Compression for Climate Model Output

    Science.gov (United States)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus is it important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset-one year of daily mean temperature data-particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

  11. Statistical tests for equal predictive ability across multiple forecasting methods

    DEFF Research Database (Denmark)

    Borup, Daniel; Thyrsgaard, Martin

    We develop a multivariate generalization of the Giacomini-White tests for equal conditional predictive ability. The tests are applicable to a mixture of nested and non-nested models, incorporate estimation uncertainty explicitly, and allow for misspecification of the forecasting model as well as ...

  12. Cultural Resource Predictive Modeling

    Science.gov (United States)

    2017-10-01

    CR cultural resource CRM cultural resource management CRPM Cultural Resource Predictive Modeling DoD Department of Defense ESTCP Environmental...resource management ( CRM ) legal obligations under NEPA and the NHPA, military installations need to demonstrate that CRM decisions are based on objective...maxim “one size does not fit all,” and demonstrate that DoD installations have many different CRM needs that can and should be met through a variety

  13. Performance modeling, loss networks, and statistical multiplexing

    CERN Document Server

    Mazumdar, Ravi

    2009-01-01

    This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of understanding the phenomenon of statistical multiplexing. The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the important ideas of Palm distributions associated with traffic models and their role in performance measures. Also presented are recent ideas of large buffer, and many sources asymptotics that play an important role in understanding statistical multiplexing. I

  14. Advances in statistical models for data analysis

    CERN Document Server

    Minerva, Tommaso; Vichi, Maurizio

    2015-01-01

    This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biannual meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.

  15. Structured statistical models of inductive reasoning.

    Science.gov (United States)

    Kemp, Charles; Tenenbaum, Joshua B

    2009-01-01

    Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.

  16. Model for neural signaling leap statistics

    International Nuclear Information System (INIS)

    Chevrollier, Martine; Oria, Marcos

    2011-01-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T 37.5 0 C, awaken regime) and Levy statistics (T = 35.5 0 C, sleeping period), characterized by rare events of long range connections.

  17. Statistical models based on conditional probability distributions

    International Nuclear Information System (INIS)

    Narayanan, R.S.

    1991-10-01

    We present a formulation of statistical mechanics models based on conditional probability distribution rather than a Hamiltonian. We show that it is possible to realize critical phenomena through this procedure. Closely linked with this formulation is a Monte Carlo algorithm, in which a configuration generated is guaranteed to be statistically independent from any other configuration for all values of the parameters, in particular near the critical point. (orig.)

  18. Model for neural signaling leap statistics

    Science.gov (United States)

    Chevrollier, Martine; Oriá, Marcos

    2011-03-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T = 37.5°C, awaken regime) and Lévy statistics (T = 35.5°C, sleeping period), characterized by rare events of long range connections.

  19. Model for neural signaling leap statistics

    Energy Technology Data Exchange (ETDEWEB)

    Chevrollier, Martine; Oria, Marcos, E-mail: oria@otica.ufpb.br [Laboratorio de Fisica Atomica e Lasers Departamento de Fisica, Universidade Federal da ParaIba Caixa Postal 5086 58051-900 Joao Pessoa, Paraiba (Brazil)

    2011-03-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T 37.5{sup 0}C, awaken regime) and Levy statistics (T = 35.5{sup 0}C, sleeping period), characterized by rare events of long range connections.

  20. Statistical Modeling of Energy Production by Photovoltaic Farms

    Czech Academy of Sciences Publication Activity Database

    Brabec, Marek; Pelikán, Emil; Krč, Pavel; Eben, Kryštof; Musílek, P.

    2011-01-01

    Roč. 5, č. 9 (2011), s. 785-793 ISSN 1934-8975 Grant - others:GA AV ČR(CZ) M100300904 Institutional research plan: CEZ:AV0Z10300504 Keywords : electrical energy * solar energy * numerical weather prediction model * nonparametric regression * beta regression Subject RIV: BB - Applied Statistics, Operational Research

  1. Hierarchical modelling for the environmental sciences statistical methods and applications

    CERN Document Server

    Clark, James S

    2006-01-01

    New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.

  2. Candidate Prediction Models and Methods

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg; Nielsen, Torben Skov; Madsen, Henrik

    2005-01-01

    This document lists candidate prediction models for Work Package 3 (WP3) of the PSO-project called ``Intelligent wind power prediction systems'' (FU4101). The main focus is on the models transforming numerical weather predictions into predictions of power production. The document also outlines...... the possibilities w.r.t. different numerical weather predictions actually available to the project....

  3. Can spatial statistical river temperature models be transferred between catchments?

    Science.gov (United States)

    Jackson, Faye L.; Fryer, Robert J.; Hannah, David M.; Malcolm, Iain A.

    2017-09-01

    There has been increasing use of spatial statistical models to understand and predict river temperature (Tw) from landscape covariates. However, it is not financially or logistically feasible to monitor all rivers and the transferability of such models has not been explored. This paper uses Tw data from four river catchments collected in August 2015 to assess how well spatial regression models predict the maximum 7-day rolling mean of daily maximum Tw (Twmax) within and between catchments. Models were fitted for each catchment separately using (1) landscape covariates only (LS models) and (2) landscape covariates and an air temperature (Ta) metric (LS_Ta models). All the LS models included upstream catchment area and three included a river network smoother (RNS) that accounted for unexplained spatial structure. The LS models transferred reasonably to other catchments, at least when predicting relative levels of Twmax. However, the predictions were biased when mean Twmax differed between catchments. The RNS was needed to characterise and predict finer-scale spatially correlated variation. Because the RNS was unique to each catchment and thus non-transferable, predictions were better within catchments than between catchments. A single model fitted to all catchments found no interactions between the landscape covariates and catchment, suggesting that the landscape relationships were transferable. The LS_Ta models transferred less well, with particularly poor performance when the relationship with the Ta metric was physically implausible or required extrapolation outside the range of the data. A single model fitted to all catchments found catchment-specific relationships between Twmax and the Ta metric, indicating that the Ta metric was not transferable. These findings improve our understanding of the transferability of spatial statistical river temperature models and provide a foundation for developing new approaches for predicting Tw at unmonitored locations across

  4. Probably not future prediction using probability and statistical inference

    CERN Document Server

    Dworsky, Lawrence N

    2008-01-01

    An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: How travel time is affected by congestion, driving speed, and traffic lights Why different gambling ...

  5. Predictive Surface Complexation Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Sverjensky, Dimitri A. [Johns Hopkins Univ., Baltimore, MD (United States). Dept. of Earth and Planetary Sciences

    2016-11-29

    Surface complexation plays an important role in the equilibria and kinetics of processes controlling the compositions of soilwaters and groundwaters, the fate of contaminants in groundwaters, and the subsurface storage of CO2 and nuclear waste. Over the last several decades, many dozens of individual experimental studies have addressed aspects of surface complexation that have contributed to an increased understanding of its role in natural systems. However, there has been no previous attempt to develop a model of surface complexation that can be used to link all the experimental studies in order to place them on a predictive basis. Overall, my research has successfully integrated the results of the work of many experimentalists published over several decades. For the first time in studies of the geochemistry of the mineral-water interface, a practical predictive capability for modeling has become available. The predictive correlations developed in my research now enable extrapolations of experimental studies to provide estimates of surface chemistry for systems not yet studied experimentally and for natural and anthropogenically perturbed systems.

  6. Prediction of slant path rain attenuation statistics at various locations

    Science.gov (United States)

    Goldhirsh, J.

    1977-01-01

    The paper describes a method for predicting slant path attenuation statistics at arbitrary locations for variable frequencies and path elevation angles. The method involves the use of median reflectivity factor-height profiles measured with radar as well as the use of long-term point rain rate data and assumed or measured drop size distributions. The attenuation coefficient due to cloud liquid water in the presence of rain is also considered. Absolute probability fade distributions are compared for eight cases: Maryland (15 GHz), Texas (30 GHz), Slough, England (19 and 37 GHz), Fayetteville, North Carolina (13 and 18 GHz), and Cambridge, Massachusetts (13 and 18 GHz).

  7. Growth curve models and statistical diagnostics

    CERN Document Server

    Pan, Jian-Xin

    2002-01-01

    Growth-curve models are generalized multivariate analysis-of-variance models. These models are especially useful for investigating growth problems on short times in economics, biology, medical research, and epidemiology. This book systematically introduces the theory of the GCM with particular emphasis on their multivariate statistical diagnostics, which are based mainly on recent developments made by the authors and their collaborators. The authors provide complete proofs of theorems as well as practical data sets and MATLAB code.

  8. Topology for Statistical Modeling of Petascale Data

    Energy Technology Data Exchange (ETDEWEB)

    Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)

    2013-10-31

    Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project, the Texas A&M Group develops new algebraic geometry algorithms, in particular with fewnomial theory, and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videconference every 2 weeks, so there is much synergy of ideas between the groups. The following of this document is focused on the contributions that had grater direct involvement from the team at the University of Utah in Salt Lake City.

  9. Bayesian models a statistical primer for ecologists

    CERN Document Server

    Hobbs, N Thompson

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods-in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probabili

  10. A statistical model for radar images of agricultural scenes

    Science.gov (United States)

    Frost, V. S.; Shanmugan, K. S.; Holtzman, J. C.; Stiles, J. A.

    1982-01-01

    The presently derived and validated statistical model for radar images containing many different homogeneous fields predicts the probability density functions of radar images of entire agricultural scenes, thereby allowing histograms of large scenes composed of a variety of crops to be described. Seasat-A SAR images of agricultural scenes are accurately predicted by the model on the basis of three assumptions: each field has the same SNR, all target classes cover approximately the same area, and the true reflectivity characterizing each individual target class is a uniformly distributed random variable. The model is expected to be useful in the design of data processing algorithms and for scene analysis using radar images.

  11. Direct Breakthrough Curve Prediction From Statistics of Heterogeneous Conductivity Fields

    Science.gov (United States)

    Hansen, Scott K.; Haslauer, Claus P.; Cirpka, Olaf A.; Vesselinov, Velimir V.

    2018-01-01

    This paper presents a methodology to predict the shape of solute breakthrough curves in heterogeneous aquifers at early times and/or under high degrees of heterogeneity, both cases in which the classical macrodispersion theory may not be applicable. The methodology relies on the observation that breakthrough curves in heterogeneous media are generally well described by lognormal distributions, and mean breakthrough times can be predicted analytically. The log-variance of solute arrival is thus sufficient to completely specify the breakthrough curves, and this is calibrated as a function of aquifer heterogeneity and dimensionless distance from a source plane by means of Monte Carlo analysis and statistical regression. Using the ensemble of simulated groundwater flow and solute transport realizations employed to calibrate the predictive regression, reliability estimates for the prediction are also developed. Additional theoretical contributions include heuristics for the time until an effective macrodispersion coefficient becomes applicable, and also an expression for its magnitude that applies in highly heterogeneous systems. It is seen that the results here represent a way to derive continuous time random walk transition distributions from physical considerations rather than from empirical field calibration.

  12. Statistical transmutation in doped quantum dimer models.

    Science.gov (United States)

    Lamas, C A; Ralko, A; Cabra, D C; Poilblanc, D; Pujol, P

    2012-07-06

    We prove a "statistical transmutation" symmetry of doped quantum dimer models on the square, triangular, and kagome lattices: the energy spectrum is invariant under a simultaneous change of statistics (i.e., bosonic into fermionic or vice versa) of the holes and of the signs of all the dimer resonance loops. This exact transformation enables us to define the duality equivalence between doped quantum dimer Hamiltonians and provides the analytic framework to analyze dynamical statistical transmutations. We investigate numerically the doping of the triangular quantum dimer model with special focus on the topological Z(2) dimer liquid. Doping leads to four (instead of two for the square lattice) inequivalent families of Hamiltonians. Competition between phase separation, superfluidity, supersolidity, and fermionic phases is investigated in the four families.

  13. A Stochastic Fractional Dynamics Model of Rainfall Statistics

    Science.gov (United States)

    Kundu, Prasun; Travis, James

    2013-04-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. The main restriction is the assumption that the statistics of the precipitation field is spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets containing periods of non-stationary behavior that involves occasional anomalously correlated rain events, present a challenge for the model.

  14. STATISTICAL MODELS OF REPRESENTING INTELLECTUAL CAPITAL

    Directory of Open Access Journals (Sweden)

    Andreea Feraru

    2016-06-01

    Full Text Available This article entitled Statistical Models of Representing Intellectual Capital approaches and analyses the concept of intellectual capital, as well as the main models which can support enterprisers/managers in evaluating and quantifying the advantages of intellectual capital. Most authors examine intellectual capital from a static perspective and focus on the development of its various evaluation models. In this chapter we surveyed the classical static models: Sveiby, Edvisson, Balanced Scorecard, as well as the canonical model of intellectual capital. Among the group of static models for evaluating organisational intellectual capital the canonical model stands out. This model enables the structuring of organisational intellectual capital in: human capital, structural capital and relational capital. Although the model is widely spread, it is a static one and can thus create a series of errors in the process of evaluation, because all the three entities mentioned above are not independent from the viewpoint of their contents, as any logic of structuring complex entities requires.

  15. Understanding and forecasting polar stratospheric variability with statistical models

    Directory of Open Access Journals (Sweden)

    C. Blume

    2012-07-01

    Full Text Available The variability of the north-polar stratospheric vortex is a prominent aspect of the middle atmosphere. This work investigates a wide class of statistical models with respect to their ability to model geopotential and temperature anomalies, representing variability in the polar stratosphere. Four partly nonstationary, nonlinear models are assessed: linear discriminant analysis (LDA; a cluster method based on finite elements (FEM-VARX; a neural network, namely the multi-layer perceptron (MLP; and support vector regression (SVR. These methods model time series by incorporating all significant external factors simultaneously, including ENSO, QBO, the solar cycle, volcanoes, to then quantify their statistical importance. We show that variability in reanalysis data from 1980 to 2005 is successfully modeled. The period from 2005 to 2011 can be hindcasted to a certain extent, where MLP performs significantly better than the remaining models. However, variability remains that cannot be statistically hindcasted within the current framework, such as the unexpected major warming in January 2009. Finally, the statistical model with the best generalization performance is used to predict a winter 2011/12 with warm and weak vortex conditions. A vortex breakdown is predicted for late January, early February 2012.

  16. (ajst) statistical mechanics model for orientational

    African Journals Online (AJOL)

    Science and Engineering Series Vol. 6, No. 2, pp. 94 - 101. STATISTICAL MECHANICS MODEL FOR ORIENTATIONAL. MOTION OF TWO-DIMENSIONAL RIGID ROTATOR. Malo, J.O. ... there is no translational motion and that they are well separated so .... constant and I is the moment of inertia of a linear rotator. Thus, the ...

  17. Statistical Model Checking for Biological Systems

    DEFF Research Database (Denmark)

    David, Alexandre; Larsen, Kim Guldstrand; Legay, Axel

    2014-01-01

    Statistical Model Checking (SMC) is a highly scalable simulation-based verification approach for testing and estimating the probability that a stochastic system satisfies a given linear temporal property. The technique has been applied to (discrete and continuous time) Markov chains, stochastic...

  18. Topology for Statistical Modeling of Petascale Data

    Energy Technology Data Exchange (ETDEWEB)

    Bennett, Janine Camille [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pebay, Philippe Pierre [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Rojas, Maurice [Texas A & M Univ., College Station, TX (United States)

    2014-07-01

    This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.

  19. Establishing statistical models of manufacturing parameters

    International Nuclear Information System (INIS)

    Senevat, J.; Pape, J.L.; Deshayes, J.F.

    1991-01-01

    This paper reports on the effect of pilgering and cold-work parameters on contractile strain ratio and mechanical properties that were investigated using a large population of Zircaloy tubes. Statistical models were established between: contractile strain ratio and tooling parameters, mechanical properties (tensile test, creep test) and cold-work parameters, and mechanical properties and stress-relieving temperature

  20. Statistical models for optimizing mineral exploration

    International Nuclear Information System (INIS)

    Wignall, T.K.; DeGeoffroy, J.

    1987-01-01

    The primary purpose of mineral exploration is to discover ore deposits. The emphasis of this volume is on the mathematical and computational aspects of optimizing mineral exploration. The seven chapters that make up the main body of the book are devoted to the description and application of various types of computerized geomathematical models. These chapters include: (1) the optimal selection of ore deposit types and regions of search, as well as prospecting selected areas, (2) designing airborne and ground field programs for the optimal coverage of prospecting areas, and (3) delineating and evaluating exploration targets within prospecting areas by means of statistical modeling. Many of these statistical programs are innovative and are designed to be useful for mineral exploration modeling. Examples of geomathematical models are applied to exploring for six main types of base and precious metal deposits, as well as other mineral resources (such as bauxite and uranium)

  1. A statistical model for mapping morphological shape

    Directory of Open Access Journals (Sweden)

    Li Jiahan

    2010-07-01

    Full Text Available Abstract Background Living things come in all shapes and sizes, from bacteria, plants, and animals to humans. Knowledge about the genetic mechanisms for biological shape has far-reaching implications for a range spectrum of scientific disciplines including anthropology, agriculture, developmental biology, evolution and biomedicine. Results We derived a statistical model for mapping specific genes or quantitative trait loci (QTLs that control morphological shape. The model was formulated within the mixture framework, in which different types of shape are thought to result from genotypic discrepancies at a QTL. The EM algorithm was implemented to estimate QTL genotype-specific shapes based on a shape correspondence analysis. Computer simulation was used to investigate the statistical property of the model. Conclusion By identifying specific QTLs for morphological shape, the model developed will help to ask, disseminate and address many major integrative biological and genetic questions and challenges in the genetic control of biological shape and function.

  2. Statistical Validation of Engineering and Scientific Models: Background

    International Nuclear Information System (INIS)

    Hills, Richard G.; Trucano, Timothy G.

    1999-01-01

    A tutorial is presented discussing the basic issues associated with propagation of uncertainty analysis and statistical validation of engineering and scientific models. The propagation of uncertainty tutorial illustrates the use of the sensitivity method and the Monte Carlo method to evaluate the uncertainty in predictions for linear and nonlinear models. Four example applications are presented; a linear model, a model for the behavior of a damped spring-mass system, a transient thermal conduction model, and a nonlinear transient convective-diffusive model based on Burger's equation. Correlated and uncorrelated model input parameters are considered. The model validation tutorial builds on the material presented in the propagation of uncertainty tutoriaI and uses the damp spring-mass system as the example application. The validation tutorial illustrates several concepts associated with the application of statistical inference to test model predictions against experimental observations. Several validation methods are presented including error band based, multivariate, sum of squares of residuals, and optimization methods. After completion of the tutorial, a survey of statistical model validation literature is presented and recommendations for future work are made

  3. Adaptive Maneuvering Frequency Method of Current Statistical Model

    Institute of Scientific and Technical Information of China (English)

    Wei Sun; Yongjian Yang

    2017-01-01

    Current statistical model(CSM) has a good performance in maneuvering target tracking. However, the fixed maneuvering frequency will deteriorate the tracking results, such as a serious dynamic delay, a slowly converging speedy and a limited precision when using Kalman filter(KF) algorithm. In this study, a new current statistical model and a new Kalman filter are proposed to improve the performance of maneuvering target tracking. The new model which employs innovation dominated subjection function to adaptively adjust maneuvering frequency has a better performance in step maneuvering target tracking, while a fluctuant phenomenon appears. As far as this problem is concerned, a new adaptive fading Kalman filter is proposed as well. In the new Kalman filter, the prediction values are amended in time by setting judgment and amendment rules,so that tracking precision and fluctuant phenomenon of the new current statistical model are improved. The results of simulation indicate the effectiveness of the new algorithm and the practical guiding significance.

  4. Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop

    Science.gov (United States)

    Morrison, Joseph H.

    2010-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.

  5. Performance modeling, stochastic networks, and statistical multiplexing

    CERN Document Server

    Mazumdar, Ravi R

    2013-01-01

    This monograph presents a concise mathematical approach for modeling and analyzing the performance of communication networks with the aim of introducing an appropriate mathematical framework for modeling and analysis as well as understanding the phenomenon of statistical multiplexing. The models, techniques, and results presented form the core of traffic engineering methods used to design, control and allocate resources in communication networks.The novelty of the monograph is the fresh approach and insights provided by a sample-path methodology for queueing models that highlights the importan

  6. Statistical models for competing risk analysis

    International Nuclear Information System (INIS)

    Sather, H.N.

    1976-08-01

    Research results on three new models for potential applications in competing risks problems. One section covers the basic statistical relationships underlying the subsequent competing risks model development. Another discusses the problem of comparing cause-specific risk structure by competing risks theory in two homogeneous populations, P1 and P2. Weibull models which allow more generality than the Berkson and Elveback models are studied for the effect of time on the hazard function. The use of concomitant information for modeling single-risk survival is extended to the multiple failure mode domain of competing risks. The model used to illustrate the use of this methodology is a life table model which has constant hazards within pre-designated intervals of the time scale. Two parametric models for bivariate dependent competing risks, which provide interesting alternatives, are proposed and examined

  7. Statistical physics of pairwise probability models

    Directory of Open Access Journals (Sweden)

    Yasser Roudi

    2009-11-01

    Full Text Available Statistical models for describing the probability distribution over the states of biological systems are commonly used for dimensional reduction. Among these models, pairwise models are very attractive in part because they can be fit using a reasonable amount of data: knowledge of the means and correlations between pairs of elements in the system is sufficient. Not surprisingly, then, using pairwise models for studying neural data has been the focus of many studies in recent years. In this paper, we describe how tools from statistical physics can be employed for studying and using pairwise models. We build on our previous work on the subject and study the relation between different methods for fitting these models and evaluating their quality. In particular, using data from simulated cortical networks we study how the quality of various approximate methods for inferring the parameters in a pairwise model depends on the time bin chosen for binning the data. We also study the effect of the size of the time bin on the model quality itself, again using simulated data. We show that using finer time bins increases the quality of the pairwise model. We offer new ways of deriving the expressions reported in our previous work for assessing the quality of pairwise models.

  8. A statistical mechanical model for equilibrium ionization

    International Nuclear Information System (INIS)

    Macris, N.; Martin, P.A.; Pule, J.

    1990-01-01

    A quantum electron interacts with a classical gas of hard spheres and is in thermal equilibrium with it. The interaction is attractive and the electron can form a bound state with the classical particles. It is rigorously shown that in a well defined low density and low temperature limit, the ionization probability for the electron tends to the value predicted by the Saha formula for thermal ionization. In this regime, the electron is found to be in a statistical mixture of a bound and a free state. (orig.)

  9. Application of statistical classification methods for predicting the acceptability of well-water quality

    Science.gov (United States)

    Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.

    2018-01-01

    The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.

  10. Equilibrium statistical mechanics of lattice models

    CERN Document Server

    Lavis, David A

    2015-01-01

    Most interesting and difficult problems in equilibrium statistical mechanics concern models which exhibit phase transitions. For graduate students and more experienced researchers this book provides an invaluable reference source of approximate and exact solutions for a comprehensive range of such models. Part I contains background material on classical thermodynamics and statistical mechanics, together with a classification and survey of lattice models. The geometry of phase transitions is described and scaling theory is used to introduce critical exponents and scaling laws. An introduction is given to finite-size scaling, conformal invariance and Schramm—Loewner evolution. Part II contains accounts of classical mean-field methods. The parallels between Landau expansions and catastrophe theory are discussed and Ginzburg—Landau theory is introduced. The extension of mean-field theory to higher-orders is explored using the Kikuchi—Hijmans—De Boer hierarchy of approximations. In Part III the use of alge...

  11. Statistical Models of Adaptive Immune populations

    Science.gov (United States)

    Sethna, Zachary; Callan, Curtis; Walczak, Aleksandra; Mora, Thierry

    The availability of large (104-106 sequences) datasets of B or T cell populations from a single individual allows reliable fitting of complex statistical models for naïve generation, somatic selection, and hypermutation. It is crucial to utilize a probabilistic/informational approach when modeling these populations. The inferred probability distributions allow for population characterization, calculation of probability distributions of various hidden variables (e.g. number of insertions), as well as statistical properties of the distribution itself (e.g. entropy). In particular, the differences between the T cell populations of embryonic and mature mice will be examined as a case study. Comparing these populations, as well as proposed mixed populations, provides a concrete exercise in model creation, comparison, choice, and validation.

  12. Cellular automata and statistical mechanical models

    International Nuclear Information System (INIS)

    Rujan, P.

    1987-01-01

    The authors elaborate on the analogy between the transfer matrix of usual lattice models and the master equation describing the time development of cellular automata. Transient and stationary properties of probabilistic automata are linked to surface and bulk properties, respectively, of restricted statistical mechanical systems. It is demonstrated that methods of statistical physics can be successfully used to describe the dynamic and the stationary behavior of such automata. Some exact results are derived, including duality transformations, exact mappings, disorder, and linear solutions. Many examples are worked out in detail to demonstrate how to use statistical physics in order to construct cellular automata with desired properties. This approach is considered to be a first step toward the design of fully parallel, probabilistic systems whose computational abilities rely on the cooperative behavior of their components

  13. Multimesonic decays of charmonium states in the statistical quark model

    International Nuclear Information System (INIS)

    Montvay, I.; Toth, J.D.

    1978-01-01

    The data known at present of multimesonic decays of chi and psi states are fitted in a statistical quark model, in which the matrix elements are assumed to be constant and resonances as well as both strong and second order electromagnetic processes are taken into account. The experimental data are well reproduced by the model. Unknown branching ratios for the rest of multimesonic channels are predicted. The fit leaves about 40% for baryonic and radiative channels in the case of J/psi(3095). The fitted parameters of the J/psi decays are used to predict the mesonic decays of the pseudoscalar eta c. The statistical quark model seems to allow the calculation of competitive multiparticle processes for the studied decays. (D.P.)

  14. A new statistical scission-point model fed with microscopic ingredients to predict fission fragments distributions; Developpement d'un nouveau modele de point de scission base sur des ingredients microscopiques

    Energy Technology Data Exchange (ETDEWEB)

    Heinrich, S

    2006-07-01

    Nucleus fission process is a very complex phenomenon and, even nowadays, no realistic models describing the overall process are available. The work presented here deals with a theoretical description of fission fragments distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilking Scission Point model. Our purpose was to test if this statistic model applied at the scission point and by introducing new results of modern microscopic calculations allows to describe quantitatively the fission fragments distributions. We calculate the surface energy available at the scission point as a function of the fragments deformations. This surface is obtained from a Hartree Fock Bogoliubov microscopic calculation which guarantee a realistic description of the potential dependence on the deformation for each fragment. The statistic balance is described by the level densities of the fragment. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between each fragment at the scission point, is discussed by comparison with scission configuration obtained from full dynamical microscopic calculations. Also, the comparison between our results and experimental data is very satisfying and allow us to discuss the success and limitations of our approach. We finally proposed ideas to improve the model, in particular by applying dynamical corrections. (author)

  15. A statistical model for instable thermodynamical systems

    International Nuclear Information System (INIS)

    Sommer, Jens-Uwe

    2003-01-01

    A generic model is presented for statistical systems which display thermodynamic features in contrast to our everyday experience, such as infinite and negative heat capacities. Such system are instable in terms of classical equilibrium thermodynamics. Using our statistical model, we are able to investigate states of instable systems which are undefined in the framework of equilibrium thermodynamics. We show that a region of negative heat capacity in the adiabatic environment, leads to a first order like phase transition when the system is coupled to a heat reservoir. This phase transition takes place without a phase coexistence. Nevertheless, all intermediate states are stable due to fluctuations. When two instable system are brought in thermal contact, the temperature of the composed system is lower than the minimum temperature of the individual systems. Generally, the equilibrium states of instable system cannot be simply decomposed into equilibrium states of the individual systems. The properties of instable system depend on the environment, ensemble equivalence is broken

  16. Predicting Smoking Status Using Machine Learning Algorithms and Statistical Analysis

    Directory of Open Access Journals (Sweden)

    Charles Frank

    2018-03-01

    Full Text Available Smoking has been proven to negatively affect health in a multitude of ways. As of 2009, smoking has been considered the leading cause of preventable morbidity and mortality in the United States, continuing to plague the country’s overall health. This study aims to investigate the viability and effectiveness of some machine learning algorithms for predicting the smoking status of patients based on their blood tests and vital readings results. The analysis of this study is divided into two parts: In part 1, we use One-way ANOVA analysis with SAS tool to show the statistically significant difference in blood test readings between smokers and non-smokers. The results show that the difference in INR, which measures the effectiveness of anticoagulants, was significant in favor of non-smokers which further confirms the health risks associated with smoking. In part 2, we use five machine learning algorithms: Naïve Bayes, MLP, Logistic regression classifier, J48 and Decision Table to predict the smoking status of patients. To compare the effectiveness of these algorithms we use: Precision, Recall, F-measure and Accuracy measures. The results show that the Logistic algorithm outperformed the four other algorithms with Precision, Recall, F-Measure, and Accuracy of 83%, 83.4%, 83.2%, 83.44%, respectively.

  17. Statistics and predictions of population, energy and environment problems

    International Nuclear Information System (INIS)

    Sobajima, Makoto

    1999-03-01

    In the situation that world's population, especially in developing countries, is rapidly growing, humankind is facing to global problems that they cannot steadily live unless they find individual places to live, obtain foods, and peacefully get energy necessary for living for centuries. For this purpose, humankind has to think what behavior they should take in the finite environment, talk, agree and execute. Though energy has been long respected as a symbol for improving living, demanded and used, they have come to limit the use making the global environment more serious. If there is sufficient energy not loading cost to the environment. If nuclear energy regarded as such one sustain the resource for long and has market competitiveness. What situation of realization of compensating new energy is now in the case the use of nuclear energy is restricted by the society fearing radioactivity. If there are promising ones for the future. One concerning with the study of energy cannot go without knowing these. The statistical materials compiled here are thought to be useful for that purpose, and are collected mainly from ones viewing future prediction based on past practices. Studies on the prediction is so important to have future measures that these data bases are expected to be improved for better accuracy. (author)

  18. Logarithmic transformed statistical models in calibration

    International Nuclear Information System (INIS)

    Zeis, C.D.

    1975-01-01

    A general type of statistical model used for calibration of instruments having the property that the standard deviations of the observed values increase as a function of the mean value is described. The application to the Helix Counter at the Rocky Flats Plant is primarily from a theoretical point of view. The Helix Counter measures the amount of plutonium in certain types of chemicals. The method described can be used also for other calibrations. (U.S.)

  19. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    International Nuclear Information System (INIS)

    Nedic, Vladimir; Despotovic, Danijela; Cvetanovic, Slobodan; Despotovic, Milan; Babic, Sasa

    2014-01-01

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. As input variables of the neural network, the proposed structure of the traffic flow and the average speed of the traffic flow are chosen. The output variable of the network is the equivalent noise level in the given time period L eq . Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise using the originally developed user friendly software package. It is shown that the artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate equivalent noise level by means of classical methods, and comparative analysis is given. The results clearly show that ANN approach is superior in traffic noise level prediction to any other statistical method. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed originally designed user friendly software package. • The results are compared with classical statistical methods. • The results are much better predictive capabilities of ANN model

  20. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    Energy Technology Data Exchange (ETDEWEB)

    Nedic, Vladimir, E-mail: vnedic@kg.ac.rs [Faculty of Philology and Arts, University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac (Serbia); Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs [Faculty of Economics, University of Kragujevac, Djure Pucara Starog 3, 34000 Kragujevac (Serbia); Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs [Faculty of Economics, University of Niš, Trg kralja Aleksandra Ujedinitelja, 18000 Niš (Serbia); Despotovic, Milan, E-mail: mdespotovic@kg.ac.rs [Faculty of Engineering, University of Kragujevac, Sestre Janjic 6, 34000 Kragujevac (Serbia); Babic, Sasa, E-mail: babicsf@yahoo.com [College of Applied Mechanical Engineering, Trstenik (Serbia)

    2014-11-15

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. As input variables of the neural network, the proposed structure of the traffic flow and the average speed of the traffic flow are chosen. The output variable of the network is the equivalent noise level in the given time period L{sub eq}. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise using the originally developed user friendly software package. It is shown that the artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate equivalent noise level by means of classical methods, and comparative analysis is given. The results clearly show that ANN approach is superior in traffic noise level prediction to any other statistical method. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed originally designed user friendly software package. • The results are compared with classical statistical methods. • The results are much better predictive capabilities of ANN model.

  1. Applied systems ecology: models, data, and statistical methods

    Energy Technology Data Exchange (ETDEWEB)

    Eberhardt, L L

    1976-01-01

    In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.

  2. A simple statistical model for geomagnetic reversals

    Science.gov (United States)

    Constable, Catherine

    1990-01-01

    The diversity of paleomagnetic records of geomagnetic reversals now available indicate that the field configuration during transitions cannot be adequately described by simple zonal or standing field models. A new model described here is based on statistical properties inferred from the present field and is capable of simulating field transitions like those observed. Some insight is obtained into what one can hope to learn from paleomagnetic records. In particular, it is crucial that the effects of smoothing in the remanence acquisition process be separated from true geomagnetic field behavior. This might enable us to determine the time constants associated with the dominant field configuration during a reversal.

  3. Introduction to statistical modelling: linear regression.

    Science.gov (United States)

    Lunt, Mark

    2015-07-01

    In many studies we wish to assess how a range of variables are associated with a particular outcome and also determine the strength of such relationships so that we can begin to understand how these factors relate to each other at a population level. Ultimately, we may also be interested in predicting the outcome from a series of predictive factors available at, say, a routine clinic visit. In a recent article in Rheumatology, Desai et al. did precisely that when they studied the prediction of hip and spine BMD from hand BMD and various demographic, lifestyle, disease and therapy variables in patients with RA. This article aims to introduce the statistical methodology that can be used in such a situation and explain the meaning of some of the terms employed. It will also outline some common pitfalls encountered when performing such analyses. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  4. Exploiting linkage disequilibrium in statistical modelling in quantitative genomics

    DEFF Research Database (Denmark)

    Wang, Lei

    Alleles at two loci are said to be in linkage disequilibrium (LD) when they are correlated or statistically dependent. Genomic prediction and gene mapping rely on the existence of LD between gentic markers and causul variants of complex traits. In the first part of the thesis, a novel method...... to quantify and visualize local variation in LD along chromosomes in describet, and applied to characterize LD patters at the local and genome-wide scale in three Danish pig breeds. In the second part, different ways of taking LD into account in genomic prediction models are studied. One approach is to use...... the recently proposed antedependence models, which treat neighbouring marker effects as correlated; another approach involves use of haplotype block information derived using the program Beagle. The overall conclusion is that taking LD information into account in genomic prediction models potentially improves...

  5. Comparison and validation of statistical methods for predicting power outage durations in the event of hurricanes.

    Science.gov (United States)

    Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M

    2011-12-01

    This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.

  6. Statistical Modelling of the Soil Dielectric Constant

    Science.gov (United States)

    Usowicz, Boguslaw; Marczewski, Wojciech; Bogdan Usowicz, Jerzy; Lipiec, Jerzy

    2010-05-01

    The dielectric constant of soil is the physical property being very sensitive on water content. It funds several electrical measurement techniques for determining the water content by means of direct (TDR, FDR, and others related to effects of electrical conductance and/or capacitance) and indirect RS (Remote Sensing) methods. The work is devoted to a particular statistical manner of modelling the dielectric constant as the property accounting a wide range of specific soil composition, porosity, and mass density, within the unsaturated water content. Usually, similar models are determined for few particular soil types, and changing the soil type one needs switching the model on another type or to adjust it by parametrization of soil compounds. Therefore, it is difficult comparing and referring results between models. The presented model was developed for a generic representation of soil being a hypothetical mixture of spheres, each representing a soil fraction, in its proper phase state. The model generates a serial-parallel mesh of conductive and capacitive paths, which is analysed for a total conductive or capacitive property. The model was firstly developed to determine the thermal conductivity property, and now it is extended on the dielectric constant by analysing the capacitive mesh. The analysis is provided by statistical means obeying physical laws related to the serial-parallel branching of the representative electrical mesh. Physical relevance of the analysis is established electrically, but the definition of the electrical mesh is controlled statistically by parametrization of compound fractions, by determining the number of representative spheres per unitary volume per fraction, and by determining the number of fractions. That way the model is capable covering properties of nearly all possible soil types, all phase states within recognition of the Lorenz and Knudsen conditions. In effect the model allows on generating a hypothetical representative of

  7. Encoding Dissimilarity Data for Statistical Model Building.

    Science.gov (United States)

    Wahba, Grace

    2010-12-01

    We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233.G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.

  8. Comparison of statistical and clinical predictions of functional outcome after ischemic stroke.

    Directory of Open Access Journals (Sweden)

    Douglas D Thompson

    Full Text Available To determine whether the predictions of functional outcome after ischemic stroke made at the bedside using a doctor's clinical experience were more or less accurate than the predictions made by clinical prediction models (CPMs.A prospective cohort study of nine hundred and thirty one ischemic stroke patients recruited consecutively at the outpatient, inpatient and emergency departments of the Western General Hospital, Edinburgh between 2002 and 2005. Doctors made informal predictions of six month functional outcome on the Oxford Handicap Scale (OHS. Patients were followed up at six months with a validated postal questionnaire. For each patient we calculated the absolute predicted risk of death or dependence (OHS≥3 using five previously described CPMs. The specificity of a doctor's informal predictions of OHS≥3 at six months was good 0.96 (95% CI: 0.94 to 0.97 and similar to CPMs (range 0.94 to 0.96; however the sensitivity of both informal clinical predictions 0.44 (95% CI: 0.39 to 0.49 and clinical prediction models (range 0.38 to 0.45 was poor. The prediction of the level of disability after stroke was similar for informal clinical predictions (ordinal c-statistic 0.74 with 95% CI 0.72 to 0.76 and CPMs (range 0.69 to 0.75. No patient or clinician characteristic affected the accuracy of informal predictions, though predictions were more accurate in outpatients.CPMs are at least as good as informal clinical predictions in discriminating between good and bad functional outcome after ischemic stroke. The place of these models in clinical practice has yet to be determined.

  9. Comparison of four statistical and machine learning methods for crash severity prediction.

    Science.gov (United States)

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance in overall and in more severe crashes. RF and SVM had the next two sufficient performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Massive Predictive Modeling using Oracle R Enterprise

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    R is fast becoming the lingua franca for analyzing data via statistics, visualization, and predictive analytics. For enterprise-scale data, R users have three main concerns: scalability, performance, and production deployment. Oracle's R-based technologies - Oracle R Distribution, Oracle R Enterprise, Oracle R Connector for Hadoop, and the R package ROracle - address these concerns. In this talk, we introduce Oracle's R technologies, highlighting how each enables R users to achieve scalability and performance while making production deployment of R results a natural outcome of the data analyst/scientist efforts. The focus then turns to Oracle R Enterprise with code examples using the transparency layer and embedded R execution, targeting massive predictive modeling. One goal behind massive predictive modeling is to build models per entity, such as customers, zip codes, simulations, in an effort to understand behavior and tailor predictions at the entity level. Predictions...

  11. Optimizing refiner operation with statistical modelling

    Energy Technology Data Exchange (ETDEWEB)

    Broderick, G [Noranda Research Centre, Pointe Claire, PQ (Canada)

    1997-02-01

    The impact of refining conditions on the energy efficiency of the process and on the handsheet quality of a chemi-mechanical pulp was studied as part of a series of pilot scale refining trials. Statistical models of refiner performance were constructed from these results and non-linear optimization of process conditions were conducted. Optimization results indicated that increasing the ratio of specific energy applied in the first stage led to a reduction of some 15 per cent in the total energy requirement. The strategy can also be used to obtain significant increases in pulp quality for a given energy input. 20 refs., 6 tabs.

  12. Average Nuclear properties based on statistical model

    International Nuclear Information System (INIS)

    El-Jaick, L.J.

    1974-01-01

    The rough properties of nuclei were investigated by statistical model, in systems with the same and different number of protons and neutrons, separately, considering the Coulomb energy in the last system. Some average nuclear properties were calculated based on the energy density of nuclear matter, from Weizsscker-Beth mass semiempiric formulae, generalized for compressible nuclei. In the study of a s surface energy coefficient, the great influence exercised by Coulomb energy and nuclear compressibility was verified. For a good adjust of beta stability lines and mass excess, the surface symmetry energy were established. (M.C.K.) [pt

  13. Incorporating uncertainty in predictive species distribution modelling.

    Science.gov (United States)

    Beale, Colin M; Lennon, Jack J

    2012-01-19

    Motivated by the need to solve ecological problems (climate change, habitat fragmentation and biological invasions), there has been increasing interest in species distribution models (SDMs). Predictions from these models inform conservation policy, invasive species management and disease-control measures. However, predictions are subject to uncertainty, the degree and source of which is often unrecognized. Here, we review the SDM literature in the context of uncertainty, focusing on three main classes of SDM: niche-based models, demographic models and process-based models. We identify sources of uncertainty for each class and discuss how uncertainty can be minimized or included in the modelling process to give realistic measures of confidence around predictions. Because this has typically not been performed, we conclude that uncertainty in SDMs has often been underestimated and a false precision assigned to predictions of geographical distribution. We identify areas where development of new statistical tools will improve predictions from distribution models, notably the development of hierarchical models that link different types of distribution model and their attendant uncertainties across spatial scales. Finally, we discuss the need to develop more defensible methods for assessing predictive performance, quantifying model goodness-of-fit and for assessing the significance of model covariates.

  14. Statistical approach to predict compressive strength of high workability slag-cement mortars

    International Nuclear Information System (INIS)

    Memon, N.A.; Memon, N.A.; Sumadi, S.R.

    2009-01-01

    This paper reports an attempt made to develop empirical expressions to estimate/ predict the compressive strength of high workability slag-cement mortars. Experimental data of 54 mix mortars were used. The mortars were prepared with slag as cement replacement of the order of 0, 50 and 60%. The flow (workability) was maintained at 136+-3%. The numerical and statistical analysis was performed by using database computer software Microsoft Office Excel 2003. Three empirical mathematical models were developed to estimate/predict 28 days compressive strength of high workability slag cement-mortars with 0, 50 and 60% slag which predict the values accurate between 97 and 98%. Finally a generalized empirical mathematical model was proposed which can predict 28 days compressive strength of high workability mortars up to degree of accuracy 95%. (author)

  15. Predicting energy performance of a net-zero energy building: A statistical approach

    International Nuclear Information System (INIS)

    Kneifel, Joshua; Webb, David

    2016-01-01

    Highlights: • A regression model is applied to actual energy data from a net-zero energy building. • The model is validated through a rigorous statistical analysis. • Comparisons are made between model predictions and those of a physics-based model. • The model is a viable baseline for evaluating future models from the energy data. - Abstract: Performance-based building requirements have become more prevalent because it gives freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid Climate Zone, and compares these

  16. Spatial Economics Model Predicting Transport Volume

    Directory of Open Access Journals (Sweden)

    Lu Bo

    2016-10-01

    Full Text Available It is extremely important to predict the logistics requirements in a scientific and rational way. However, in recent years, the improvement effect on the prediction method is not very significant and the traditional statistical prediction method has the defects of low precision and poor interpretation of the prediction model, which cannot only guarantee the generalization ability of the prediction model theoretically, but also cannot explain the models effectively. Therefore, in combination with the theories of the spatial economics, industrial economics, and neo-classical economics, taking city of Zhuanghe as the research object, the study identifies the leading industry that can produce a large number of cargoes, and further predicts the static logistics generation of the Zhuanghe and hinterlands. By integrating various factors that can affect the regional logistics requirements, this study established a logistics requirements potential model from the aspect of spatial economic principles, and expanded the way of logistics requirements prediction from the single statistical principles to an new area of special and regional economics.

  17. Statistical pairwise interaction model of stock market

    Science.gov (United States)

    Bury, Thomas

    2013-03-01

    Financial markets are a classical example of complex systems as they are compound by many interacting stocks. As such, we can obtain a surprisingly good description of their structure by making the rough simplification of binary daily returns. Spin glass models have been applied and gave some valuable results but at the price of restrictive assumptions on the market dynamics or they are agent-based models with rules designed in order to recover some empirical behaviors. Here we show that the pairwise model is actually a statistically consistent model with the observed first and second moments of the stocks orientation without making such restrictive assumptions. This is done with an approach only based on empirical data of price returns. Our data analysis of six major indices suggests that the actual interaction structure may be thought as an Ising model on a complex network with interaction strengths scaling as the inverse of the system size. This has potentially important implications since many properties of such a model are already known and some techniques of the spin glass theory can be straightforwardly applied. Typical behaviors, as multiple equilibria or metastable states, different characteristic time scales, spatial patterns, order-disorder, could find an explanation in this picture.

  18. Statistical modelling of fine red wine production

    Directory of Open Access Journals (Sweden)

    María Rosa Castro

    2010-01-01

    Full Text Available Producing wine is a very important economic activity in the province of San Juan in Argentina; it is therefore most important to predict production regarding the quantity of raw material needed. This work was aimed at obtaining a model relating kilograms of crushed grape to the litres of wine so produced. Such model will be used for predicting precise future values and confidence intervals for determined quantities of crushed grapes. Data from a vineyard in the province of San Juan was thus used in this work. The sampling coefficient of correlation was calculated and a dispersion diagram was then constructed; this indicated a li- neal relationship between the litres of wine obtained and the kilograms of crushed grape. Two lineal models were then adopted and variance analysis was carried out because the data came from normal populations having the same variance. The most appropriate model was obtained from this analysis; it was validated with experimental values, a good approach being obtained.

  19. PREDICTIVE CAPACITY OF ARCH FAMILY MODELS

    Directory of Open Access Journals (Sweden)

    Raphael Silveira Amaro

    2016-03-01

    Full Text Available In the last decades, a remarkable number of models, variants from the Autoregressive Conditional Heteroscedastic family, have been developed and empirically tested, making extremely complex the process of choosing a particular model. This research aim to compare the predictive capacity, using the Model Confidence Set procedure, than five conditional heteroskedasticity models, considering eight different statistical probability distributions. The financial series which were used refers to the log-return series of the Bovespa index and the Dow Jones Industrial Index in the period between 27 October 2008 and 30 December 2014. The empirical evidences showed that, in general, competing models have a great homogeneity to make predictions, either for a stock market of a developed country or for a stock market of a developing country. An equivalent result can be inferred for the statistical probability distributions that were used.

  20. PREDICTED PERCENTAGE DISSATISFIED (PPD) MODEL ...

    African Journals Online (AJOL)

    HOD

    their low power requirements, are relatively cheap and are environment friendly. ... PREDICTED PERCENTAGE DISSATISFIED MODEL EVALUATION OF EVAPORATIVE COOLING ... The performance of direct evaporative coolers is a.

  1. Acceleration transforms and statistical kinetic models

    International Nuclear Information System (INIS)

    LuValle, M.J.; Welsher, T.L.; Svoboda, K.

    1988-01-01

    For a restricted class of problems a mathematical model of microscopic degradation processes, statistical kinetics, is developed and linked through acceleration transforms to the information which can be obtained from a system in which the only observable sign of degradation is sudden and catastrophic failure. The acceleration transforms were developed in accelerated life testing applications as a tool for extrapolating from the observable results of an accelerated life test to the dynamics of the underlying degradation processes. A particular concern of a physicist attempting to interpreted the results of an analysis based on acceleration transforms is determining the physical species involved in the degradation process. These species may be (a) relatively abundant or (b) relatively rare. The main results of this paper are a theorem showing that for an important subclass of statistical kinetic models, acceleration transforms cannot be used to distinguish between cases a and b, and an example showing that in some cases falling outside the restrictions of the theorem, cases a and b can be distinguished by their acceleration transforms

  2. Hybrid perturbation methods based on statistical time series models

    Science.gov (United States)

    San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario

    2016-04-01

    In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination results in the precision improvement of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theories, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.

  3. Statistical Analysis of CFD Solutions From the Fifth AIAA Drag Prediction Workshop

    Science.gov (United States)

    Morrison, Joseph H.

    2013-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using a common grid sequence and multiple turbulence models for the June 2012 fifth Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for the 4th Drag Prediction Workshop. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.

  4. Predicting Protein Secondary Structure with Markov Models

    DEFF Research Database (Denmark)

    Fischer, Paul; Larsen, Simon; Thomsen, Claus

    2004-01-01

    we are considering here, is to predict the secondary structure from the primary one. To this end we train a Markov model on training data and then use it to classify parts of unknown protein sequences as sheets, helices or coils. We show how to exploit the directional information contained...... in the Markov model for this task. Classifications that are purely based on statistical models might not always be biologically meaningful. We present combinatorial methods to incorporate biological background knowledge to enhance the prediction performance....

  5. A statistical mechanical model of economics

    Science.gov (United States)

    Lubbers, Nicholas Edward Williams

    Statistical mechanics pursues low-dimensional descriptions of systems with a very large number of degrees of freedom. I explore this theme in two contexts. The main body of this dissertation explores and extends the Yard Sale Model (YSM) of economic transactions using a combination of simulations and theory. The YSM is a simple interacting model for wealth distributions which has the potential to explain the empirical observation of Pareto distributions of wealth. I develop the link between wealth condensation and the breakdown of ergodicity due to nonlinear diffusion effects which are analogous to the geometric random walk. Using this, I develop a deterministic effective theory of wealth transfer in the YSM that is useful for explaining many quantitative results. I introduce various forms of growth to the model, paying attention to the effect of growth on wealth condensation, inequality, and ergodicity. Arithmetic growth is found to partially break condensation, and geometric growth is found to completely break condensation. Further generalizations of geometric growth with growth in- equality show that the system is divided into two phases by a tipping point in the inequality parameter. The tipping point marks the line between systems which are ergodic and systems which exhibit wealth condensation. I explore generalizations of the YSM transaction scheme to arbitrary betting functions to develop notions of universality in YSM-like models. I find that wealth vi condensation is universal to a large class of models which can be divided into two phases. The first exhibits slow, power-law condensation dynamics, and the second exhibits fast, finite-time condensation dynamics. I find that the YSM, which exhibits exponential dynamics, is the critical, self-similar model which marks the dividing line between the two phases. The final chapter develops a low-dimensional approach to materials microstructure quantification. Modern materials design harnesses complex

  6. Materials Informatics: Statistical Modeling in Material Science.

    Science.gov (United States)

    Yosipof, Abraham; Shimanovich, Klimentiy; Senderowitz, Hanoch

    2016-12-01

    Material informatics is engaged with the application of informatic principles to materials science in order to assist in the discovery and development of new materials. Central to the field is the application of data mining techniques and in particular machine learning approaches, often referred to as Quantitative Structure Activity Relationship (QSAR) modeling, to derive predictive models for a variety of materials-related "activities". Such models can accelerate the development of new materials with favorable properties and provide insight into the factors governing these properties. Here we provide a comparison between medicinal chemistry/drug design and materials-related QSAR modeling and highlight the importance of developing new, materials-specific descriptors. We survey some of the most recent QSAR models developed in materials science with focus on energetic materials and on solar cells. Finally we present new examples of material-informatic analyses of solar cells libraries produced from metal oxides using combinatorial material synthesis. Different analyses lead to interesting physical insights as well as to the design of new cells with potentially improved photovoltaic parameters. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Current algebra, statistical mechanics and quantum models

    Science.gov (United States)

    Vilela Mendes, R.

    2017-11-01

    Results obtained in the past for free boson systems at zero and nonzero temperatures are revisited to clarify the physical meaning of current algebra reducible functionals which are associated to systems with density fluctuations, leading to observable effects on phase transitions. To use current algebra as a tool for the formulation of quantum statistical mechanics amounts to the construction of unitary representations of diffeomorphism groups. Two mathematical equivalent procedures exist for this purpose. One searches for quasi-invariant measures on configuration spaces, the other for a cyclic vector in Hilbert space. Here, one argues that the second approach is closer to the physical intuition when modelling complex systems. An example of application of the current algebra methodology to the pairing phenomenon in two-dimensional fermion systems is discussed.

  8. Statistical model for OCT image denoising

    KAUST Repository

    Li, Muxingzi

    2017-08-01

    Optical coherence tomography (OCT) is a non-invasive technique with a large array of applications in clinical imaging and biological tissue visualization. However, the presence of speckle noise affects the analysis of OCT images and their diagnostic utility. In this article, we introduce a new OCT denoising algorithm. The proposed method is founded on a numerical optimization framework based on maximum-a-posteriori estimate of the noise-free OCT image. It combines a novel speckle noise model, derived from local statistics of empirical spectral domain OCT (SD-OCT) data, with a Huber variant of total variation regularization for edge preservation. The proposed approach exhibits satisfying results in terms of speckle noise reduction as well as edge preservation, at reduced computational cost.

  9. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    Science.gov (United States)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.

  10. New advances in statistical modeling and applications

    CERN Document Server

    Santos, Rui; Oliveira, Maria; Paulino, Carlos

    2014-01-01

    This volume presents selected papers from the XIXth Congress of the Portuguese Statistical Society, held in the town of Nazaré, Portugal, from September 28 to October 1, 2011. All contributions were selected after a thorough peer-review process. It covers a broad range of papers in the areas of statistical science, probability and stochastic processes, extremes and statistical applications.

  11. Statistical Model Checking of Rich Models and Properties

    DEFF Research Database (Denmark)

    Poulsen, Danny Bøgsted

    in undecidability issues for the traditional model checking approaches. Statistical model checking has proven itself a valuable supplement to model checking and this thesis is concerned with extending this software validation technique to stochastic hybrid systems. The thesis consists of two parts: the first part...... motivates why existing model checking technology should be supplemented by new techniques. It also contains a brief introduction to probability theory and concepts covered by the six papers making up the second part. The first two papers are concerned with developing online monitoring techniques...... systems. The fifth paper shows how stochastic hybrid automata are useful for modelling biological systems and the final paper is concerned with showing how statistical model checking is efficiently distributed. In parallel with developing the theory contained in the papers, a substantial part of this work...

  12. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances

    Directory of Open Access Journals (Sweden)

    Abut F

    2015-08-01

    Full Text Available Fatih Abut, Mehmet Fatih AkayDepartment of Computer Engineering, Çukurova University, Adana, TurkeyAbstract: Maximal oxygen uptake (VO2max indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance

  13. Energy based prediction models for building acoustics

    DEFF Research Database (Denmark)

    Brunskog, Jonas

    2012-01-01

    In order to reach robust and simplified yet accurate prediction models, energy based principle are commonly used in many fields of acoustics, especially in building acoustics. This includes simple energy flow models, the framework of statistical energy analysis (SEA) as well as more elaborated...... principles as, e.g., wave intensity analysis (WIA). The European standards for building acoustic predictions, the EN 12354 series, are based on energy flow and SEA principles. In the present paper, different energy based prediction models are discussed and critically reviewed. Special attention is placed...... on underlying basic assumptions, such as diffuse fields, high modal overlap, resonant field being dominant, etc., and the consequences of these in terms of limitations in the theory and in the practical use of the models....

  14. Clinical Prediction Models for Cardiovascular Disease: Tufts Predictive Analytics and Comparative Effectiveness Clinical Prediction Model Database.

    Science.gov (United States)

    Wessler, Benjamin S; Lai Yh, Lana; Kramer, Whitney; Cangelosi, Michael; Raman, Gowri; Lutz, Jennifer S; Kent, David M

    2015-07-01

    Clinical prediction models (CPMs) estimate the probability of clinical outcomes and hold the potential to improve decision making and individualize care. For patients with cardiovascular disease, there are numerous CPMs available although the extent of this literature is not well described. We conducted a systematic review for articles containing CPMs for cardiovascular disease published between January 1990 and May 2012. Cardiovascular disease includes coronary heart disease, heart failure, arrhythmias, stroke, venous thromboembolism, and peripheral vascular disease. We created a novel database and characterized CPMs based on the stage of development, population under study, performance, covariates, and predicted outcomes. There are 796 models included in this database. The number of CPMs published each year is increasing steadily over time. Seven hundred seventeen (90%) are de novo CPMs, 21 (3%) are CPM recalibrations, and 58 (7%) are CPM adaptations. This database contains CPMs for 31 index conditions, including 215 CPMs for patients with coronary artery disease, 168 CPMs for population samples, and 79 models for patients with heart failure. There are 77 distinct index/outcome pairings. Of the de novo models in this database, 450 (63%) report a c-statistic and 259 (36%) report some information on calibration. There is an abundance of CPMs available for a wide assortment of cardiovascular disease conditions, with substantial redundancy in the literature. The comparative performance of these models, the consistency of effects and risk estimates across models and the actual and potential clinical impact of this body of literature is poorly understood. © 2015 American Heart Association, Inc.

  15. MODEL PREDICTIVE CONTROL FUNDAMENTALS

    African Journals Online (AJOL)

    2012-07-02

    Jul 2, 2012 ... signal based on a process model, coping with constraints on inputs and ... paper, we will present an introduction to the theory and application of MPC with Matlab codes ... section 5 presents the simulation results and section 6.

  16. Spatio-temporal statistical models with applications to atmospheric processes

    International Nuclear Information System (INIS)

    Wikle, C.K.

    1996-01-01

    This doctoral dissertation is presented as three self-contained papers. An introductory chapter considers traditional spatio-temporal statistical methods used in the atmospheric sciences from a statistical perspective. Although this section is primarily a review, many of the statistical issues considered have not been considered in the context of these methods and several open questions are posed. The first paper attempts to determine a means of characterizing the semiannual oscillation (SAO) spatial variation in the northern hemisphere extratropical height field. It was discovered that the midlatitude SAO in 500hPa geopotential height could be explained almost entirely as a result of spatial and temporal asymmetries in the annual variation of stationary eddies. It was concluded that the mechanism for the SAO in the northern hemisphere is a result of land-sea contrasts. The second paper examines the seasonal variability of mixed Rossby-gravity waves (MRGW) in lower stratospheric over the equatorial Pacific. Advanced cyclostationary time series techniques were used for analysis. It was found that there are significant twice-yearly peaks in MRGW activity. Analyses also suggested a convergence of horizontal momentum flux associated with these waves. In the third paper, a new spatio-temporal statistical model is proposed that attempts to consider the influence of both temporal and spatial variability. This method is mainly concerned with prediction in space and time, and provides a spatially descriptive and temporally dynamic model

  17. Network Data: Statistical Theory and New Models

    Science.gov (United States)

    2016-02-17

    and with environmental scientists at JPL and Emory University to retrieval from NASA MISR remote sensing images aerosol index AOD for air pollution ...Beijing, May, 2013 Beijing Statistics Forum, Beijing, May, 2013 Statistics Seminar, CREST-ENSAE, Paris , March, 2013 Statistics Seminar, University...to retrieval from NASA MISR remote sensing images aerosol index AOD for air pollution monitoring and management. Satellite- retrieved Aerosol Optical

  18. Statistical Models for Inferring Vegetation Composition from Fossil Pollen

    Science.gov (United States)

    Paciorek, C.; McLachlan, J. S.; Shang, Z.

    2011-12-01

    Fossil pollen provide information about vegetation composition that can be used to help understand how vegetation has changed over the past. However, these data have not traditionally been analyzed in a way that allows for statistical inference about spatio-temporal patterns and trends. We build a Bayesian hierarchical model called STEPPS (Spatio-Temporal Empirical Prediction from Pollen in Sediments) that predicts forest composition in southern New England, USA, over the last two millenia based on fossil pollen. The critical relationships between abundances of tree taxa in the pollen record and abundances in actual vegetation are estimated using modern (Forest Inventory Analysis) data and (witness tree) data from colonial records. This gives us two time points at which both pollen and direct vegetation data are available. Based on these relationships, and incorporating our uncertainty about them, we predict forest composition using fossil pollen. We estimate the spatial distribution and relative abundances of tree species and draw inference about how these patterns have changed over time. Finally, we describe ongoing work to extend the modeling to the upper Midwest of the U.S., including an approach to infer tree density and thereby estimate the prairie-forest boundary in Minnesota and Wisconsin. This work is part of the PalEON project, which brings together a team of ecosystem modelers, paleoecologists, and statisticians with the goal of reconstructing vegetation responses to climate during the last two millenia in the northeastern and midwestern United States. The estimates from the statistical modeling will be used to assess and calibrate ecosystem models that are used to project ecological changes in response to global change.

  19. Quantum statistical model for hot dense matter

    International Nuclear Information System (INIS)

    Rukhsana Kouser; Tasneem, G.; Saleem Shahzad, M.; Shafiq-ur-Rehman; Nasim, M.H.; Amjad Ali

    2015-01-01

    In solving numerous applied problems, one needs to know the equation of state, photon absorption coefficient and opacity of substances employed. We present a code for absorption coefficient and opacity calculation based on quantum statistical model. A self-consistent method for the calculation of potential is used. By solving Schrödinger equation with self-consistent potential we find energy spectrum of quantum mechanical system and corresponding wave functions. In addition we find mean occupation numbers of electron states and average charge state of the substance studied. The main processes of interaction of radiation with matter included in our opacity calculation are photon absorption in spectral lines (Bound-bound), photoionization (Bound-free), inverse bremsstrahlung (Free-free), Compton and Thomson scattering. Bound-bound line shape function has contribution from natural, Doppler, fine structure, collisional and stark broadening. To illustrate the main features of the code and its capabilities, calculation of average charge state, absorption coefficient, Rosseland and Planck mean and group opacities of aluminum and iron are presented. Results are satisfactorily compared with the published data. (authors)

  20. Predictive models of moth development

    Science.gov (United States)

    Degree-day models link ambient temperature to insect life-stages, making such models valuable tools in integrated pest management. These models increase management efficacy by predicting pest phenology. In Wisconsin, the top insect pest of cranberry production is the cranberry fruitworm, Acrobasis v...

  1. Predictive Models and Computational Embryology

    Science.gov (United States)

    EPA’s ‘virtual embryo’ project is building an integrative systems biology framework for predictive models of developmental toxicity. One schema involves a knowledge-driven adverse outcome pathway (AOP) framework utilizing information from public databases, standardized ontologies...

  2. Statistical methods for mechanistic model validation: Salt Repository Project

    International Nuclear Information System (INIS)

    Eggett, D.L.

    1988-07-01

    As part of the Department of Energy's Salt Repository Program, Pacific Northwest Laboratory (PNL) is studying the emplacement of nuclear waste containers in a salt repository. One objective of the SRP program is to develop an overall waste package component model which adequately describes such phenomena as container corrosion, waste form leaching, spent fuel degradation, etc., which are possible in the salt repository environment. The form of this model will be proposed, based on scientific principles and relevant salt repository conditions with supporting data. The model will be used to predict the future characteristics of the near field environment. This involves several different submodels such as the amount of time it takes a brine solution to contact a canister in the repository, how long it takes a canister to corrode and expose its contents to the brine, the leach rate of the contents of the canister, etc. These submodels are often tested in a laboratory and should be statistically validated (in this context, validate means to demonstrate that the model adequately describes the data) before they can be incorporated into the waste package component model. This report describes statistical methods for validating these models. 13 refs., 1 fig., 3 tabs

  3. Predictive Modeling in Race Walking

    Directory of Open Access Journals (Sweden)

    Krzysztof Wiktorowicz

    2015-01-01

    Full Text Available This paper presents the use of linear and nonlinear multivariable models as tools to support training process of race walkers. These models are calculated using data collected from race walkers’ training events and they are used to predict the result over a 3 km race based on training loads. The material consists of 122 training plans for 21 athletes. In order to choose the best model leave-one-out cross-validation method is used. The main contribution of the paper is to propose the nonlinear modifications for linear models in order to achieve smaller prediction error. It is shown that the best model is a modified LASSO regression with quadratic terms in the nonlinear part. This model has the smallest prediction error and simplified structure by eliminating some of the predictors.

  4. Models for predicting compressive strength and water absorption of ...

    African Journals Online (AJOL)

    This work presents a mathematical model for predicting the compressive strength and water absorption of laterite-quarry dust cement block using augmented Scheffe's simplex lattice design. The statistical models developed can predict the mix proportion that will yield the desired property. The models were tested for lack of ...

  5. A statistical approach to the prediction of pressure tube fracture toughness

    International Nuclear Information System (INIS)

    Pandey, M.D.; Radford, D.D.

    2008-01-01

    The fracture toughness of the zirconium alloy (Zr-2.5Nb) is an important parameter in determining the flaw tolerance for operation of pressure tubes in a nuclear reactor. Fracture toughness data have been generated by performing rising pressure burst tests on sections of pressure tubes removed from operating reactors. The test data were used to generate a lower-bound fracture toughness curve, which is used in defining the operational limits of pressure tubes. The paper presents a comprehensive statistical analysis of burst test data and develops a multivariate statistical model to relate toughness with material chemistry, mechanical properties, and operational history. The proposed model can be useful in predicting fracture toughness of specific in-service pressure tubes, thereby minimizing conservatism associated with a generic lower-bound approach

  6. Predicting future protection of respirator users: Statistical approaches and practical implications.

    Science.gov (United States)

    Hu, Chengcheng; Harber, Philip; Su, Jing

    2016-01-01

    The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.

  7. An exercise in model validation: Comparing univariate statistics and Monte Carlo-based multivariate statistics

    International Nuclear Information System (INIS)

    Weathers, J.B.; Luck, R.; Weathers, J.W.

    2009-01-01

    The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.

  8. An exercise in model validation: Comparing univariate statistics and Monte Carlo-based multivariate statistics

    Energy Technology Data Exchange (ETDEWEB)

    Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com

    2009-11-15

    The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exists in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.

  9. Connection between weighted LPC and higher-order statistics for AR model estimation

    NARCIS (Netherlands)

    Kamp, Y.; Ma, C.

    1993-01-01

    This paper establishes the relationship between a weighted linear prediction method used for robust analysis of voiced speech and the autoregressive modelling based on higher-order statistics, known as cumulants

  10. PGT: A Statistical Approach to Prediction and Mechanism Design

    Science.gov (United States)

    Wolpert, David H.; Bono, James W.

    One of the biggest challenges facing behavioral economics is the lack of a single theoretical framework that is capable of directly utilizing all types of behavioral data. One of the biggest challenges of game theory is the lack of a framework for making predictions and designing markets in a manner that is consistent with the axioms of decision theory. An approach in which solution concepts are distribution-valued rather than set-valued (i.e. equilibrium theory) has both capabilities. We call this approach Predictive Game Theory (or PGT). This paper outlines a general Bayesian approach to PGT. It also presents one simple example to illustrate the way in which this approach differs from equilibrium approaches in both prediction and mechanism design settings.

  11. A statistical model for field emission in superconducting cavities

    International Nuclear Information System (INIS)

    Padamsee, H.; Green, K.; Jost, W.; Wright, B.

    1993-01-01

    A statistical model is used to account for several features of performance of an ensemble of superconducting cavities. The input parameters are: the number of emitters/area, a distribution function for emitter β values, a distribution function for emissive areas, and a processing threshold. The power deposited by emitters is calculated from the field emission current and electron impact energy. The model can successfully account for the fraction of tests that reach the maximum field Epk in an ensemble of cavities, for eg, 1-cells at sign 3 GHz or 5-cells at sign 1.5 GHz. The model is used to predict the level of power needed to successfully process cavities of various surface areas with high pulsed power processing (HPP)

  12. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances.

    Science.gov (United States)

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance.

  13. A BRDF statistical model applying to space target materials modeling

    Science.gov (United States)

    Liu, Chenghao; Li, Zhi; Xu, Can; Tian, Qichen

    2017-10-01

    In order to solve the problem of poor effect in modeling the large density BRDF measured data with five-parameter semi-empirical model, a refined statistical model of BRDF which is suitable for multi-class space target material modeling were proposed. The refined model improved the Torrance-Sparrow model while having the modeling advantages of five-parameter model. Compared with the existing empirical model, the model contains six simple parameters, which can approximate the roughness distribution of the material surface, can approximate the intensity of the Fresnel reflectance phenomenon and the attenuation of the reflected light's brightness with the azimuth angle changes. The model is able to achieve parameter inversion quickly with no extra loss of accuracy. The genetic algorithm was used to invert the parameters of 11 different samples in the space target commonly used materials, and the fitting errors of all materials were below 6%, which were much lower than those of five-parameter model. The effect of the refined model is verified by comparing the fitting results of the three samples at different incident zenith angles in 0° azimuth angle. Finally, the three-dimensional modeling visualizations of these samples in the upper hemisphere space was given, in which the strength of the optical scattering of different materials could be clearly shown. It proved the good describing ability of the refined model at the material characterization as well.

  14. Statistical Challenges in Modeling Big Brain Signals

    KAUST Repository

    Yu, Zhaoxia

    2017-11-01

    Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible solutions, and highlight future research directions.

  15. Statistical Challenges in Modeling Big Brain Signals

    KAUST Repository

    Yu, Zhaoxia; Pluta, Dustin; Shen, Tong; Chen, Chuansheng; Xue, Gui; Ombao, Hernando

    2017-01-01

    Brain signal data are inherently big: massive in amount, complex in structure, and high in dimensions. These characteristics impose great challenges for statistical inference and learning. Here we review several key challenges, discuss possible

  16. Statistical Learning Theory: Models, Concepts, and Results

    OpenAIRE

    von Luxburg, Ulrike; Schoelkopf, Bernhard

    2008-01-01

    Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms. In this article we attempt to give a gentle, non-technical overview over the key ideas and insights of statistical learning theory. We target at a broad audience, not necessarily machine learning researchers. This paper can serve as a starting point for people who want to get an overview on the field before diving into technical details.

  17. Online Statistical Modeling (Regression Analysis) for Independent Responses

    Science.gov (United States)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.

  18. Statistical analysis of accurate prediction of local atmospheric optical attenuation with a new model according to weather together with beam wandering compensation system: a season-wise experimental investigation

    Science.gov (United States)

    Arockia Bazil Raj, A.; Padmavathi, S.

    2016-07-01

    Atmospheric parameters strongly affect the performance of Free Space Optical Communication (FSOC) system when the optical wave is propagating through the inhomogeneous turbulent medium. Developing a model to get an accurate prediction of optical attenuation according to meteorological parameters becomes significant to understand the behaviour of FSOC channel during different seasons. A dedicated free space optical link experimental set-up is developed for the range of 0.5 km at an altitude of 15.25 m. The diurnal profile of received power and corresponding meteorological parameters are continuously measured using the developed optoelectronic assembly and weather station, respectively, and stored in a data logging computer. Measured meteorological parameters (as input factors) and optical attenuation (as response factor) of size [177147 × 4] are used for linear regression analysis and to design the mathematical model that is more suitable to predict the atmospheric optical attenuation at our test field. A model that exhibits the R2 value of 98.76% and average percentage deviation of 1.59% is considered for practical implementation. The prediction accuracy of the proposed model is investigated along with the comparative results obtained from some of the existing models in terms of Root Mean Square Error (RMSE) during different local seasons in one-year period. The average RMSE value of 0.043-dB/km is obtained in the longer range dynamic of meteorological parameters variations.

  19. Stochastic or statistic? Comparing flow duration curve models in ungauged basins and changing climates

    Science.gov (United States)

    Müller, M. F.; Thompson, S. E.

    2015-09-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drives of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.

  20. Foundations of Complex Systems Nonlinear Dynamics, Statistical Physics, and Prediction

    CERN Document Server

    Nicolis, Gregoire

    2007-01-01

    Complexity is emerging as a post-Newtonian paradigm for approaching a large body of phenomena of concern at the crossroads of physical, engineering, environmental, life and human sciences from a unifying point of view. This book outlines the foundations of modern complexity research as it arose from the cross-fertilization of ideas and tools from nonlinear science, statistical physics and numerical simulation. It is shown how these developments lead to an understanding, both qualitative and quantitative, of the complex systems encountered in nature and in everyday experience and, conversely, h

  1. Hydrogen-bond coordination in organic crystal structures: statistics, predictions and applications.

    Science.gov (United States)

    Galek, Peter T A; Chisholm, James A; Pidcock, Elna; Wood, Peter A

    2014-02-01

    Statistical models to predict the number of hydrogen bonds that might be formed by any donor or acceptor atom in a crystal structure have been derived using organic structures in the Cambridge Structural Database. This hydrogen-bond coordination behaviour has been uniquely defined for more than 70 unique atom types, and has led to the development of a methodology to construct hypothetical hydrogen-bond arrangements. Comparing the constructed hydrogen-bond arrangements with known crystal structures shows promise in the assessment of structural stability, and some initial examples of industrially relevant polymorphs, co-crystals and hydrates are described.

  2. Prediction of noise in ships by the application of “statistical energy analysis.”

    DEFF Research Database (Denmark)

    Jensen, John Ødegaard

    1979-01-01

    If it will be possible effectively to reduce the noise level in the accomodation on board ships, by introducing appropriate noise abatement measures already at an early design stage, it is quite essential that sufficiently accurate prediction methods are available for the naval architects...... or for a special noise abatement measure, e.g., increased structural damping. The paper discusses whether it might be possible to derive an alternative calculation model based on the “statistical energy analysis” approach (SEA). By considering the hull of a ship to be constructed from plate elements connected...

  3. Prostate Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing prostate cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  4. Colorectal Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing colorectal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  5. Esophageal Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing esophageal cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  6. Bladder Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing bladder cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  7. Lung Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing lung cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  8. Breast Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing breast cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  9. Pancreatic Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing pancreatic cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  10. Ovarian Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing ovarian cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  11. Liver Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing liver cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  12. Testicular Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of testicular cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  13. Cervical Cancer Risk Prediction Models

    Science.gov (United States)

    Developing statistical models that estimate the probability of developing cervical cancer over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  14. Integer Set Compression and Statistical Modeling

    DEFF Research Database (Denmark)

    Larsson, N. Jesper

    2014-01-01

    enumeration of elements may be arbitrary or random, but where statistics is kept in order to estimate probabilities of elements. We present a recursive subset-size encoding method that is able to benefit from statistics, explore the effects of permuting the enumeration order based on element probabilities......Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher compression performance. In this work, we address the case where...

  15. Prediction of Chemical Function: Model Development and Application

    Science.gov (United States)

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (...

  16. REMAINING LIFE TIME PREDICTION OF BEARINGS USING K-STAR ALGORITHM – A STATISTICAL APPROACH

    Directory of Open Access Journals (Sweden)

    R. SATISHKUMAR

    2017-01-01

    Full Text Available The role of bearings is significant in reducing the down time of all rotating machineries. The increasing trend of bearing failures in recent times has triggered the need and importance of deployment of condition monitoring. There are multiple factors associated to a bearing failure while it is in operation. Hence, a predictive strategy is required to evaluate the current state of the bearings in operation. In past, predictive models with regression techniques were widely used for bearing lifetime estimations. The Objective of this paper is to estimate the remaining useful life of bearings through a machine learning approach. The ultimate objective of this study is to strengthen the predictive maintenance. The present study was done using classification approach following the concepts of machine learning and a predictive model was built to calculate the residual lifetime of bearings in operation. Vibration signals were acquired on a continuous basis from an experiment wherein the bearings are made to run till it fails naturally. It should be noted that the experiment was carried out with new bearings at pre-defined load and speed conditions until the bearing fails on its own. In the present work, statistical features were deployed and feature selection process was carried out using J48 decision tree and selected features were used to develop the prognostic model. The K-Star classification algorithm, a supervised machine learning technique is made use of in building a predictive model to estimate the lifetime of bearings. The performance of classifier was cross validated with distinct data. The result shows that the K-Star classification model gives 98.56% classification accuracy with selected features.

  17. Statistical modelling for social researchers principles and practice

    CERN Document Server

    Tarling, Roger

    2008-01-01

    This book explains the principles and theory of statistical modelling in an intelligible way for the non-mathematical social scientist looking to apply statistical modelling techniques in research. The book also serves as an introduction for those wishing to develop more detailed knowledge and skills in statistical modelling. Rather than present a limited number of statistical models in great depth, the aim is to provide a comprehensive overview of the statistical models currently adopted in social research, in order that the researcher can make appropriate choices and select the most suitable model for the research question to be addressed. To facilitate application, the book also offers practical guidance and instruction in fitting models using SPSS and Stata, the most popular statistical computer software which is available to most social researchers. Instruction in using MLwiN is also given. Models covered in the book include; multiple regression, binary, multinomial and ordered logistic regression, log-l...

  18. Stochastic Spatial Models in Ecology: A Statistical Physics Approach

    Science.gov (United States)

    Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.

    2017-11-01

    Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.

  19. Glass viscosity calculation based on a global statistical modelling approach

    Energy Technology Data Exchange (ETDEWEB)

    Fluegel, Alex

    2007-02-01

    A global statistical glass viscosity model was developed for predicting the complete viscosity curve, based on more than 2200 composition-property data of silicate glasses from the scientific literature, including soda-lime-silica container and float glasses, TV panel glasses, borosilicate fiber wool and E type glasses, low expansion borosilicate glasses, glasses for nuclear waste vitrification, lead crystal glasses, binary alkali silicates, and various further compositions from over half a century. It is shown that within a measurement series from a specific laboratory the reported viscosity values are often over-estimated at higher temperatures due to alkali and boron oxide evaporation during the measurement and glass preparation, including data by Lakatos et al. (1972) and the recently published High temperature glass melt property database for process modeling by Seward et al. (2005). Similarly, in the glass transition range many experimental data of borosilicate glasses are reported too high due to phase separation effects. The developed global model corrects those errors. The model standard error was 9-17°C, with R^2 = 0.985-0.989. The prediction 95% confidence interval for glass in mass production largely depends on the glass composition of interest, the composition uncertainty, and the viscosity level. New insights in the mixed-alkali effect are provided.

  20. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    Directory of Open Access Journals (Sweden)

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two sample Bayesian prediction intervals for order statistics (OS are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right censored sample. A general class of prior density functions is used and the predictive cumulative function is obtained in the two samples case. The class of the inverse exponential-type distributions includes several important distributions such the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.

  1. Linear Mixed Models in Statistical Genetics

    NARCIS (Netherlands)

    R. de Vlaming (Ronald)

    2017-01-01

    markdownabstractOne of the goals of statistical genetics is to elucidate the genetic architecture of phenotypes (i.e., observable individual characteristics) that are affected by many genetic variants (e.g., single-nucleotide polymorphisms; SNPs). A particular aim is to identify specific SNPs that

  2. Statistical models and methods for reliability and survival analysis

    CERN Document Server

    Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Huber -Carol, Catherine; Limnios, Nikolaos; Gerville-Reache, Leo

    2013-01-01

    Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical

  3. Statistical shape modeling based renal volume measurement using tracked ultrasound

    Science.gov (United States)

    Pai Raikar, Vipul; Kwartowitz, David M.

    2017-03-01

    Autosomal dominant polycystic kidney disease (ADPKD) is the fourth most common cause of kidney transplant worldwide accounting for 7-10% of all cases. Although ADPKD usually progresses over many decades, accurate risk prediction is an important task.1 Identifying patients with progressive disease is vital to providing new treatments being developed and enable them to enter clinical trials for new therapy. Among other factors, total kidney volume (TKV) is a major biomarker predicting the progression of ADPKD. Consortium for Radiologic Imaging Studies in Polycystic Kidney Disease (CRISP)2 have shown that TKV is an early, and accurate measure of cystic burden and likely growth rate. It is strongly associated with loss of renal function.3 While ultrasound (US) has proven as an excellent tool for diagnosing the disease; monitoring short-term changes using ultrasound has been shown to not be accurate. This is attributed to high operator variability and reproducibility as compared to tomographic modalities such as CT and MR (Gold standard). Ultrasound has emerged as one of the standout modality for intra-procedural imaging and with methods for spatial localization has afforded us the ability to track 2D ultrasound in physical space which it is being used. In addition to this, the vast amount of recorded tomographic data can be used to generate statistical shape models that allow us to extract clinical value from archived image sets. In this work, we aim at improving the prognostic value of US in managing ADPKD by assessing the accuracy of using statistical shape model augmented US data, to predict TKV, with the end goal of monitoring short-term changes.

  4. Prediction of monthly average global solar radiation based on statistical distribution of clearness index

    International Nuclear Information System (INIS)

    Ayodele, T.R.; Ogunjuyigbe, A.S.O.

    2015-01-01

    In this paper, probability distribution of clearness index is proposed for the prediction of global solar radiation. First, the clearness index is obtained from the past data of global solar radiation, then, the parameters of the appropriate distribution that best fit the clearness index are determined. The global solar radiation is thereafter predicted from the clearness index using inverse transformation of the cumulative distribution function. To validate the proposed method, eight years global solar radiation data (2000–2007) of Ibadan, Nigeria are used to determine the parameters of appropriate probability distribution for clearness index. The calculated parameters are then used to predict the future monthly average global solar radiation for the following year (2008). The predicted values are compared with the measured values using four statistical tests: the Root Mean Square Error (RMSE), MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error) and the coefficient of determination (R"2). The proposed method is also compared to the existing regression models. The results show that logistic distribution provides the best fit for clearness index of Ibadan and the proposed method is effective in predicting the monthly average global solar radiation with overall RMSE of 0.383 MJ/m"2/day, MAE of 0.295 MJ/m"2/day, MAPE of 2% and R"2 of 0.967. - Highlights: • Distribution of clearnes index is proposed for prediction of global solar radiation. • The clearness index is obtained from the past data of global solar radiation. • The parameters of distribution that best fit the clearness index are determined. • Solar radiation is predicted from the clearness index using inverse transformation. • The method is effective in predicting the monthly average global solar radiation.

  5. Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods

    Science.gov (United States)

    Mann, Michael E.; Steinman, Byron A.; Miller, Sonya K.; Frankcombe, Leela M.; England, Matthew H.; Cheung, Anson H.

    2016-04-01

    The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. Using semiempirical estimates of the internal low-frequency variability component in Northern Hemisphere, Atlantic, and Pacific surface temperatures in concert with statistical hindcast experiments, we investigate whether the slowdown and its recent recovery were predictable. We conclude that the internal variability of the North Pacific, which played a critical role in the slowdown, does not appear to have been predictable using statistical forecast methods. An additional minor contribution from the North Atlantic, by contrast, appears to exhibit some predictability. While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed.

  6. A Statistical Approach For Modeling Tropical Cyclones. Synthetic Hurricanes Generator Model

    Energy Technology Data Exchange (ETDEWEB)

    Pasqualini, Donatella [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-05-11

    This manuscript brie y describes a statistical ap- proach to generate synthetic tropical cyclone tracks to be used in risk evaluations. The Synthetic Hur- ricane Generator (SynHurG) model allows model- ing hurricane risk in the United States supporting decision makers and implementations of adaptation strategies to extreme weather. In the literature there are mainly two approaches to model hurricane hazard for risk prediction: deterministic-statistical approaches, where the storm key physical parameters are calculated using physi- cal complex climate models and the tracks are usually determined statistically from historical data; and sta- tistical approaches, where both variables and tracks are estimated stochastically using historical records. SynHurG falls in the second category adopting a pure stochastic approach.

  7. Towards a Statistical Model of Tropical Cyclone Genesis

    Science.gov (United States)

    Fernandez, A.; Kashinath, K.; McAuliffe, J.; Prabhat, M.; Stark, P. B.; Wehner, M. F.

    2017-12-01

    Tropical Cyclones (TCs) are important extreme weather phenomena that have a strong impact on humans. TC forecasts are largely based on global numerical models that produce TC-like features. Aspects of Tropical Cyclones such as their formation/genesis, evolution, intensification and dissipation over land are important and challenging problems in climate science. This study investigates the environmental conditions associated with Tropical Cyclone Genesis (TCG) by testing how accurately a statistical model can predict TCG in the CAM5.1 climate model. TCG events are defined using TECA software @inproceedings{Prabhat2015teca, title={TECA: Petascale Pattern Recognition for Climate Science}, author={Prabhat and Byna, Surendra and Vishwanath, Venkatram and Dart, Eli and Wehner, Michael and Collins, William D}, booktitle={Computer Analysis of Images and Patterns}, pages={426-436}, year={2015}, organization={Springer}} to extract TC trajectories from CAM5.1. L1-regularized logistic regression (L1LR) is applied to the CAM5.1 output. The predictions have nearly perfect accuracy for data not associated with TC tracks and high accuracy differentiating between high vorticity and low vorticity systems. The model's active variables largely correspond to current hypotheses about important factors for TCG, such as wind field patterns and local pressure minima, and suggests new routes for investigation. Furthermore, our model's predictions of TC activity are competitive with the output of an instantaneous version of Emanuel and Nolan's Genesis Potential Index (GPI) @inproceedings{eman04, title = "Tropical cyclone activity and the global climate system", author = "Kerry Emanuel and Nolan, {David S.}", year = "2004", pages = "240-241", booktitle = "26th Conference on Hurricanes and Tropical Meteorology"}.

  8. MASKED AREAS IN SHEAR PEAK STATISTICS: A FORWARD MODELING APPROACH

    International Nuclear Information System (INIS)

    Bard, D.; Kratochvil, J. M.; Dawson, W.

    2016-01-01

    The statistics of shear peaks have been shown to provide valuable cosmological information beyond the power spectrum, and will be an important constraint of models of cosmology in forthcoming astronomical surveys. Surveys include masked areas due to bright stars, bad pixels etc., which must be accounted for in producing constraints on cosmology from shear maps. We advocate a forward-modeling approach, where the impacts of masking and other survey artifacts are accounted for in the theoretical prediction of cosmological parameters, rather than correcting survey data to remove them. We use masks based on the Deep Lens Survey, and explore the impact of up to 37% of the survey area being masked on LSST and DES-scale surveys. By reconstructing maps of aperture mass the masking effect is smoothed out, resulting in up to 14% smaller statistical uncertainties compared to simply reducing the survey area by the masked area. We show that, even in the presence of large survey masks, the bias in cosmological parameter estimation produced in the forward-modeling process is ≈1%, dominated by bias caused by limited simulation volume. We also explore how this potential bias scales with survey area and evaluate how much small survey areas are impacted by the differences in cosmological structure in the data and simulated volumes, due to cosmic variance

  9. An Intelligent Model for Stock Market Prediction

    Directory of Open Access Journals (Sweden)

    IbrahimM. Hamed

    2012-08-01

    Full Text Available This paper presents an intelligent model for stock market signal prediction using Multi-Layer Perceptron (MLP Artificial Neural Networks (ANN. Blind source separation technique, from signal processing, is integrated with the learning phase of the constructed baseline MLP ANN to overcome the problems of prediction accuracy and lack of generalization. Kullback Leibler Divergence (KLD is used, as a learning algorithm, because it converges fast and provides generalization in the learning mechanism. Both accuracy and efficiency of the proposed model were confirmed through the Microsoft stock, from wall-street market, and various data sets, from different sectors of the Egyptian stock market. In addition, sensitivity analysis was conducted on the various parameters of the model to ensure the coverage of the generalization issue. Finally, statistical significance was examined using ANOVA test.

  10. Geometric modeling in probability and statistics

    CERN Document Server

    Calin, Ovidiu

    2014-01-01

    This book covers topics of Informational Geometry, a field which deals with the differential geometric study of the manifold probability density functions. This is a field that is increasingly attracting the interest of researchers from many different areas of science, including mathematics, statistics, geometry, computer science, signal processing, physics and neuroscience. It is the authors’ hope that the present book will be a valuable reference for researchers and graduate students in one of the aforementioned fields. This textbook is a unified presentation of differential geometry and probability theory, and constitutes a text for a course directed at graduate or advanced undergraduate students interested in applications of differential geometry in probability and statistics. The book contains over 100 proposed exercises meant to help students deepen their understanding, and it is accompanied by software that is able to provide numerical computations of several information geometric objects. The reader...

  11. Prediction and reconstruction of future and missing unobservable modified Weibull lifetime based on generalized order statistics

    Directory of Open Access Journals (Sweden)

    Amany E. Aly

    2016-04-01

    Full Text Available When a system consisting of independent components of the same type, some appropriate actions may be done as soon as a portion of them have failed. It is, therefore, important to be able to predict later failure times from earlier ones. One of the well-known failure distributions commonly used to model component life, is the modified Weibull distribution (MWD. In this paper, two pivotal quantities are proposed to construct prediction intervals for future unobservable lifetimes based on generalized order statistics (gos from MWD. Moreover, a pivotal quantity is developed to reconstruct missing observations at the beginning of experiment. Furthermore, Monte Carlo simulation studies are conducted and numerical computations are carried out to investigate the efficiency of presented results. Finally, two illustrative examples for real data sets are analyzed.

  12. Challenges in dental statistics: data and modelling

    OpenAIRE

    Matranga, D.; Castiglia, P.; Solinas, G.

    2013-01-01

    The aim of this work is to present the reflections and proposals derived from the first Workshop of the SISMEC STATDENT working group on statistical methods and applications in dentistry, held in Ancona (Italy) on 28th September 2011. STATDENT began as a forum of comparison and discussion for statisticians working in the field of dental research in order to suggest new and improve existing biostatistical and clinical epidemiological methods. During the meeting, we dealt with very important to...

  13. A statistical model of future human actions

    International Nuclear Information System (INIS)

    Woo, G.

    1992-02-01

    A critical review has been carried out of models of future human actions during the long term post-closure period of a radioactive waste repository. Various Markov models have been considered as alternatives to the standard Poisson model, and the problems of parameterisation have been addressed. Where the simplistic Poisson model unduly exaggerates the intrusion risk, some form of Markov model may have to be introduced. This situation may well arise for shallow repositories, but it is less likely for deep repositories. Recommendations are made for a practical implementation of a computer based model and its associated database. (Author)

  14. Enhanced surrogate models for statistical design exploiting space mapping technology

    DEFF Research Database (Denmark)

    Koziel, Slawek; Bandler, John W.; Mohamed, Achmed S.

    2005-01-01

    We present advances in microwave and RF device modeling exploiting Space Mapping (SM) technology. We propose new SM modeling formulations utilizing input mappings, output mappings, frequency scaling and quadratic approximations. Our aim is to enhance circuit models for statistical analysis...

  15. Statistical models of shape optimisation and evaluation

    CERN Document Server

    Davies, Rhodri; Taylor, Chris

    2014-01-01

    Deformable shape models have wide application in computer vision and biomedical image analysis. This book addresses a key issue in shape modelling: establishment of a meaningful correspondence between a set of shapes. Full implementation details are provided.

  16. The proposed 'concordance-statistic for benefit' provided a useful metric when modeling heterogeneous treatment effects.

    Science.gov (United States)

    van Klaveren, David; Steyerberg, Ewout W; Serruys, Patrick W; Kent, David M

    2018-02-01

    Clinical prediction models that support treatment decisions are usually evaluated for their ability to predict the risk of an outcome rather than treatment benefit-the difference between outcome risk with vs. without therapy. We aimed to define performance metrics for a model's ability to predict treatment benefit. We analyzed data of the Synergy between Percutaneous Coronary Intervention with Taxus and Cardiac Surgery (SYNTAX) trial and of three recombinant tissue plasminogen activator trials. We assessed alternative prediction models with a conventional risk concordance-statistic (c-statistic) and a novel c-statistic for benefit. We defined observed treatment benefit by the outcomes in pairs of patients matched on predicted benefit but discordant for treatment assignment. The 'c-for-benefit' represents the probability that from two randomly chosen matched patient pairs with unequal observed benefit, the pair with greater observed benefit also has a higher predicted benefit. Compared to a model without treatment interactions, the SYNTAX score II had improved ability to discriminate treatment benefit (c-for-benefit 0.590 vs. 0.552), despite having similar risk discrimination (c-statistic 0.725 vs. 0.719). However, for the simplified stroke-thrombolytic predictive instrument (TPI) vs. the original stroke-TPI, the c-for-benefit (0.584 vs. 0.578) was similar. The proposed methodology has the potential to measure a model's ability to predict treatment benefit not captured with conventional performance metrics. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. How to practise Bayesian statistics outside the Bayesian church: What philosophy for Bayesian statistical modelling?

    NARCIS (Netherlands)

    Borsboom, D.; Haig, B.D.

    2013-01-01

    Unlike most other statistical frameworks, Bayesian statistical inference is wedded to a particular approach in the philosophy of science (see Howson & Urbach, 2006); this approach is called Bayesianism. Rather than being concerned with model fitting, this position in the philosophy of science

  18. Statistical Tests for Mixed Linear Models

    CERN Document Server

    Khuri, André I; Sinha, Bimal K

    2011-01-01

    An advanced discussion of linear models with mixed or random effects. In recent years a breakthrough has occurred in our ability to draw inferences from exact and optimum tests of variance component models, generating much research activity that relies on linear models with mixed and random effects. This volume covers the most important research of the past decade as well as the latest developments in hypothesis testing. It compiles all currently available results in the area of exact and optimum tests for variance component models and offers the only comprehensive treatment for these models a

  19. Statistical modelling of traffic safety development

    DEFF Research Database (Denmark)

    Christens, Peter

    2004-01-01

    there were 6861 injury trafficc accidents reported by the police, resulting in 4519 minor injuries, 3946 serious injuries, and 431 fatalities. The general purpose of the research was to improve the insight into aggregated road safety methodology in Denmark. The aim was to analyse advanced statistical methods......, that were designed to study developments over time, including effects of interventions. This aim has been achieved by investigating variations in aggregated Danish traffic accident series and by applying state of the art methodologies to specific case studies. The thesis comprises an introduction...

  20. Statistical image processing and multidimensional modeling

    CERN Document Server

    Fieguth, Paul

    2010-01-01

    Images are all around us! The proliferation of low-cost, high-quality imaging devices has led to an explosion in acquired images. When these images are acquired from a microscope, telescope, satellite, or medical imaging device, there is a statistical image processing task: the inference of something - an artery, a road, a DNA marker, an oil spill - from imagery, possibly noisy, blurry, or incomplete. A great many textbooks have been written on image processing. However this book does not so much focus on images, per se, but rather on spatial data sets, with one or more measurements taken over

  1. Fluctuations and correlations in statistical models of hadron production

    International Nuclear Information System (INIS)

    Gorenstein, M. I.

    2012-01-01

    An extension of the standard concept of the statistical ensembles is suggested. Namely, the statistical ensembles with extensive quantities fluctuating according to an externally given distribution are introduced. Applications in the statistical models of multiple hadron production in high energy physics are discussed.

  2. Analysis and Evaluation of Statistical Models for Integrated Circuits Design

    Directory of Open Access Journals (Sweden)

    Sáenz-Noval J.J.

    2011-10-01

    Full Text Available Statistical models for integrated circuits (IC allow us to estimate the percentage of acceptable devices in the batch before fabrication. Actually, Pelgrom is the statistical model most accepted in the industry; however it was derived from a micrometer technology, which does not guarantee reliability in nanometric manufacturing processes. This work considers three of the most relevant statistical models in the industry and evaluates their limitations and advantages in analog design, so that the designer has a better criterion to make a choice. Moreover, it shows how several statistical models can be used for each one of the stages and design purposes.

  3. Modeling of uncertainties in statistical inverse problems

    International Nuclear Information System (INIS)

    Kaipio, Jari

    2008-01-01

    In all real world problems, the models that tie the measurements to the unknowns of interest, are at best only approximations for reality. While moderate modeling and approximation errors can be tolerated with stable problems, inverse problems are a notorious exception. Typical modeling errors include inaccurate geometry, unknown boundary and initial data, properties of noise and other disturbances, and simply the numerical approximations of the physical models. In principle, the Bayesian approach to inverse problems, in which all uncertainties are modeled as random variables, is capable of handling these uncertainties. Depending on the type of uncertainties, however, different strategies may be adopted. In this paper we give an overview of typical modeling errors and related strategies within the Bayesian framework.

  4. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  5. Statistical modeling and extrapolation of carcinogenesis data

    International Nuclear Information System (INIS)

    Krewski, D.; Murdoch, D.; Dewanji, A.

    1986-01-01

    Mathematical models of carcinogenesis are reviewed, including pharmacokinetic models for metabolic activation of carcinogenic substances. Maximum likelihood procedures for fitting these models to epidemiological data are discussed, including situations where the time to tumor occurrence is unobservable. The plausibility of different possible shapes of the dose response curve at low doses is examined, and a robust method for linear extrapolation to low doses is proposed and applied to epidemiological data on radiation carcinogenesis

  6. Plan Recognition using Statistical Relational Models

    Science.gov (United States)

    2014-08-25

    corresponding undirected model can be significantly more complex since there is no closed form solution for the maximum-likelihood set of parameters unlike in...algorithm did not scale to larger training sets, and the overall results are still not competitive with BALPs. 5In directed models, a closed form solution...opinions of ARO, DARPA, NSF or any other government agency. References Albrecht DW, Zukerman I, Nicholson AE. Bayesian models for keyhole plan

  7. Statistical Models to Assess the Health Effects and to Forecast Ground Level Ozone

    Czech Academy of Sciences Publication Activity Database

    Schlink, U.; Herbath, O.; Richter, M.; Dorling, S.; Nunnari, G.; Cawley, G.; Pelikán, Emil

    2006-01-01

    Roč. 21, č. 4 (2006), s. 547-558 ISSN 1364-8152 R&D Projects: GA AV ČR 1ET400300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : statistical models * ground level ozone * health effects * logistic model * forecasting * prediction performance * neural network * generalised additive model * integrated assessment Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.992, year: 2006

  8. Multivariate statistical modelling based on generalized linear models

    CERN Document Server

    Fahrmeir, Ludwig

    1994-01-01

    This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...

  9. Shape-correlated deformation statistics for respiratory motion prediction in 4D lung

    Science.gov (United States)

    Liu, Xiaoxiao; Oguz, Ipek; Pizer, Stephen M.; Mageras, Gig S.

    2010-02-01

    4D image-guided radiation therapy (IGRT) for free-breathing lungs is challenging due to the complicated respiratory dynamics. Effective modeling of respiratory motion is crucial to account for the motion affects on the dose to tumors. We propose a shape-correlated statistical model on dense image deformations for patient-specic respiratory motion estimation in 4D lung IGRT. Using the shape deformations of the high-contrast lungs as the surrogate, the statistical model trained from the planning CTs can be used to predict the image deformation during delivery verication time, with the assumption that the respiratory motion at both times are similar for the same patient. Dense image deformation fields obtained by diffeomorphic image registrations characterize the respiratory motion within one breathing cycle. A point-based particle optimization algorithm is used to obtain the shape models of lungs with group-wise surface correspondences. Canonical correlation analysis (CCA) is adopted in training to maximize the linear correlation between the shape variations of the lungs and the corresponding dense image deformations. Both intra- and inter-session CT studies are carried out on a small group of lung cancer patients and evaluated in terms of the tumor location accuracies. The results suggest potential applications using the proposed method.

  10. Statistical learning for predictive targeting in online advertising

    DEFF Research Database (Denmark)

    Fruergaard, Bjarne Ørum

    The focus in this thesis is investigation of machine learning methods with applications in computational advertising. Computational advertising is the broad discipline of building systems which can reach audiences browsing the Internet with targeted advertisements. At the core of such systems......, an international online advertising technology partner. This also means that the analyses and methods in this work are developed with particular use-cases within Adform in mind and thus need also to be applicable in Adform’s technology stack. This implies extra thought on scalability and performance...... application in real-time bidding ad exchanges, where each advertiser is given a chance to place bids for showing their ad while the page loads, and the winning bid gets to display their banner. The contributions of this thesis entail application of a hybrid model of explicit and latent features for learning...

  11. Statistical Modelling of Extreme Rainfall in Taiwan

    NARCIS (Netherlands)

    L-F. Chu (Lan-Fen); M.J. McAleer (Michael); C-C. Chang (Ching-Chung)

    2012-01-01

    textabstractIn this paper, the annual maximum daily rainfall data from 1961 to 2010 are modelled for 18 stations in Taiwan. We fit the rainfall data with stationary and non-stationary generalized extreme value distributions (GEV), and estimate their future behaviour based on the best fitting model.

  12. Statistical Modelling of Extreme Rainfall in Taiwan

    NARCIS (Netherlands)

    L. Chu (LanFen); M.J. McAleer (Michael); C-H. Chang (Chu-Hsiang)

    2013-01-01

    textabstractIn this paper, the annual maximum daily rainfall data from 1961 to 2010 are modelled for 18 stations in Taiwan. We fit the rainfall data with stationary and non-stationary generalized extreme value distributions (GEV), and estimate their future behaviour based on the best fitting model.

  13. A stepwise model to predict monthly streamflow

    Science.gov (United States)

    Mahmood Al-Juboori, Anas; Guven, Aytac

    2016-12-01

    In this study, a stepwise model empowered with genetic programming is developed to predict the monthly flows of Hurman River in Turkey and Diyalah and Lesser Zab Rivers in Iraq. The model divides the monthly flow data to twelve intervals representing the number of months in a year. The flow of a month, t is considered as a function of the antecedent month's flow (t - 1) and it is predicted by multiplying the antecedent monthly flow by a constant value called K. The optimum value of K is obtained by a stepwise procedure which employs Gene Expression Programming (GEP) and Nonlinear Generalized Reduced Gradient Optimization (NGRGO) as alternative to traditional nonlinear regression technique. The degree of determination and root mean squared error are used to evaluate the performance of the proposed models. The results of the proposed model are compared with the conventional Markovian and Auto Regressive Integrated Moving Average (ARIMA) models based on observed monthly flow data. The comparison results based on five different statistic measures show that the proposed stepwise model performed better than Markovian model and ARIMA model. The R2 values of the proposed model range between 0.81 and 0.92 for the three rivers in this study.

  14. Statistical inference, the bootstrap, and neural-network modeling with application to foreign exchange rates.

    Science.gov (United States)

    White, H; Racine, J

    2001-01-01

    We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.

  15. An improved mixing model providing joint statistics of scalar and scalar dissipation

    Energy Technology Data Exchange (ETDEWEB)

    Meyer, Daniel W. [Department of Energy Resources Engineering, Stanford University, Stanford, CA (United States); Jenny, Patrick [Institute of Fluid Dynamics, ETH Zurich (Switzerland)

    2008-11-15

    For the calculation of nonpremixed turbulent flames with thin reaction zones the joint probability density function (PDF) of the mixture fraction and its dissipation rate plays an important role. The corresponding PDF transport equation involves a mixing model for the closure of the molecular mixing term. Here, the parameterized scalar profile (PSP) mixing model is extended to provide the required joint statistics. Model predictions are validated using direct numerical simulation (DNS) data of a passive scalar mixing in a statistically homogeneous turbulent flow. Comparisons between the DNS and the model predictions are provided, which involve different initial scalar-field lengthscales. (author)

  16. Steady state statistical correlations predict bistability in reaction motifs.

    Science.gov (United States)

    Chakravarty, Suchana; Barik, Debashis

    2017-03-28

    Various cellular decision making processes are regulated by bistable switches that take graded input signals and convert them to binary all-or-none responses. Traditionally, a bistable switch generated by a positive feedback loop is characterized either by a hysteretic signal response curve with two distinct signaling thresholds or by characterizing the bimodality of the response distribution in the bistable region. To identify the intrinsic bistability of a feedback regulated network, here we propose that bistability can be determined by correlating higher order moments and cumulants (≥2) of the joint steady state distributions of two components connected in a positive feedback loop. We performed stochastic simulations of four feedback regulated models with intrinsic bistability and we show that for a bistable switch with variation of the signal dose, the steady state variance vs. covariance adopts a signatory cusp-shaped curve. Further, we find that the (n + 1)th order cross-cumulant vs. nth order cross-cumulant adopts a closed loop structure for at least n = 3. We also propose that our method is capable of identifying systems without intrinsic bistability even though the system may show bimodality in the marginal response distribution. The proposed method can be used to analyze single cell protein data measured at steady state from experiments such as flow cytometry.

  17. A generic statistical methodology to predict the maximum pit depth of a localized corrosion process

    International Nuclear Information System (INIS)

    Jarrah, A.; Bigerelle, M.; Guillemot, G.; Najjar, D.; Iost, A.; Nianga, J.-M.

    2011-01-01

    Highlights: → We propose a methodology to predict the maximum pit depth in a corrosion process. → Generalized Lambda Distribution and the Computer Based Bootstrap Method are combined. → GLD fit a large variety of distributions both in their central and tail regions. → Minimum thickness preventing perforation can be estimated with a safety margin. → Considering its applications, this new approach can help to size industrial pieces. - Abstract: This paper outlines a new methodology to predict accurately the maximum pit depth related to a localized corrosion process. It combines two statistical methods: the Generalized Lambda Distribution (GLD), to determine a model of distribution fitting with the experimental frequency distribution of depths, and the Computer Based Bootstrap Method (CBBM), to generate simulated distributions equivalent to the experimental one. In comparison with conventionally established statistical methods that are restricted to the use of inferred distributions constrained by specific mathematical assumptions, the major advantage of the methodology presented in this paper is that both the GLD and the CBBM enable a statistical treatment of the experimental data without making any preconceived choice neither on the unknown theoretical parent underlying distribution of pit depth which characterizes the global corrosion phenomenon nor on the unknown associated theoretical extreme value distribution which characterizes the deepest pits. Considering an experimental distribution of depths of pits produced on an aluminium sample, estimations of maximum pit depth using a GLD model are compared to similar estimations based on usual Gumbel and Generalized Extreme Value (GEV) methods proposed in the corrosion engineering literature. The GLD approach is shown having smaller bias and dispersion in the estimation of the maximum pit depth than the Gumbel approach both for its realization and mean. This leads to comparing the GLD approach to the GEV one

  18. Relative effects of statistical preprocessing and postprocessing on a regional hydrological ensemble prediction system

    Science.gov (United States)

    Sharma, Sanjib; Siddique, Ridwan; Reed, Seann; Ahnert, Peter; Mendoza, Pablo; Mejia, Alfonso

    2018-03-01

    The relative roles of statistical weather preprocessing and streamflow postprocessing in hydrological ensemble forecasting at short- to medium-range forecast lead times (day 1-7) are investigated. For this purpose, a regional hydrologic ensemble prediction system (RHEPS) is developed and implemented. The RHEPS is comprised of the following components: (i) hydrometeorological observations (multisensor precipitation estimates, gridded surface temperature, and gauged streamflow); (ii) weather ensemble forecasts (precipitation and near-surface temperature) from the National Centers for Environmental Prediction 11-member Global Ensemble Forecast System Reforecast version 2 (GEFSRv2); (iii) NOAA's Hydrology Laboratory-Research Distributed Hydrologic Model (HL-RDHM); (iv) heteroscedastic censored logistic regression (HCLR) as the statistical preprocessor; (v) two statistical postprocessors, an autoregressive model with a single exogenous variable (ARX(1,1)) and quantile regression (QR); and (vi) a comprehensive verification strategy. To implement the RHEPS, 1 to 7 days weather forecasts from the GEFSRv2 are used to force HL-RDHM and generate raw ensemble streamflow forecasts. Forecasting experiments are conducted in four nested basins in the US Middle Atlantic region, ranging in size from 381 to 12 362 km2. Results show that the HCLR preprocessed ensemble precipitation forecasts have greater skill than the raw forecasts. These improvements are more noticeable in the warm season at the longer lead times (> 3 days). Both postprocessors, ARX(1,1) and QR, show gains in skill relative to the raw ensemble streamflow forecasts, particularly in the cool season, but QR outperforms ARX(1,1). The scenarios that implement preprocessing and postprocessing separately tend to perform similarly, although the postprocessing-alone scenario is often more effective. The scenario involving both preprocessing and postprocessing consistently outperforms the other scenarios. In some cases

  19. On the Logical Development of Statistical Models.

    Science.gov (United States)

    1983-12-01

    1978). "Modelos con parametros variables en el analisis de series temporales " Questiio, 4, 2, 75-87. [25] Seal, H. L. (1967). "The historical...example, a classical state-space representation of a simple time series model is: yt = it + ut Ut = *It-I + Ct (2.2) ut and et are independent normal...on its past values is displayed in the structural equation. This approach has been particularly useful in time series models. For example, model (2.2

  20. A Noise Robust Statistical Texture Model

    DEFF Research Database (Denmark)

    Hilger, Klaus Baggesen; Stegmann, Mikkel Bille; Larsen, Rasmus

    2002-01-01

    Appearance Models segmentation framework. This is accomplished by augmenting the model with an estimate of the covariance of the noise present in the training data. This results in a more compact model maximising the signal-to-noise ratio, thus favouring subspaces rich on signal, but low on noise......This paper presents a novel approach to the problem of obtaining a low dimensional representation of texture (pixel intensity) variation present in a training set after alignment using a Generalised Procrustes analysis.We extend the conventional analysis of training textures in the Active...

  1. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    Science.gov (United States)

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  2. Predictive modelling of noise level generated during sawing of rocks

    Indian Academy of Sciences (India)

    This paper presents an experimental and statistical study on noise level generated during of rock sawing by circular diamond sawblades. Influence of the operating variables and rock properties on the noise level are investigated and analysed. Statistical analyses are then employed and models are built for the prediction of ...

  3. Physics-based statistical model and simulation method of RF propagation in urban environments

    Science.gov (United States)

    Pao, Hsueh-Yuan; Dvorak, Steven L.

    2010-09-14

    A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient close-formed parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.

  4. Statistics

    CERN Document Server

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  5. 12th Workshop on Stochastic Models, Statistics and Their Applications

    CERN Document Server

    Rafajłowicz, Ewaryst; Szajowski, Krzysztof

    2015-01-01

    This volume presents the latest advances and trends in stochastic models and related statistical procedures. Selected peer-reviewed contributions focus on statistical inference, quality control, change-point analysis and detection, empirical processes, time series analysis, survival analysis and reliability, statistics for stochastic processes, big data in technology and the sciences, statistical genetics, experiment design, and stochastic models in engineering. Stochastic models and related statistical procedures play an important part in furthering our understanding of the challenging problems currently arising in areas of application such as the natural sciences, information technology, engineering, image analysis, genetics, energy and finance, to name but a few. This collection arises from the 12th Workshop on Stochastic Models, Statistics and Their Applications, Wroclaw, Poland.

  6. Predicting losing and gaining river reaches in lowland New Zealand based on a statistical methodology

    Science.gov (United States)

    Yang, Jing; Zammit, Christian; Dudley, Bruce

    2017-04-01

    The phenomenon of losing and gaining in rivers normally takes place in lowland where often there are various, sometimes conflicting uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) information surveys with surface water and groundwater experts from regional government, 2) A collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build up the statistical relationship between river reach status (gain and loss) and river/watershed characteristics, and then to predict for river reaches at Strahler order one without prior losing and gaining information. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research, and water allocation decisions in lowland New Zealand.

  7. Corrected Statistical Energy Analysis Model for Car Interior Noise

    Directory of Open Access Journals (Sweden)

    A. Putra

    2015-01-01

    Full Text Available Statistical energy analysis (SEA is a well-known method to analyze the flow of acoustic and vibration energy in a complex structure. For an acoustic space where significant absorptive materials are present, direct field component from the sound source dominates the total sound field rather than a reverberant field, where the latter becomes the basis in constructing the conventional SEA model. Such environment can be found in a car interior and thus a corrected SEA model is proposed here to counter this situation. The model is developed by eliminating the direct field component from the total sound field and only the power after the first reflection is considered. A test car cabin was divided into two subsystems and by using a loudspeaker as a sound source, the power injection method in SEA was employed to obtain the corrected coupling loss factor and the damping loss factor from the corrected SEA model. These parameters were then used to predict the sound pressure level in the interior cabin using the injected input power from the engine. The results show satisfactory agreement with the directly measured SPL.

  8. Latent domain models for statistical machine translation

    NARCIS (Netherlands)

    Hoàng, C.

    2017-01-01

    A data-driven approach to model translation suffers from the data mismatch problem and demands domain adaptation techniques. Given parallel training data originating from a specific domain, training an MT system on the data would result in a rather suboptimal translation for other domains. But does

  9. Behavioral and statistical models of educational inequality

    DEFF Research Database (Denmark)

    Holm, Anders; Breen, Richard

    2016-01-01

    This paper addresses the question of how students and their families make educational decisions. We describe three types of behavioral model that might underlie decision-making and we show that they have consequences for what decisions are made. Our study thus has policy implications if we wish...

  10. Statistical models of global Langmuir mixing

    Science.gov (United States)

    Li, Qing; Fox-Kemper, Baylor; Breivik, Øyvind; Webb, Adrean

    2017-05-01

    The effects of Langmuir mixing on the surface ocean mixing may be parameterized by applying an enhancement factor which depends on wave, wind, and ocean state to the turbulent velocity scale in the K-Profile Parameterization. Diagnosing the appropriate enhancement factor online in global climate simulations is readily achieved by coupling with a prognostic wave model, but with significant computational and code development expenses. In this paper, two alternatives that do not require a prognostic wave model, (i) a monthly mean enhancement factor climatology, and (ii) an approximation to the enhancement factor based on the empirical wave spectra, are explored and tested in a global climate model. Both appear to reproduce the Langmuir mixing effects as estimated using a prognostic wave model, with nearly identical and substantial improvements in the simulated mixed layer depth and intermediate water ventilation over control simulations, but significantly less computational cost. Simpler approaches, such as ignoring Langmuir mixing altogether or setting a globally constant Langmuir number, are found to be deficient. Thus, the consequences of Stokes depth and misaligned wind and waves are important.

  11. Sampling, Probability Models and Statistical Reasoning -RE ...

    Indian Academy of Sciences (India)

    random sampling allows data to be modelled with the help of probability ... g based on different trials to get an estimate of the experimental error. ... research interests lie in the .... if e is indeed the true value of the proportion of defectives in the.

  12. Statistical Model Checking for Product Lines

    DEFF Research Database (Denmark)

    ter Beek, Maurice H.; Legay, Axel; Lluch Lafuente, Alberto

    2016-01-01

    average cost of products (in terms of the attributes of the products’ features) and the probability of features to be (un)installed at runtime. The product lines must be modelled in QFLan, which extends the probabilistic feature-oriented language PFLan with novel quantitative constraints among features...

  13. A Statistical Model for Energy Intensity

    Directory of Open Access Journals (Sweden)

    Marjaneh Issapour

    2012-12-01

    Full Text Available A promising approach to improve scientific literacy in regards to global warming and climate change is using a simulation as part of a science education course. The simulation needs to employ scientific analysis of actual data from internationally accepted and reputable databases to demonstrate the reality of the current climate change situation. One of the most important criteria for using a simulation in a science education course is the fidelity of the model. The realism of the events and consequences modeled in the simulation is significant as well. Therefore, all underlying equations and algorithms used in the simulation must have real-world scientific basis. The "Energy Choices" simulation is one such simulation. The focus of this paper is the development of a mathematical model for "Energy Intensity" as a part of the overall system dynamics in "Energy Choices" simulation. This model will define the "Energy Intensity" as a function of other independent variables that can be manipulated by users of the simulation. The relationship discovered by this research will be applied to an algorithm in the "Energy Choices" simulation.

  14. Structured Statistical Models of Inductive Reasoning

    Science.gov (United States)

    Kemp, Charles; Tenenbaum, Joshua B.

    2009-01-01

    Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet…

  15. Seasonal predictability of Kiremt rainfall in coupled general circulation models

    Science.gov (United States)

    Gleixner, Stephanie; Keenlyside, Noel S.; Demissie, Teferi D.; Counillon, François; Wang, Yiguo; Viste, Ellen

    2017-11-01

    The Ethiopian economy and population is strongly dependent on rainfall. Operational seasonal predictions for the main rainy season (Kiremt, June-September) are based on statistical approaches with Pacific sea surface temperatures (SST) as the main predictor. Here we analyse dynamical predictions from 11 coupled general circulation models for the Kiremt seasons from 1985-2005 with the forecasts starting from the beginning of May. We find skillful predictions from three of the 11 models, but no model beats a simple linear prediction model based on the predicted Niño3.4 indices. The skill of the individual models for dynamically predicting Kiremt rainfall depends on the strength of the teleconnection between Kiremt rainfall and concurrent Pacific SST in the models. Models that do not simulate this teleconnection fail to capture the observed relationship between Kiremt rainfall and the large-scale Walker circulation.

  16. Statistical Analysis and Modelling of Olkiluoto Structures

    International Nuclear Information System (INIS)

    Hellae, P.; Vaittinen, T.; Saksa, P.; Nummela, J.

    2004-11-01

    Posiva Oy is carrying out investigations for the disposal of the spent nuclear fuel at the Olkiluoto site in SW Finland. The investigations have focused on the central part of the island. The layout design of the entire repository requires characterization of notably larger areas and must rely at least at the current stage on borehole information from a rather sparse network and on the geophysical soundings providing information outside and between the holes. In this work, the structural data according to the current version of the Olkiluoto bedrock model is analyzed. The bedrock model relies much on the borehole data although results of the seismic surveys and, for example, pumping tests are used in determining the orientation and continuation of the structures. Especially in the analysis, questions related to the frequency of structures and size of the structures are discussed. The structures observed in the boreholes are mainly dipping gently to the southeast. About 9 % of the sample length belongs to structures. The proportion is higher in the upper parts of the rock. The number of fracture and crushed zones seems not to depend greatly on the depth, whereas the hydraulic features concentrate on the depth range above -100 m. Below level -300 m, the hydraulic conductivity occurs in connection of fractured zones. Especially the hydraulic features, but also fracture and crushed zones often occur in groups. The frequency of the structure (area of structures per total volume) is estimated to be of the order of 1/100m. The size of the local structures was estimated by calculating the intersection of the zone to the nearest borehole where the zone has not been detected. Stochastic models using the Fracman software by Golder Associates were generated based on the bedrock model data complemented with the magnetic ground survey data. The seismic surveys (from boreholes KR5, KR13, KR14, and KR19) were used as alternative input data. The generated models were tested by

  17. Modeling statistical properties of written text.

    Directory of Open Access Journals (Sweden)

    M Angeles Serrano

    Full Text Available Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.

  18. Advanced data analysis in neuroscience integrating statistical and computational models

    CERN Document Server

    Durstewitz, Daniel

    2017-01-01

    This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering.  Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, based on which it aims to convey an understanding also of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. This way computational models in neuroscience are not only explanat ory frameworks, but become powerfu...

  19. Saccadic gain adaptation is predicted by the statistics of natural fluctuations in oculomotor function

    Directory of Open Access Journals (Sweden)

    Mark V Albert

    2012-12-01

    Full Text Available Due to multiple factors such as fatigue, muscle strengthening, and neural plasticity, the responsiveness of the motor apparatus to neural commands changes over time. To enable precise movements the nervous system must adapt to compensate for these changes. Recent models of motor adaptation derive from assumptions about the way the motor apparatus changes. Characterizing these changes is difficult because motor adaptation happens at the same time, masking most of the effects of ongoing changes. Here, we analyze eye movements of monkeys with lesions to the posterior cerebellar vermis that impair adaptation. Their fluctuations better reveal the underlying changes of the motor system over time. When these measured, unadapted changes are used to derive optimal motor adaptation rules the prediction precision significantly improves. Among three models that similarly fit single-day adaptation results, the model that also matches the temporal correlations of the nonadapting saccades most accurately predicts multiple day adaptation. Saccadic gain adaptation is well matched to the natural statistics of fluctuations of the oculomotor plant.

  20. Moment based model predictive control for systems with additive uncertainty

    NARCIS (Netherlands)

    Saltik, M.B.; Ozkan, L.; Weiland, S.; Ludlage, J.H.A.

    2017-01-01

    In this paper, we present a model predictive control (MPC) strategy based on the moments of the state variables and the cost functional. The statistical properties of the state predictions are calculated through the open loop iteration of dynamics and used in the formulation of MPC cost function. We

  1. Statistics Based Models for the Dynamics of Chernivtsi Children Disease

    Directory of Open Access Journals (Sweden)

    Igor G. Nesteruk

    2017-10-01

    Full Text Available Background. Simple mathematical models of contamination and SIR-model of spreading an infection were used to simulate the time dynamics of the unknown before children disease, which occurred in Chernivtsi (Ukraine. The cause of many cases of alopecia, which began in this city in August 1988 is still not fully clarified. According to the official report of the governmental commission, the last new cases occurred in the middle of November 1988, and the reason of the illness was reported as chemical exogenous intoxication. Later this illness became the name “Chernivtsi chemical disease”. Nevertheless, the significantly increased number of new cases of the local alopecia was registered almost three years and is still not clarified. Objective. The comparison of two different versions of the disease: chemical exogenous intoxication and infection. Identification of the parameters of mathematical models and prediction of the disease development. Methods. Analytical solutions of the contamination models and SIR-model for an epidemic are obtained. The optimal values of parameters with the use of linear regression were found. Results. The optimal values of the models parameters with the use of statistical approach were identified. The calculations showed that the infectious version of the disease is more reliable in comparison with the popular contamination one. The possible date of the epidemic beginning was estimated. Conclusions. The optimal parameters of SIR-model allow calculating the realistic number of victims and other characteristics of possible epidemic. They also show that increased number of cases of local alopecia could be a part of the same epidemic as “Chernivtsi chemical disease”.

  2. Statistically Based Morphodynamic Modeling of Tracer Slowdown

    Science.gov (United States)

    Borhani, S.; Ghasemi, A.; Hill, K. M.; Viparelli, E.

    2017-12-01

    Tracer particles are used to study bedload transport in gravel-bed rivers. One of the advantages associated with using of tracer particles is that they allow for direct measures of the entrainment rates and their size distributions. The main issue in large scale studies with tracer particles is the difference between tracer stone short term and long term behavior. This difference is due to the fact that particles undergo vertical mixing or move to less active locations such as bars or even floodplains. For these reasons the average virtual velocity of tracer particle decreases in time, i.e. the tracer slowdown. In summary, tracer slowdown can have a significant impact on the estimation of bedload transport rate or long term dispersal of contaminated sediment. The vast majority of the morphodynamic models that account for the non-uniformity of the bed material (tracer and not tracer, in this case) are based on a discrete description of the alluvial deposit. The deposit is divided in two different regions; the active layer and the substrate. The active layer is a thin layer in the topmost part of the deposit whose particles can interact with the bed material transport. The substrate is the part of the deposit below the active layer. Due to the discrete representation of the alluvial deposit, active layer models are not able to reproduce tracer slowdown. In this study we try to model the slowdown of tracer particles with the continuous Parker-Paola-Leclair morphodynamic framework. This continuous, i.e. not layer-based, framework is based on a stochastic description of the temporal variation of bed surface elevation, and of the elevation specific particle entrainment and deposition. Particle entrainment rates are computed as a function of the flow and sediment characteristics, while particle deposition is estimated with a step length formulation. Here we present one of the first implementation of the continuum framework at laboratory scale, its validation against

  3. Statistical model of hadrons multiple production in space of total angular momentum and isotopic spin

    International Nuclear Information System (INIS)

    Gridneva, S.A.; Rus'kin, V.I.

    1980-01-01

    Basic features of the statistical model of multiple hadron production based on microcanonical distribution and taking into account the laws of conservation of total angular momentum, isotopic spin, p-, G-, C-eveness and Bose-Einstein statistics requirements are given. The model predictions are compared with experimental data on anti NN annihilation at rest and e + e - annihilation in hadrons at annihilation total energy from 2 to 3 GeV [ru

  4. Statistics

    Science.gov (United States)

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  5. The statistical multifragmentation model: Origins and recent advances

    International Nuclear Information System (INIS)

    Donangelo, R.; Souza, S. R.

    2016-01-01

    We review the Statistical Multifragmentation Model (SMM) which considers a generalization of the liquid-drop model for hot nuclei and allows one to calculate thermodynamic quantities characterizing the nuclear ensemble at the disassembly stage. We show how to determine probabilities of definite partitions of finite nuclei and how to determine, through Monte Carlo calculations, observables such as the caloric curve, multiplicity distributions, heat capacity, among others. Some experimental measurements of the caloric curve confirmed the SMM predictions of over 10 years before, leading to a surge in the interest in the model. However, the experimental determination of the fragmentation temperatures relies on the yields of different isotopic species, which were not correctly calculated in the schematic, liquid-drop picture, employed in the SMM. This led to a series of improvements in the SMM, in particular to the more careful choice of nuclear masses and energy densities, specially for the lighter nuclei. With these improvements the SMM is able to make quantitative determinations of isotope production. We show the application of SMM to the production of exotic nuclei through multifragmentation. These preliminary calculations demonstrate the need for a careful choice of the system size and excitation energy to attain maximum yields.

  6. The statistical multifragmentation model: Origins and recent advances

    Energy Technology Data Exchange (ETDEWEB)

    Donangelo, R., E-mail: donangel@fing.edu.uy [Instituto de Física, Facultad de Ingeniería, Universidad de la República, Julio Herrera y Reissig 565, 11300, Montevideo (Uruguay); Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Souza, S. R., E-mail: srsouza@if.ufrj.br [Instituto de Física, Universidade Federal do Rio de Janeiro, C.P. 68528, 21941-972 Rio de Janeiro - RJ (Brazil); Instituto de Física, Universidade Federal do Rio Grande do Sul, C.P. 15051, 91501-970 Porto Alegre - RS (Brazil)

    2016-07-07

    We review the Statistical Multifragmentation Model (SMM) which considers a generalization of the liquid-drop model for hot nuclei and allows one to calculate thermodynamic quantities characterizing the nuclear ensemble at the disassembly stage. We show how to determine probabilities of definite partitions of finite nuclei and how to determine, through Monte Carlo calculations, observables such as the caloric curve, multiplicity distributions, heat capacity, among others. Some experimental measurements of the caloric curve confirmed the SMM predictions of over 10 years before, leading to a surge in the interest in the model. However, the experimental determination of the fragmentation temperatures relies on the yields of different isotopic species, which were not correctly calculated in the schematic, liquid-drop picture, employed in the SMM. This led to a series of improvements in the SMM, in particular to the more careful choice of nuclear masses and energy densities, specially for the lighter nuclei. With these improvements the SMM is able to make quantitative determinations of isotope production. We show the application of SMM to the production of exotic nuclei through multifragmentation. These preliminary calculations demonstrate the need for a careful choice of the system size and excitation energy to attain maximum yields.

  7. Statistical mechanics of the cluster Ising model

    International Nuclear Information System (INIS)

    Smacchia, Pietro; Amico, Luigi; Facchi, Paolo; Fazio, Rosario; Florio, Giuseppe; Pascazio, Saverio; Vedral, Vlatko

    2011-01-01

    We study a Hamiltonian system describing a three-spin-1/2 clusterlike interaction competing with an Ising-like antiferromagnetic interaction. We compute free energy, spin-correlation functions, and entanglement both in the ground and in thermal states. The model undergoes a quantum phase transition between an Ising phase with a nonvanishing magnetization and a cluster phase characterized by a string order. Any two-spin entanglement is found to vanish in both quantum phases because of a nontrivial correlation pattern. Nevertheless, the residual multipartite entanglement is maximal in the cluster phase and dependent on the magnetization in the Ising phase. We study the block entropy at the critical point and calculate the central charge of the system, showing that the criticality of the system is beyond the Ising universality class.

  8. Statistical mechanics of the cluster Ising model

    Energy Technology Data Exchange (ETDEWEB)

    Smacchia, Pietro [SISSA - via Bonomea 265, I-34136, Trieste (Italy); Amico, Luigi [CNR-MATIS-IMM and Dipartimento di Fisica e Astronomia Universita di Catania, C/O ed. 10, viale Andrea Doria 6, I-95125 Catania (Italy); Facchi, Paolo [Dipartimento di Matematica and MECENAS, Universita di Bari, I-70125 Bari (Italy); INFN, Sezione di Bari, I-70126 Bari (Italy); Fazio, Rosario [NEST, Scuola Normale Superiore and Istituto Nanoscienze - CNR, 56126 Pisa (Italy); Center for Quantum Technology, National University of Singapore, 117542 Singapore (Singapore); Florio, Giuseppe; Pascazio, Saverio [Dipartimento di Fisica and MECENAS, Universita di Bari, I-70126 Bari (Italy); INFN, Sezione di Bari, I-70126 Bari (Italy); Vedral, Vlatko [Center for Quantum Technology, National University of Singapore, 117542 Singapore (Singapore); Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117542 (Singapore); Department of Physics, University of Oxford, Clarendon Laboratory, Oxford, OX1 3PU (United Kingdom)

    2011-08-15

    We study a Hamiltonian system describing a three-spin-1/2 clusterlike interaction competing with an Ising-like antiferromagnetic interaction. We compute free energy, spin-correlation functions, and entanglement both in the ground and in thermal states. The model undergoes a quantum phase transition between an Ising phase with a nonvanishing magnetization and a cluster phase characterized by a string order. Any two-spin entanglement is found to vanish in both quantum phases because of a nontrivial correlation pattern. Nevertheless, the residual multipartite entanglement is maximal in the cluster phase and dependent on the magnetization in the Ising phase. We study the block entropy at the critical point and calculate the central charge of the system, showing that the criticality of the system is beyond the Ising universality class.

  9. Functional summary statistics for the Johnson-Mehl model

    DEFF Research Database (Denmark)

    Møller, Jesper; Ghorbani, Mohammad

    The Johnson-Mehl germination-growth model is a spatio-temporal point process model which among other things have been used for the description of neurotransmitters datasets. However, for such datasets parametric Johnson-Mehl models fitted by maximum likelihood have yet not been evaluated by means...... of functional summary statistics. This paper therefore invents four functional summary statistics adapted to the Johnson-Mehl model, with two of them based on the second-order properties and the other two on the nuclei-boundary distances for the associated Johnson-Mehl tessellation. The functional summary...... statistics theoretical properties are investigated, non-parametric estimators are suggested, and their usefulness for model checking is examined in a simulation study. The functional summary statistics are also used for checking fitted parametric Johnson-Mehl models for a neurotransmitters dataset....

  10. Statistical modelling in biostatistics and bioinformatics selected papers

    CERN Document Server

    Peng, Defen

    2014-01-01

    This book presents selected papers on statistical model development related mainly to the fields of Biostatistics and Bioinformatics. The coverage of the material falls squarely into the following categories: (a) Survival analysis and multivariate survival analysis, (b) Time series and longitudinal data analysis, (c) Statistical model development and (d) Applied statistical modelling. Innovations in statistical modelling are presented throughout each of the four areas, with some intriguing new ideas on hierarchical generalized non-linear models and on frailty models with structural dispersion, just to mention two examples. The contributors include distinguished international statisticians such as Philip Hougaard, John Hinde, Il Do Ha, Roger Payne and Alessandra Durio, among others, as well as promising newcomers. Some of the contributions have come from researchers working in the BIO-SI research programme on Biostatistics and Bioinformatics, centred on the Universities of Limerick and Galway in Ireland and fu...

  11. A Statistical Evaluation of Atmosphere-Ocean General Circulation Models: Complexity vs. Simplicity

    OpenAIRE

    Robert K. Kaufmann; David I. Stern

    2004-01-01

    The principal tools used to model future climate change are General Circulation Models which are deterministic high resolution bottom-up models of the global atmosphere-ocean system that require large amounts of supercomputer time to generate results. But are these models a cost-effective way of predicting future climate change at the global level? In this paper we use modern econometric techniques to evaluate the statistical adequacy of three general circulation models (GCMs) by testing thre...

  12. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    Science.gov (United States)

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  13. Statistics

    International Nuclear Information System (INIS)

    2005-01-01

    For the years 2004 and 2005 the figures shown in the tables of Energy Review are partly preliminary. The annual statistics published in Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time-series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004.) The applied energy units and conversion coefficients are shown in the back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supplies and total consumption of electricity GWh, Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes, precautionary stock fees and oil pollution fees

  14. Statistical prediction of seasonal discharge in the Naryn basin for water resources planning in Central Asia

    Science.gov (United States)

    Apel, Heiko; Gafurov, Abror; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Merkushkin, Aleksandr; Merz, Bruno

    2016-04-01

    The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien-Shan and Pamirs. During the summer months the snow and glacier melt water of the rivers originating in the mountains provides the only water resource available for agricultural production but also for water collection in reservoirs for energy production in winter months. Thus a reliable seasonal forecast of the water resources is crucial for a sustainable management and planning of water resources.. In fact, seasonal forecasts are mandatory tasks of national hydro-meteorological services in the region. Thus this study aims at a statistical forecast of the seasonal water availability, whereas the focus is put on the usage of freely available data in order to facilitate an operational use without data access limitations. The study takes the Naryn basin as a test case, at which outlet the Toktogul reservoir stores the discharge of the Naryn River. As most of the water originates form snow and glacier melt, a statistical forecast model should use data sets that can serve as proxy data for the snow masses and snow water equivalent in late spring, which essentially determines the bulk of the seasonal discharge. CRU climate data describing the precipitation and temperature in the basin during winter and spring was used as base information, which was complemented by MODIS snow cover data processed through ModSnow tool, discharge during the spring and also GRACE gravimetry anomalies. For the construction of linear forecast models monthly as well as multi-monthly means over the period January to April were used to predict the seasonal mean discharge of May-September at the station Uchterek. An automatic model selection was performed in multiple steps, whereas the best models were selected according to several performance measures and their robustness in a leave-one-out cross validation. It could be shown that the seasonal discharge can be predicted with

  15. Model predictive control using fuzzy decision functions

    NARCIS (Netherlands)

    Kaymak, U.; Costa Sousa, da J.M.

    2001-01-01

    Fuzzy predictive control integrates conventional model predictive control with techniques from fuzzy multicriteria decision making, translating the goals and the constraints to predictive control in a transparent way. The information regarding the (fuzzy) goals and the (fuzzy) constraints of the

  16. A Model of Statistics Performance Based on Achievement Goal Theory.

    Science.gov (United States)

    Bandalos, Deborah L.; Finney, Sara J.; Geske, Jenenne A.

    2003-01-01

    Tests a model of statistics performance based on achievement goal theory. Both learning and performance goals affected achievement indirectly through study strategies, self-efficacy, and test anxiety. Implications of these findings for teaching and learning statistics are discussed. (Contains 47 references, 3 tables, 3 figures, and 1 appendix.)…

  17. Kolmogorov complexity, pseudorandom generators and statistical models testing

    Czech Academy of Sciences Publication Activity Database

    Šindelář, Jan; Boček, Pavel

    2002-01-01

    Roč. 38, č. 6 (2002), s. 747-759 ISSN 0023-5954 R&D Projects: GA ČR GA102/99/1564 Institutional research plan: CEZ:AV0Z1075907 Keywords : Kolmogorov complexity * pseudorandom generators * statistical models testing Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.341, year: 2002

  18. Statistical properties of several models of fractional random point processes

    Science.gov (United States)

    Bendjaballah, C.

    2011-08-01

    Statistical properties of several models of fractional random point processes have been analyzed from the counting and time interval statistics points of view. Based on the criterion of the reduced variance, it is seen that such processes exhibit nonclassical properties. The conditions for these processes to be treated as conditional Poisson processes are examined. Numerical simulations illustrate part of the theoretical calculations.

  19. Flashover of a vacuum-insulator interface: A statistical model

    Directory of Open Access Journals (Sweden)

    W. A. Stygar

    2004-07-01

    Full Text Available We have developed a statistical model for the flashover of a 45° vacuum-insulator interface (such as would be found in an accelerator subject to a pulsed electric field. The model assumes that the initiation of a flashover plasma is a stochastic process, that the characteristic statistical component of the flashover delay time is much greater than the plasma formative time, and that the average rate at which flashovers occur is a power-law function of the instantaneous value of the electric field. Under these conditions, we find that the flashover probability is given by 1-exp(-E_{p}^{β}t_{eff}C/k^{β}, where E_{p} is the peak value in time of the spatially averaged electric field E(t, t_{eff}≡∫[E(t/E_{p}]^{β}dt is the effective pulse width, C is the insulator circumference, k∝exp(λ/d, and β and λ are constants. We define E(t as V(t/d, where V(t is the voltage across the insulator and d is the insulator thickness. Since the model assumes that flashovers occur at random azimuthal locations along the insulator, it does not apply to systems that have a significant defect, i.e., a location contaminated with debris or compromised by an imperfection at which flashovers repeatedly take place, and which prevents a random spatial distribution. The model is consistent with flashover measurements to within 7% for pulse widths between 0.5 ns and 10   μs, and to within a factor of 2 between 0.5 ns and 90 s (a span of over 11 orders of magnitude. For these measurements, E_{p} ranges from 64 to 651  kV/cm, d from 0.50 to 4.32 cm, and C from 4.96 to 95.74 cm. The model is significantly more accurate, and is valid over a wider range of parameters, than the J. C. Martin flashover relation that has been in use since 1971 [J. C. Martin on Pulsed Power, edited by T. H. Martin, A. H. Guenther, and M. Kristiansen (Plenum, New York, 1996]. We have generalized the statistical model to estimate the total-flashover probability of an

  20. Statistics

    International Nuclear Information System (INIS)

    2001-01-01

    For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions from the use of fossil fuels, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in 2000, Energy exports by recipient country in 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  1. Statistics

    International Nuclear Information System (INIS)

    2000-01-01

    For the year 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g., Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  2. Statistics

    International Nuclear Information System (INIS)

    1999-01-01

    For the year 1998 and the year 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail from the publication Energiatilastot - Energy Statistics issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  3. Foreign exchange market data analysis reveals statistical features that predict price movement acceleration.

    Science.gov (United States)

    Nacher, Jose C; Ochiai, Tomoshiro

    2012-05-01

    Increasingly accessible financial data allow researchers to infer market-dynamics-based laws and to propose models that are able to reproduce them. In recent years, several stylized facts have been uncovered. Here we perform an extensive analysis of foreign exchange data that leads to the unveiling of a statistical financial law. First, our findings show that, on average, volatility increases more when the price exceeds the highest (or lowest) value, i.e., breaks the resistance line. We call this the breaking-acceleration effect. Second, our results show that the probability P(T) to break the resistance line in the past time T follows power law in both real data and theoretically simulated data. However, the probability calculated using real data is rather lower than the one obtained using a traditional Black-Scholes (BS) model. Taken together, the present analysis characterizes a different stylized fact of financial markets and shows that the market exceeds a past (historical) extreme price fewer times than expected by the BS model (the resistance effect). However, when the market does, we predict that the average volatility at that time point will be much higher. These findings indicate that any Markovian model does not faithfully capture the market dynamics.

  4. Improving statistical reasoning theoretical models and practical implications

    CERN Document Server

    Sedlmeier, Peter

    1999-01-01

    This book focuses on how statistical reasoning works and on training programs that can exploit people''s natural cognitive capabilities to improve their statistical reasoning. Training programs that take into account findings from evolutionary psychology and instructional theory are shown to have substantially larger effects that are more stable over time than previous training regimens. The theoretical implications are traced in a neural network model of human performance on statistical reasoning problems. This book apppeals to judgment and decision making researchers and other cognitive scientists, as well as to teachers of statistics and probabilistic reasoning.

  5. Statistical validation of normal tissue complication probability models

    NARCIS (Netherlands)

    Xu, Cheng-Jian; van der Schaaf, Arjen; van t Veld, Aart; Langendijk, Johannes A.; Schilstra, Cornelis

    2012-01-01

    PURPOSE: To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: A penalized regression method, LASSO (least absolute shrinkage

  6. Some remarks on the statistical model of heavy ion collisions

    International Nuclear Information System (INIS)

    Koch, V.

    2003-01-01

    This contribution is an attempt to assess what can be learned from the remarkable success of this statistical model in describing ratios of particle abundances in ultra-relativistic heavy ion collisions

  7. Eigenfunction statistics for Anderson model with Hölder continuous ...

    Indian Academy of Sciences (India)

    The Institute of Mathematical Sciences, Taramani, Chennai 600 113, India ... Anderson model; Hölder continuous measure; Poisson statistics. ...... [4] Combes J-M, Hislop P D and Klopp F, An optimal Wegner estimate and its application to.

  8. Escherichia coli bacteria density in relation to turbidity, streamflow characteristics, and season in the Chattahoochee River near Atlanta, Georgia, October 2000 through September 2008—Description, statistical analysis, and predictive modeling

    Science.gov (United States)

    Lawrence, Stephen J.

    2012-01-01

    Water-based recreation—such as rafting, canoeing, and fishing—is popular among visitors to the Chattahoochee River National Recreation Area (CRNRA) in north Georgia. The CRNRA is a 48-mile reach of the Chattahoochee River upstream from Atlanta, Georgia, managed by the National Park Service (NPS). Historically, high densities of fecal-indicator bacteria have been documented in the Chattahoochee River and its tributaries at levels that commonly exceeded Georgia water-quality standards. In October 2000, the NPS partnered with the U.S. Geological Survey (USGS), State and local agencies, and non-governmental organizations to monitor Escherichia coli bacteria (E. coli) density and develop a system to alert river users when E. coli densities exceeded the U.S. Environmental Protection Agency (USEPA) single-sample beach criterion of 235 colonies (most probable number) per 100 milliliters (MPN/100 mL) of water. This program, called BacteriALERT, monitors E. coli density, turbidity, and water temperature at two sites on the Chattahoochee River upstream from Atlanta, Georgia. This report summarizes E. coli bacteria density and turbidity values in water samples collected between 2000 and 2008 as part of the BacteriALERT program; describes the relations between E. coli density and turbidity, streamflow characteristics, and season; and describes the regression analyses used to develop predictive models that estimate E. coli density in real time at both sampling sites.

  9. Prediction of interior noise due to random acoustic or turbulent boundary layer excitation using statistical energy analysis

    Science.gov (United States)

    Grosveld, Ferdinand W.

    1990-01-01

    The feasibility of predicting interior noise due to random acoustic or turbulent boundary layer excitation was investigated in experiments in which a statistical energy analysis model (VAPEPS) was used to analyze measurements of the acceleration response and sound transmission of flat aluminum, lucite, and graphite/epoxy plates exposed to random acoustic or turbulent boundary layer excitation. The noise reduction of the plate, when backed by a shallow cavity and excited by a turbulent boundary layer, was predicted using a simplified theory based on the assumption of adiabatic compression of the fluid in the cavity. The predicted plate acceleration response was used as input in the noise reduction prediction. Reasonable agreement was found between the predictions and the measured noise reduction in the frequency range 315-1000 Hz.

  10. Ground Motion Prediction Models for Caucasus Region

    Science.gov (United States)

    Jorjiashvili, Nato; Godoladze, Tea; Tvaradze, Nino; Tumanova, Nino

    2016-04-01

    Ground motion prediction models (GMPMs) relate ground motion intensity measures to variables describing earthquake source, path, and site effects. Estimation of expected ground motion is a fundamental earthquake hazard assessment. The most commonly used parameter for attenuation relation is peak ground acceleration or spectral acceleration because this parameter gives useful information for Seismic Hazard Assessment. Since 2003 development of Georgian Digital Seismic Network has started. In this study new GMP models are obtained based on new data from Georgian seismic network and also from neighboring countries. Estimation of models is obtained by classical, statistical way, regression analysis. In this study site ground conditions are additionally considered because the same earthquake recorded at the same distance may cause different damage according to ground conditions. Empirical ground-motion prediction models (GMPMs) require adjustment to make them appropriate for site-specific scenarios. However, the process of making such adjustments remains a challenge. This work presents a holistic framework for the development of a peak ground acceleration (PGA) or spectral acceleration (SA) GMPE that is easily adjustable to different seismological conditions and does not suffer from the practical problems associated with adjustments in the response spectral domain.

  11. Prediction of Chemical Function: Model Development and ...

    Science.gov (United States)

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration in which it is present and therefore impacting exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning based models for classifying chemicals in terms of their likely functional roles in products based on structure were developed. This effort required collection, curation, and harmonization of publically-available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi

  12. A no extensive statistical model for the nucleon structure function

    International Nuclear Information System (INIS)

    Trevisan, Luis A.; Mirez, Carlos

    2013-01-01

    We studied an application of nonextensive thermodynamics to describe the structure function of nucleon, in a model where the usual Fermi-Dirac and Bose-Einstein energy distribution were replaced by the equivalent functions of the q-statistical. The parameters of the model are given by an effective temperature T, the q parameter (from Tsallis statistics), and two chemical potentials given by the corresponding up (u) and down (d) quark normalization in the nucleon.

  13. Multiple-point statistical prediction on fracture networks at Yucca Mountain

    International Nuclear Information System (INIS)

    Liu, X.Y; Zhang, C.Y.; Liu, Q.S.; Birkholzer, J.T.

    2009-01-01

    In many underground nuclear waste repository systems, such as at Yucca Mountain, water flow rate and amount of water seepage into the waste emplacement drifts are mainly determined by hydrological properties of fracture network in the surrounding rock mass. Natural fracture network system is not easy to describe, especially with respect to its connectivity which is critically important for simulating the water flow field. In this paper, we introduced a new method for fracture network description and prediction, termed multi-point-statistics (MPS). The process of the MPS method is to record multiple-point statistics concerning the connectivity patterns of a fracture network from a known fracture map, and to reproduce multiple-scale training fracture patterns in a stochastic manner, implicitly and directly. It is applied to fracture data to study flow field behavior at the Yucca Mountain waste repository system. First, the MPS method is used to create a fracture network with an original fracture training image from Yucca Mountain dataset. After we adopt a harmonic and arithmetic average method to upscale the permeability to a coarse grid, THM simulation is carried out to study near-field water flow in the surrounding waste emplacement drifts. Our study shows that connectivity or patterns of fracture networks can be grasped and reconstructed by MPS methods. In theory, it will lead to better prediction of fracture system characteristics and flow behavior. Meanwhile, we can obtain variance from flow field, which gives us a way to quantify model uncertainty even in complicated coupled THM simulations. It indicates that MPS can potentially characterize and reconstruct natural fracture networks in a fractured rock mass with advantages of quantifying connectivity of fracture system and its simulation uncertainty simultaneously.

  14. Tornadoes and related damage costs: statistical modeling with a semi-Markov approach

    OpenAIRE

    Corini, Chiara; D'Amico, Guglielmo; Petroni, Filippo; Prattico, Flavio; Manca, Raimondo

    2015-01-01

    We propose a statistical approach to tornadoes modeling for predicting and simulating occurrences of tornadoes and accumulated cost distributions over a time interval. This is achieved by modeling the tornadoes intensity, measured with the Fujita scale, as a stochastic process. Since the Fujita scale divides tornadoes intensity into six states, it is possible to model the tornadoes intensity by using Markov and semi-Markov models. We demonstrate that the semi-Markov approach is able to reprod...

  15. Statistical models and NMR analysis of polymer microstructure

    Science.gov (United States)

    Statistical models can be used in conjunction with NMR spectroscopy to study polymer microstructure and polymerization mechanisms. Thus, Bernoullian, Markovian, and enantiomorphic-site models are well known. Many additional models have been formulated over the years for additional situations. Typica...

  16. Statistics

    International Nuclear Information System (INIS)

    2003-01-01

    For the year 2002, part of the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also includes historical time-series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supply and total consumption of electricity GWh, Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees on oil pollution fees on energy products

  17. Statistics

    International Nuclear Information System (INIS)

    2004-01-01

    For the year 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also includes historical time-series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown in the inside back cover of the Review. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossile fuels use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord pool power exchange, Total energy consumption by source and CO 2 -emissions, Supplies and total consumption of electricity GWh, Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees on oil pollution fees

  18. Statistics

    International Nuclear Information System (INIS)

    2000-01-01

    For the year 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy also includes historical time series over a longer period (see e.g., Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after tables and figures. The figures presents: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO 2 -emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  19. What's statistical about learning? Insights from modelling statistical learning as a set of memory processes.

    Science.gov (United States)

    Thiessen, Erik D

    2017-01-05

    Statistical learning has been studied in a variety of different tasks, including word segmentation, object identification, category learning, artificial grammar learning and serial reaction time tasks (e.g. Saffran et al. 1996 Science 274: , 1926-1928; Orban et al. 2008 Proceedings of the National Academy of Sciences 105: , 2745-2750; Thiessen & Yee 2010 Child Development 81: , 1287-1303; Saffran 2002 Journal of Memory and Language 47: , 172-196; Misyak & Christiansen 2012 Language Learning 62: , 302-331). The difference among these tasks raises questions about whether they all depend on the same kinds of underlying processes and computations, or whether they are tapping into different underlying mechanisms. Prior theoretical approaches to statistical learning have often tried to explain or model learning in a single task. However, in many cases these approaches appear inadequate to explain performance in multiple tasks. For example, explaining word segmentation via the computation of sequential statistics (such as transitional probability) provides little insight into the nature of sensitivity to regularities among simultaneously presented features. In this article, we will present a formal computational approach that we believe is a good candidate to provide a unifying framework to explore and explain learning in a wide variety of statistical learning tasks. This framework suggests that statistical learning arises from a set of processes that are inherent in memory systems, including activation, interference, integration of information and forgetting (e.g. Perruchet & Vinter 1998 Journal of Memory and Language 39: , 246-263; Thiessen et al. 2013 Psychological Bulletin 139: , 792-814). From this perspective, statistical learning does not involve explicit computation of statistics, but rather the extraction of elements of the input into memory traces, and subsequent integration across those memory traces that emphasize consistent information (Thiessen and Pavlik

  20. Pseudo-dynamic source modelling with 1-point and 2-point statistics of earthquake source parameters

    KAUST Repository

    Song, S. G.

    2013-12-24

    Ground motion prediction is an essential element in seismic hazard and risk analysis. Empirical ground motion prediction approaches have been widely used in the community, but efficient simulation-based ground motion prediction methods are needed to complement empirical approaches, especially in the regions with limited data constraints. Recently, dynamic rupture modelling has been successfully adopted in physics-based source and ground motion modelling, but it is still computationally demanding and many input parameters are not well constrained by observational data. Pseudo-dynamic source modelling keeps the form of kinematic modelling with its computational efficiency, but also tries to emulate the physics of source process. In this paper, we develop a statistical framework that governs the finite-fault rupture process with 1-point and 2-point statistics of source parameters in order to quantify the variability of finite source models for future scenario events. We test this method by extracting 1-point and 2-point statistics from dynamically derived source models and simulating a number of rupture scenarios, given target 1-point and 2-point statistics. We propose a new rupture model generator for stochastic source modelling with the covariance matrix constructed from target 2-point statistics, that is, auto- and cross-correlations. Our sensitivity analysis of near-source ground motions to 1-point and 2-point statistics of source parameters provides insights into relations between statistical rupture properties and ground motions. We observe that larger standard deviation and stronger correlation produce stronger peak ground motions in general. The proposed new source modelling approach will contribute to understanding the effect of earthquake source on near-source ground motion characteristics in a more quantitative and systematic way.

  1. Aqua/Aura Updated Inclination Adjust Maneuver Performance Prediction Model

    Science.gov (United States)

    Boone, Spencer

    2017-01-01

    This presentation will discuss the updated Inclination Adjust Maneuver (IAM) performance prediction model that was developed for Aqua and Aura following the 2017 IAM series. This updated model uses statistical regression methods to identify potential long-term trends in maneuver parameters, yielding improved predictions when re-planning past maneuvers. The presentation has been reviewed and approved by Eric Moyer, ESMO Deputy Project Manager.

  2. Models for probability and statistical inference theory and applications

    CERN Document Server

    Stapleton, James H

    2007-01-01

    This concise, yet thorough, book is enhanced with simulations and graphs to build the intuition of readersModels for Probability and Statistical Inference was written over a five-year period and serves as a comprehensive treatment of the fundamentals of probability and statistical inference. With detailed theoretical coverage found throughout the book, readers acquire the fundamentals needed to advance to more specialized topics, such as sampling, linear models, design of experiments, statistical computing, survival analysis, and bootstrapping.Ideal as a textbook for a two-semester sequence on probability and statistical inference, early chapters provide coverage on probability and include discussions of: discrete models and random variables; discrete distributions including binomial, hypergeometric, geometric, and Poisson; continuous, normal, gamma, and conditional distributions; and limit theory. Since limit theory is usually the most difficult topic for readers to master, the author thoroughly discusses mo...

  3. Modelling malaria treatment practices in Bangladesh using spatial statistics

    Directory of Open Access Journals (Sweden)

    Haque Ubydul

    2012-03-01

    Full Text Available Abstract Background Malaria treatment-seeking practices vary worldwide and Bangladesh is no exception. Individuals from 88 villages in Rajasthali were asked about their treatment-seeking practices. A portion of these households preferred malaria treatment from the National Control Programme, but still a large number of households continued to use drug vendors and approximately one fourth of the individuals surveyed relied exclusively on non-control programme treatments. The risks of low-control programme usage include incomplete malaria treatment, possible misuse of anti-malarial drugs, and an increased potential for drug resistance. Methods The spatial patterns of treatment-seeking practices were first examined using hot-spot analysis (Local Getis-Ord Gi statistic and then modelled using regression. Ordinary least squares (OLS regression identified key factors explaining more than 80% of the variation in control programme and vendor treatment preferences. Geographically weighted regression (GWR was then used to assess where each factor was a strong predictor of treatment-seeking preferences. Results Several factors including tribal affiliation, housing materials, household densities, education levels, and proximity to the regional urban centre, were found to be effective predictors of malaria treatment-seeking preferences. The predictive strength of each of these factors, however, varied across the study area. While education, for example, was a strong predictor in some villages, it was less important for predicting treatment-seeking outcomes in other villages. Conclusion Understanding where each factor is a strong predictor of treatment-seeking outcomes may help in planning targeted interventions aimed at increasing control programme usage. Suggested strategies include providing additional training for the Building Resources across Communities (BRAC health workers, implementing educational programmes, and addressing economic factors.

  4. Statistical behaviour of adaptive multilevel splitting algorithms in simple models

    International Nuclear Information System (INIS)

    Rolland, Joran; Simonnet, Eric

    2015-01-01

    Adaptive multilevel splitting algorithms have been introduced rather recently for estimating tail distributions in a fast and efficient way. In particular, they can be used for computing the so-called reactive trajectories corresponding to direct transitions from one metastable state to another. The algorithm is based on successive selection–mutation steps performed on the system in a controlled way. It has two intrinsic parameters, the number of particles/trajectories and the reaction coordinate used for discriminating good or bad trajectories. We investigate first the convergence in law of the algorithm as a function of the timestep for several simple stochastic models. Second, we consider the average duration of reactive trajectories for which no theoretical predictions exist. The most important aspect of this work concerns some systems with two degrees of freedom. They are studied in detail as a function of the reaction coordinate in the asymptotic regime where the number of trajectories goes to infinity. We show that during phase transitions, the statistics of the algorithm deviate significatively from known theoretical results when using non-optimal reaction coordinates. In this case, the variance of the algorithm is peaking at the transition and the convergence of the algorithm can be much slower than the usual expected central limit behaviour. The duration of trajectories is affected as well. Moreover, reactive trajectories do not correspond to the most probable ones. Such behaviour disappears when using the optimal reaction coordinate called committor as predicted by the theory. We finally investigate a three-state Markov chain which reproduces this phenomenon and show logarithmic convergence of the trajectory durations

  5. Assessing the Transferability of Statistical Predictive Models for Leaf Area Index Between Two Airborne Discrete Return LiDAR Sensor Designs Within Multiple Intensely Managed Loblolly Pine Forest Locations in the South-Eastern USA

    Science.gov (United States)

    Sumnall, Matthew; Peduzzi, Alicia; Fox, Thomas R.; Wynne, Randolph H.; Thomas, Valerie A.; Cook, Bruce

    2016-01-01

    Leaf area is an important forest structural variable which serves as the primary means of mass and energy exchange within vegetated ecosystems. The objective of the current study was to determine if leaf area index (LAI) could be estimated accurately and consistently in five intensively managed pine plantation forests using two multiple-return airborne LiDAR datasets. Field measurements of LAI were made using the LiCOR LAI2000 and LAI2200 instruments within 116 plots were established of varying size and within a variety of stand conditions (i.e. stand age, nutrient regime and stem density) in North Carolina and Virginia in 2008 and 2013. A number of common LiDAR return height and intensity distribution metrics were calculated (e.g. average return height), in addition to ten indices, with two additional variants, utilized in the surrounding literature which have been used to estimate LAI and fractional cover, were calculated from return heights and intensity, for each plot extent. Each of the indices was assessed for correlation with each other, and was used as independent variables in linear regression analysis with field LAI as the dependent variable. All LiDAR derived metrics were also entered into a forward stepwise linear regression. The results from each of the indices varied from an R2 of 0.33 (S.E. 0.87) to 0.89 (S.E. 0.36). Those indices calculated using ratios of all returns produced the strongest correlations, such as the Above and Below Ratio Index (ABRI) and Laser Penetration Index 1 (LPI1). The regression model produced from a combination of three metrics did not improve correlations greatly (R2 0.90; S.E. 0.35). The results indicate that LAI can be predicted over a range of intensively managed pine plantation forest environments accurately when using different LiDAR sensor designs. Those indices which incorporated counts of specific return numbers (e.g. first returns) or return intensity correlated poorly with field measurements. There were

  6. Decoding β-decay systematics: A global statistical model for β- half-lives

    International Nuclear Information System (INIS)

    Costiris, N. J.; Mavrommatis, E.; Gernoth, K. A.; Clark, J. W.

    2009-01-01

    Statistical modeling of nuclear data provides a novel approach to nuclear systematics complementary to established theoretical and phenomenological approaches based on quantum theory. Continuing previous studies in which global statistical modeling is pursued within the general framework of machine learning theory, we implement advances in training algorithms designed to improve generalization, in application to the problem of reproducing and predicting the half-lives of nuclear ground states that decay 100% by the β - mode. More specifically, fully connected, multilayer feed-forward artificial neural network models are developed using the Levenberg-Marquardt optimization algorithm together with Bayesian regularization and cross-validation. The predictive performance of models emerging from extensive computer experiments is compared with that of traditional microscopic and phenomenological models as well as with the performance of other learning systems, including earlier neural network models as well as the support vector machines recently applied to the same problem. In discussing the results, emphasis is placed on predictions for nuclei that are far from the stability line, and especially those involved in r-process nucleosynthesis. It is found that the new statistical models can match or even surpass the predictive performance of conventional models for β-decay systematics and accordingly should provide a valuable additional tool for exploring the expanding nuclear landscape.

  7. Right-sizing statistical models for longitudinal data.

    Science.gov (United States)

    Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M

    2015-12-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved).

  8. Personalizing oncology treatments by predicting drug efficacy, side-effects, and improved therapy: mathematics, statistics, and their integration.

    Science.gov (United States)

    Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri

    2014-01-01

    Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the main stream of clinical oncology. © 2014 Wiley Periodicals, Inc.

  9. Information Geometric Complexity of a Trivariate Gaussian Statistical Model

    Directory of Open Access Journals (Sweden)

    Domenico Felice

    2014-05-01

    Full Text Available We evaluate the information geometric complexity of entropic motion on low-dimensional Gaussian statistical manifolds in order to quantify how difficult it is to make macroscopic predictions about systems in the presence of limited information. Specifically, we observe that the complexity of such entropic inferences not only depends on the amount of available pieces of information but also on the manner in which such pieces are correlated. Finally, we uncover that, for certain correlational structures, the impossibility of reaching the most favorable configuration from an entropic inference viewpoint seems to lead to an information geometric analog of the well-known frustration effect that occurs in statistical physics.

  10. Prediction Model for Gastric Cancer Incidence in Korean Population.

    Directory of Open Access Journals (Sweden)

    Bang Wool Eom

    Full Text Available Predicting high risk groups for gastric cancer and motivating these groups to receive regular checkups is required for the early detection of gastric cancer. The aim of this study is was to develop a prediction model for gastric cancer incidence based on a large population-based cohort in Korea.Based on the National Health Insurance Corporation data, we analyzed 10 major risk factors for gastric cancer. The Cox proportional hazards model was used to develop gender specific prediction models for gastric cancer development, and the performance of the developed model in terms of discrimination and calibration was also validated using an independent cohort. Discrimination ability was evaluated using Harrell's C-statistics, and the calibration was evaluated using a calibration plot and slope.During a median of 11.4 years of follow-up, 19,465 (1.4% and 5,579 (0.7% newly developed gastric cancer cases were observed among 1,372,424 men and 804,077 women, respectively. The prediction models included age, BMI, family history, meal regularity, salt preference, alcohol consumption, smoking and physical activity for men, and age, BMI, family history, salt preference, alcohol consumption, and smoking for women. This prediction model showed good accuracy and predictability in both the developing and validation cohorts (C-statistics: 0.764 for men, 0.706 for women.In this study, a prediction model for gastric cancer incidence was developed that displayed a good performance.

  11. Variability aware compact model characterization for statistical circuit design optimization

    Science.gov (United States)

    Qiao, Ying; Qian, Kun; Spanos, Costas J.

    2012-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose an efficient variabilityaware compact model characterization methodology based on the linear propagation of variance. Hierarchical spatial variability patterns of selected compact model parameters are directly calculated from transistor array test structures. This methodology has been implemented and tested using transistor I-V measurements and the EKV-EPFL compact model. Calculation results compare well to full-wafer direct model parameter extractions. Further studies are done on the proper selection of both compact model parameters and electrical measurement metrics used in the method.

  12. Linear mixed models a practical guide using statistical software

    CERN Document Server

    West, Brady T; Galecki, Andrzej T

    2006-01-01

    Simplifying the often confusing array of software programs for fitting linear mixed models (LMMs), Linear Mixed Models: A Practical Guide Using Statistical Software provides a basic introduction to primary concepts, notation, software implementation, model interpretation, and visualization of clustered and longitudinal data. This easy-to-navigate reference details the use of procedures for fitting LMMs in five popular statistical software packages: SAS, SPSS, Stata, R/S-plus, and HLM. The authors introduce basic theoretical concepts, present a heuristic approach to fitting LMMs based on bo

  13. Speech emotion recognition based on statistical pitch model

    Institute of Scientific and Technical Information of China (English)

    WANG Zhiping; ZHAO Li; ZOU Cairong

    2006-01-01

    A modified Parzen-window method, which keep high resolution in low frequencies and keep smoothness in high frequencies, is proposed to obtain statistical model. Then, a gender classification method utilizing the statistical model is proposed, which have a 98% accuracy of gender classification while long sentence is dealt with. By separation the male voice and female voice, the mean and standard deviation of speech training samples with different emotion are used to create the corresponding emotion models. Then the Bhattacharyya distance between the test sample and statistical models of pitch, are utilized for emotion recognition in speech.The normalization of pitch for the male voice and female voice are also considered, in order to illustrate them into a uniform space. Finally, the speech emotion recognition experiment based on K Nearest Neighbor shows that, the correct rate of 81% is achieved, where it is only 73.85%if the traditional parameters are utilized.

  14. Multiple commodities in statistical microeconomics: Model and market

    Science.gov (United States)

    Baaquie, Belal E.; Yu, Miao; Du, Xin

    2016-11-01

    A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. The case of multiple commodities is studied and a parsimonious generalization of the single commodity model is made for the multiple commodities case. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, and which is independent to the mainstream formulation of microeconomics.

  15. Application of Statistical Thermodynamics To Predict the Adsorption Properties of Polypeptides in Reversed-Phase HPLC.

    Science.gov (United States)

    Tarasova, Irina A; Goloborodko, Anton A; Perlova, Tatyana Y; Pridatchenko, Marina L; Gorshkov, Alexander V; Evreinov, Victor V; Ivanov, Alexander R; Gorshkov, Mikhail V

    2015-07-07

    The theory of critical chromatography for biomacromolecules (BioLCCC) describes polypeptide retention in reversed-phase HPLC using the basic principles of statistical thermodynamics. However, whether this theory correctly depicts a variety of empirical observations and laws introduced for peptide chromatography over the last decades remains to be determined. In this study, by comparing theoretical results with experimental data, we demonstrate that the BioLCCC: (1) fits the empirical dependence of the polypeptide retention on the amino acid sequence length with R(2) > 0.99 and allows in silico determination of the linear regression coefficients of the log-length correction in the additive model for arbitrary sequences and lengths and (2) predicts the distribution coefficients of polypeptides with an accuracy from 0.98 to 0.99 R(2). The latter enables direct calculation of the retention factors for given solvent compositions and modeling of the migration dynamics of polypeptides separated under isocratic or gradient conditions. The obtained results demonstrate that the suggested theory correctly relates the main aspects of polypeptide separation in reversed-phase HPLC.

  16. A classical statistical model of heavy ion collisions

    International Nuclear Information System (INIS)

    Schmidt, R.; Teichert, J.

    1980-01-01

    The use of the computer code TRAJEC which represents the numerical realization of a classical statistical model for heavy ion collisions is described. The code calculates the results of a classical friction model as well as various multi-differential cross sections for heavy ion collisions. INPUT and OUTPUT information of the code are described. Two examples of data sets are given [ru

  17. On an uncorrelated jet model with Bose-Einstein statistics

    International Nuclear Information System (INIS)

    Bilic, N.; Dadic, I.; Martinis, M.

    1978-01-01

    Starting from the density of states of an ideal Bose-Einstein gas, an uncorrelated jet model with Bose-Einstein statistics has been formulated. The transition to continuum is based on the Touschek invariant measure. It has been shown that in this model average multiplicity increases logarithmically with total energy, while the inclusive distribution shows ln s violation of scaling. (author)

  18. Statistical prediction of immunity to placental malaria based on multi-assay antibody data for malarial antigens

    DEFF Research Database (Denmark)

    Siriwardhana, Chathura; Fang, Rui; Salanti, Ali

    2017-01-01

    Background Plasmodium falciparum infections are especially severe in pregnant women because infected erythrocytes (IE) express VAR2CSA, a ligand that binds to placental trophoblasts, causing IE to accumulate in the placenta. Resulting inflammation and pathology increases a woman’s risk of anemia...... to 28 malarial antigens and used the data to develop statistical models for predicting if a woman has sufficient immunity to prevent PM. Methods Archival plasma samples from 1377 women were screened in a bead-based multiplex assay for Ab to 17 VAR2CSA-associated antigens (full length VAR2CSA (FV2), DBL...... in the following seven statistical approaches: logistic regression full model, logistic regression reduced model, recursive partitioning, random forests, linear discriminant analysis, quadratic discriminant analysis, and support vector machine. Results The best and simplest model proved to be the logistic...

  19. Complex Data Modeling and Computationally Intensive Statistical Methods

    CERN Document Server

    Mantovan, Pietro

    2010-01-01

    The last years have seen the advent and development of many devices able to record and store an always increasing amount of complex and high dimensional data; 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real time financial data, system control datasets. The analysis of this data poses new challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici

  20. Mixed deterministic statistical modelling of regional ozone air pollution

    KAUST Repository

    Kalenderski, Stoitchko; Steyn, Douw G.

    2011-01-01

    formalism, and explicitly accounts for advection of pollutants, using the advection equation. We apply the model to a specific case of regional ozone pollution-the Lower Fraser valley of British Columbia, Canada. As a predictive tool, we demonstrate

  1. A Statistical Approach for Gain Bandwidth Prediction of Phoenix-Cell Based Reflect arrays

    Directory of Open Access Journals (Sweden)

    Hassan Salti

    2018-01-01

    Full Text Available A new statistical approach to predict the gain bandwidth of Phoenix-cell based reflectarrays is proposed. It combines the effects of both main factors that limit the bandwidth of reflectarrays: spatial phase delays and intrinsic bandwidth of radiating cells. As an illustration, the proposed approach is successfully applied to two reflectarrays based on new Phoenix cells.

  2. Statistical Analysis of a Method to Predict Drug-Polymer Miscibility

    DEFF Research Database (Denmark)

    Knopp, Matthias Manne; Olesen, Niels Erik; Huang, Yanbin

    2016-01-01

    In this study, a method proposed to predict drug-polymer miscibility from differential scanning calorimetry measurements was subjected to statistical analysis. The method is relatively fast and inexpensive and has gained popularity as a result of the increasing interest in the formulation of drug...... as provided in this study. © 2015 Wiley Periodicals, Inc. and the American Pharmacists Association J Pharm Sci....

  3. A new method to determine the number of experimental data using statistical modeling methods

    Energy Technology Data Exchange (ETDEWEB)

    Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)

    2017-06-15

    For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a rea- sonable number of experimental data is proposed using an area metric, after obtaining statistical models using the information on the underlying distribution, the Sequential statistical modeling (SSM) approach, and the Kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The pro- posed method is validated in simulations, using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data describing the fatigue strength coefficient of SAE 950X is used for demonstrating the performance of the obtained statistical models that use a pre-determined number of experimental data in predicting the probability of failure for a target fatigue life.

  4. An explicit statistical model of learning lexical segmentation using multiple cues

    NARCIS (Netherlands)

    Çöltekin, Ça ̆grı; Nerbonne, John; Lenci, Alessandro; Padró, Muntsa; Poibeau, Thierry; Villavicencio, Aline

    2014-01-01

    This paper presents an unsupervised and incremental model of learning segmentation that combines multiple cues whose use by children and adults were attested by experimental studies. The cues we exploit in this study are predictability statistics, phonotactics, lexical stress and partial lexical

  5. Validation of statistical models for creep rupture by parametric analysis

    Energy Technology Data Exchange (ETDEWEB)

    Bolton, J., E-mail: john.bolton@uwclub.net [65, Fisher Ave., Rugby, Warks CV22 5HW (United Kingdom)

    2012-01-15

    Statistical analysis is an efficient method for the optimisation of any candidate mathematical model of creep rupture data, and for the comparative ranking of competing models. However, when a series of candidate models has been examined and the best of the series has been identified, there is no statistical criterion to determine whether a yet more accurate model might be devised. Hence there remains some uncertainty that the best of any series examined is sufficiently accurate to be considered reliable as a basis for extrapolation. This paper proposes that models should be validated primarily by parametric graphical comparison to rupture data and rupture gradient data. It proposes that no mathematical model should be considered reliable for extrapolation unless the visible divergence between model and data is so small as to leave no apparent scope for further reduction. This study is based on the data for a 12% Cr alloy steel used in BS PD6605:1998 to exemplify its recommended statistical analysis procedure. The models considered in this paper include a) a relatively simple model, b) the PD6605 recommended model and c) a more accurate model of somewhat greater complexity. - Highlights: Black-Right-Pointing-Pointer The paper discusses the validation of creep rupture models derived from statistical analysis. Black-Right-Pointing-Pointer It demonstrates that models can be satisfactorily validated by a visual-graphic comparison of models to data. Black-Right-Pointing-Pointer The method proposed utilises test data both as conventional rupture stress and as rupture stress gradient. Black-Right-Pointing-Pointer The approach is shown to be more reliable than a well-established and widely used method (BS PD6605).

  6. Model Prediction Control For Water Management Using Adaptive Prediction Accuracy

    NARCIS (Netherlands)

    Tian, X.; Negenborn, R.R.; Van Overloop, P.J.A.T.M.; Mostert, E.

    2014-01-01

    In the field of operational water management, Model Predictive Control (MPC) has gained popularity owing to its versatility and flexibility. The MPC controller, which takes predictions, time delay and uncertainties into account, can be designed for multi-objective management problems and for

  7. A statistical kinematic source inversion approach based on the QUESO library for uncertainty quantification and prediction

    Science.gov (United States)

    Zielke, Olaf; McDougall, Damon; Mai, Martin; Babuska, Ivo

    2014-05-01

    Seismic, often augmented with geodetic data, are frequently used to invert for the spatio-temporal evolution of slip along a rupture plane. The resulting images of the slip evolution for a single event, inferred by different research teams, often vary distinctly, depending on the adopted inversion approach and rupture model parameterization. This observation raises the question, which of the provided kinematic source inversion solutions is most reliable and most robust, and — more generally — how accurate are fault parameterization and solution predictions? These issues are not included in "standard" source inversion approaches. Here, we present a statistical inversion approach to constrain kinematic rupture parameters from teleseismic body waves. The approach is based a) on a forward-modeling scheme that computes synthetic (body-)waves for a given kinematic rupture model, and b) on the QUESO (Quantification of Uncertainty for Estimation, Simulation, and Optimization) library that uses MCMC algorithms and Bayes theorem for sample selection. We present Bayesian inversions for rupture parameters in synthetic earthquakes (i.e. for which the exact rupture history is known) in an attempt to identify the cross-over at which further model discretization (spatial and temporal resolution of the parameter space) is no longer attributed to a decreasing misfit. Identification of this cross-over is of importance as it reveals the resolution power of the studied data set (i.e. teleseismic body waves), enabling one to constrain kinematic earthquake rupture histories of real earthquakes at a resolution that is supported by data. In addition, the Bayesian approach allows for mapping complete posterior probability density functions of the desired kinematic source parameters, thus enabling us to rigorously assess the uncertainties in earthquake source inversions.

  8. Statistical Validation of Normal Tissue Complication Probability Models

    Energy Technology Data Exchange (ETDEWEB)

    Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Veld, Aart A. van' t; Langendijk, Johannes A. [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schilstra, Cornelis [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Radiotherapy Institute Friesland, Leeuwarden (Netherlands)

    2012-09-01

    Purpose: To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. Methods and Materials: A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Results: Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Conclusion: Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use.

  9. Statistical validation of normal tissue complication probability models.

    Science.gov (United States)

    Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis

    2012-09-01

    To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. Shell model in large spaces and statistical spectroscopy

    International Nuclear Information System (INIS)

    Kota, V.K.B.

    1996-01-01

    For many nuclear structure problems of current interest it is essential to deal with shell model in large spaces. For this, three different approaches are now in use and two of them are: (i) the conventional shell model diagonalization approach but taking into account new advances in computer technology; (ii) the shell model Monte Carlo method. A brief overview of these two methods is given. Large space shell model studies raise fundamental questions regarding the information content of the shell model spectrum of complex nuclei. This led to the third approach- the statistical spectroscopy methods. The principles of statistical spectroscopy have their basis in nuclear quantum chaos and they are described (which are substantiated by large scale shell model calculations) in some detail. (author)

  11. Computationally efficient statistical differential equation modeling using homogenization

    Science.gov (United States)

    Hooten, Mevin B.; Garlick, Martha J.; Powell, James A.

    2013-01-01

    Statistical models using partial differential equations (PDEs) to describe dynamically evolving natural systems are appearing in the scientific literature with some regularity in recent years. Often such studies seek to characterize the dynamics of temporal or spatio-temporal phenomena such as invasive species, consumer-resource interactions, community evolution, and resource selection. Specifically, in the spatial setting, data are often available at varying spatial and temporal scales. Additionally, the necessary numerical integration of a PDE may be computationally infeasible over the spatial support of interest. We present an approach to impose computationally advantageous changes of support in statistical implementations of PDE models and demonstrate its utility through simulation using a form of PDE known as “ecological diffusion.” We also apply a statistical ecological diffusion model to a data set involving the spread of mountain pine beetle (Dendroctonus ponderosae) in Idaho, USA.

  12. Growth Curve Models and Applications : Indian Statistical Institute

    CERN Document Server

    2017-01-01

    Growth curve models in longitudinal studies are widely used to model population size, body height, biomass, fungal growth, and other variables in the biological sciences, but these statistical methods for modeling growth curves and analyzing longitudinal data also extend to general statistics, economics, public health, demographics, epidemiology, SQC, sociology, nano-biotechnology, fluid mechanics, and other applied areas.   There is no one-size-fits-all approach to growth measurement. The selected papers in this volume build on presentations from the GCM workshop held at the Indian Statistical Institute, Giridih, on March 28-29, 2016. They represent recent trends in GCM research on different subject areas, both theoretical and applied. This book includes tools and possibilities for further work through new techniques and modification of existing ones. The volume includes original studies, theoretical findings and case studies from a wide range of app lied work, and these contributions have been externally r...

  13. Statistical emulation of a tsunami model for sensitivity analysis and uncertainty quantification

    Directory of Open Access Journals (Sweden)

    A. Sarri

    2012-06-01

    Full Text Available Due to the catastrophic consequences of tsunamis, early warnings need to be issued quickly in order to mitigate the hazard. Additionally, there is a need to represent the uncertainty in the predictions of tsunami characteristics corresponding to the uncertain trigger features (e.g. either position, shape and speed of a landslide, or sea floor deformation associated with an earthquake. Unfortunately, computer models are expensive to run. This leads to significant delays in predictions and makes the uncertainty quantification impractical. Statistical emulators run almost instantaneously and may represent well the outputs of the computer model. In this paper, we use the outer product emulator to build a fast statistical surrogate of a landslide-generated tsunami computer model. This Bayesian framework enables us to build the emulator by combining prior knowledge of the computer model properties with a few carefully chosen model evaluations. The good performance of the emulator is validated using the leave-one-out method.

  14. Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

    Science.gov (United States)

    Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Pong-Wong, Ricardo; Agakov, Felix; Navarro, Pau; Haley, Chris S.

    2015-01-01

    We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. PMID:25918167

  15. Statistical modelling for recurrent events: an application to sports injuries.

    Science.gov (United States)

    Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F

    2014-09-01

    Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  16. Automated parameter estimation for biological models using Bayesian statistical model checking.

    Science.gov (United States)

    Hussain, Faraz; Langmead, Christopher J; Mi, Qi; Dutta-Moscato, Joyeeta; Vodovotz, Yoram; Jha, Sumit K

    2015-01-01

    Probabilistic models have gained widespread acceptance in the systems biology community as a useful way to represent complex biological systems. Such models are developed using existing knowledge of the structure and dynamics of the system, experimental observations, and inferences drawn from statistical analysis of empirical data. A key bottleneck in building such models is that some system variables cannot be measured experimentally. These variables are incorporated into the model as numerical parameters. Determining values of these parameters that justify existing experiments and provide reliable predictions when model simulations are performed is a key research problem. Using an agent-based model of the dynamics of acute inflammation, we demonstrate a novel parameter estimation algorithm by discovering the amount and schedule of doses of bacterial lipopolysaccharide that guarantee a set of observed clinical outcomes with high probability. We synthesized values of twenty-eight unknown parameters such that the parameterized model instantiated with these parameter values satisfies four specifications describing the dynamic behavior of the model. We have developed a new algorithmic technique for discovering parameters in complex stochastic models of biological systems given behavioral specifications written in a formal mathematical logic. Our algorithm uses Bayesian model checking, sequential hypothesis testing, and stochastic optimization to automatically synthesize parameters of probabilistic biological models.

  17. Statistical modelling of railway track geometry degradation using Hierarchical Bayesian models

    International Nuclear Information System (INIS)

    Andrade, A.R.; Teixeira, P.F.

    2015-01-01

    Railway maintenance planners require a predictive model that can assess the railway track geometry degradation. The present paper uses a Hierarchical Bayesian model as a tool to model the main two quality indicators related to railway track geometry degradation: the standard deviation of longitudinal level defects and the standard deviation of horizontal alignment defects. Hierarchical Bayesian Models (HBM) are flexible statistical models that allow specifying different spatially correlated components between consecutive track sections, namely for the deterioration rates and the initial qualities parameters. HBM are developed for both quality indicators, conducting an extensive comparison between candidate models and a sensitivity analysis on prior distributions. HBM is applied to provide an overall assessment of the degradation of railway track geometry, for the main Portuguese railway line Lisbon–Oporto. - Highlights: • Rail track geometry degradation is analysed using Hierarchical Bayesian models. • A Gibbs sampling strategy is put forward to estimate the HBM. • Model comparison and sensitivity analysis find the most suitable model. • We applied the most suitable model to all the segments of the main Portuguese line. • Tackling spatial correlations using CAR structures lead to a better model fit

  18. Statistical Model of the 2001 Czech Census for Interactive Presentation

    Czech Academy of Sciences Publication Activity Database

    Grim, Jiří; Hora, Jan; Boček, Pavel; Somol, Petr; Pudil, Pavel

    Vol. 26, č. 4 (2010), s. 1-23 ISSN 0282-423X R&D Projects: GA ČR GA102/07/1594; GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Interactive statistical model * census data presentation * distribution mixtures * data modeling * EM algorithm * incomplete data * data reproduction accuracy * data mining Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.492, year: 2010 http://library.utia.cas.cz/separaty/2010/RO/grim-0350513.pdf

  19. The Statistical Modeling of the Trends Concerning the Romanian Population

    Directory of Open Access Journals (Sweden)

    Gabriela OPAIT

    2014-11-01

    Full Text Available This paper reflects the statistical modeling concerning the resident population in Romania, respectively the total of the romanian population, through by means of the „Least Squares Method”. Any country it develops by increasing of the population, respectively of the workforce, which is a factor of influence for the growth of the Gross Domestic Product (G.D.P.. The „Least Squares Method” represents a statistical technique for to determine the trend line of the best fit concerning a model.

  20. Data driven propulsion system weight prediction model

    Science.gov (United States)

    Gerth, Richard J.

    1994-10-01

    The objective of the research was to develop a method to predict the weight of paper engines, i.e., engines that are in the early stages of development. The impetus for the project was the Single Stage To Orbit (SSTO) project, where engineers need to evaluate alternative engine designs. Since the SSTO is a performance driven project the performance models for alternative designs were well understood. The next tradeoff is weight. Since it is known that engine weight varies with thrust levels, a model is required that would allow discrimination between engines that produce the same thrust. Above all, the model had to be rooted in data with assumptions that could be justified based on the data. The general approach was to collect data on as many existing engines as possible and build a statistical model of the engines weight as a function of various component performance parameters. This was considered a reasonable level to begin the project because the data would be readily available, and it would be at the level of most paper engines, prior to detailed component design.

  1. Iowa calibration of MEPDG performance prediction models.

    Science.gov (United States)

    2013-06-01

    This study aims to improve the accuracy of AASHTO Mechanistic-Empirical Pavement Design Guide (MEPDG) pavement : performance predictions for Iowa pavement systems through local calibration of MEPDG prediction models. A total of 130 : representative p...

  2. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    Science.gov (United States)

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable

  3. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    Science.gov (United States)

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5 + , 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events has doubled and all of them become exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. ?? 1999 Elsevier

  4. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    Science.gov (United States)

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  5. Analyzing sickness absence with statistical models for survival data

    DEFF Research Database (Denmark)

    Christensen, Karl Bang; Andersen, Per Kragh; Smith-Hansen, Lars

    2007-01-01

    OBJECTIVES: Sickness absence is the outcome in many epidemiologic studies and is often based on summary measures such as the number of sickness absences per year. In this study the use of modern statistical methods was examined by making better use of the available information. Since sickness...... absence data deal with events occurring over time, the use of statistical models for survival data has been reviewed, and the use of frailty models has been proposed for the analysis of such data. METHODS: Three methods for analyzing data on sickness absences were compared using a simulation study...... involving the following: (i) Poisson regression using a single outcome variable (number of sickness absences), (ii) analysis of time to first event using the Cox proportional hazards model, and (iii) frailty models, which are random effects proportional hazards models. Data from a study of the relation...

  6. Model complexity control for hydrologic prediction

    NARCIS (Netherlands)

    Schoups, G.; Van de Giesen, N.C.; Savenije, H.H.G.

    2008-01-01

    A common concern in hydrologic modeling is overparameterization of complex models given limited and noisy data. This leads to problems of parameter nonuniqueness and equifinality, which may negatively affect prediction uncertainties. A systematic way of controlling model complexity is therefore

  7. A Review of Modeling Bioelectrochemical Systems: Engineering and Statistical Aspects

    Directory of Open Access Journals (Sweden)

    Shuai Luo

    2016-02-01

    Full Text Available Bioelectrochemical systems (BES are promising technologies to convert organic compounds in wastewater to electrical energy through a series of complex physical-chemical, biological and electrochemical processes. Representative BES such as microbial fuel cells (MFCs have been studied and advanced for energy recovery. Substantial experimental and modeling efforts have been made for investigating the processes involved in electricity generation toward the improvement of the BES performance for practical applications. However, there are many parameters that will potentially affect these processes, thereby making the optimization of system performance hard to be achieved. Mathematical models, including engineering models and statistical models, are powerful tools to help understand the interactions among the parameters in BES and perform optimization of BES configuration/operation. This review paper aims to introduce and discuss the recent developments of BES modeling from engineering and statistical aspects, including analysis on the model structure, description of application cases and sensitivity analysis of various parameters. It is expected to serves as a compass for integrating the engineering and statistical modeling strategies to improve model accuracy for BES development.

  8. Translating visual information into action predictions: Statistical learning in action and nonaction contexts.

    Science.gov (United States)

    Monroy, Claire D; Gerson, Sarah A; Hunnius, Sabine

    2018-05-01

    Humans are sensitive to the statistical regularities in action sequences carried out by others. In the present eyetracking study, we investigated whether this sensitivity can support the prediction of upcoming actions when observing unfamiliar action sequences. In two between-subjects conditions, we examined whether observers would be more sensitive to statistical regularities in sequences performed by a human agent versus self-propelled 'ghost' events. Secondly, we investigated whether regularities are learned better when they are associated with contingent effects. Both implicit and explicit measures of learning were compared between agent and ghost conditions. Implicit learning was measured via predictive eye movements to upcoming actions or events, and explicit learning was measured via both uninstructed reproduction of the action sequences and verbal reports of the regularities. The findings revealed that participants, regardless of condition, readily learned the regularities and made correct predictive eye movements to upcoming events during online observation. However, different patterns of explicit-learning outcomes emerged following observation: Participants were most likely to re-create the sequence regularities and to verbally report them when they had observed an actor create a contingent effect. These results suggest that the shift from implicit predictions to explicit knowledge of what has been learned is facilitated when observers perceive another agent's actions and when these actions cause effects. These findings are discussed with respect to the potential role of the motor system in modulating how statistical regularities are learned and used to modify behavior.

  9. New robust statistical procedures for the polytomous logistic regression models.

    Science.gov (United States)

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  10. Comparing statistical and process-based flow duration curve models in ungauged basins and changing rain regimes

    Science.gov (United States)

    Müller, M. F.; Thompson, S. E.

    2016-02-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.

  11. EFFECT OF MEASUREMENT ERRORS ON PREDICTED COSMOLOGICAL CONSTRAINTS FROM SHEAR PEAK STATISTICS WITH LARGE SYNOPTIC SURVEY TELESCOPE

    Energy Technology Data Exchange (ETDEWEB)

    Bard, D.; Chang, C.; Kahn, S. M.; Gilmore, K.; Marshall, S. [KIPAC, Stanford University, 452 Lomita Mall, Stanford, CA 94309 (United States); Kratochvil, J. M.; Huffenberger, K. M. [Department of Physics, University of Miami, Coral Gables, FL 33124 (United States); May, M. [Physics Department, Brookhaven National Laboratory, Upton, NY 11973 (United States); AlSayyad, Y.; Connolly, A.; Gibson, R. R.; Jones, L.; Krughoff, S. [Department of Astronomy, University of Washington, Seattle, WA 98195 (United States); Ahmad, Z.; Bankert, J.; Grace, E.; Hannel, M.; Lorenz, S. [Department of Physics, Purdue University, West Lafayette, IN 47907 (United States); Haiman, Z.; Jernigan, J. G., E-mail: djbard@slac.stanford.edu [Department of Astronomy and Astrophysics, Columbia University, New York, NY 10027 (United States); and others

    2013-09-01

    We study the effect of galaxy shape measurement errors on predicted cosmological constraints from the statistics of shear peak counts with the Large Synoptic Survey Telescope (LSST). We use the LSST Image Simulator in combination with cosmological N-body simulations to model realistic shear maps for different cosmological models. We include both galaxy shape noise and, for the first time, measurement errors on galaxy shapes. We find that the measurement errors considered have relatively little impact on the constraining power of shear peak counts for LSST.

  12. Simple classical model for Fano statistics in radiation detectors

    Energy Technology Data Exchange (ETDEWEB)

    Jordan, David V. [Pacific Northwest National Laboratory, National Security Division - Radiological and Chemical Sciences Group PO Box 999, Richland, WA 99352 (United States)], E-mail: David.Jordan@pnl.gov; Renholds, Andrea S.; Jaffe, John E.; Anderson, Kevin K.; Rene Corrales, L.; Peurrung, Anthony J. [Pacific Northwest National Laboratory, National Security Division - Radiological and Chemical Sciences Group PO Box 999, Richland, WA 99352 (United States)

    2008-02-01

    A simple classical model that captures the essential statistics of energy partitioning processes involved in the creation of information carriers (ICs) in radiation detectors is presented. The model pictures IC formation from a fixed amount of deposited energy in terms of the statistically analogous process of successively sampling water from a large, finite-volume container ('bathtub') with a small dipping implement ('shot or whiskey glass'). The model exhibits sub-Poisson variance in the distribution of the number of ICs generated (the 'Fano effect'). Elementary statistical analysis of the model clarifies the role of energy conservation in producing the Fano effect and yields Fano's prescription for computing the relative variance of the IC number distribution in terms of the mean and variance of the underlying, single-IC energy distribution. The partitioning model is applied to the development of the impact ionization cascade in semiconductor radiation detectors. It is shown that, in tandem with simple assumptions regarding the distribution of energies required to create an (electron, hole) pair, the model yields an energy-independent Fano factor of 0.083, in accord with the lower end of the range of literature values reported for silicon and high-purity germanium. The utility of this simple picture as a diagnostic tool for guiding or constraining more detailed, 'microscopic' physical models of detector material response to ionizing radiation is discussed.

  13. Development of 3D statistical mandible models for cephalometric measurements

    International Nuclear Information System (INIS)

    Kim, Sung Goo; Yi, Won Jin; Hwang, Soon Jung; Choi, Soon Chul; Lee, Sam Sun; Heo, Min Suk; Huh, Kyung Hoe; Kim, Tae Il; Hong, Helen; Yoo, Ji Hyun

    2012-01-01

    The aim of this study was to provide sex-matched three-dimensional (3D) statistical shape models of the mandible, which would provide cephalometric parameters for 3D treatment planning and cephalometric measurements in orthognathic surgery. The subjects used to create the 3D shape models of the mandible included 23 males and 23 females. The mandibles were segmented semi-automatically from 3D facial CT images. Each individual mandible shape was reconstructed as a 3D surface model, which was parameterized to establish correspondence between different individual surfaces. The principal component analysis (PCA) applied to all mandible shapes produced a mean model and characteristic models of variation. The cephalometric parameters were measured directly from the mean models to evaluate the 3D shape models. The means of the measured parameters were compared with those from other conventional studies. The male and female 3D statistical mean models were developed from 23 individual mandibles, respectively. The male and female characteristic shapes of variation produced by PCA showed a large variability included in the individual mandibles. The cephalometric measurements from the developed models were very close to those from some conventional studies. We described the construction of 3D mandibular shape models and presented the application of the 3D mandibular template in cephalometric measurements. Optimal reference models determined from variations produced by PCA could be used for craniofacial patients with various types of skeletal shape.

  14. Development of 3D statistical mandible models for cephalometric measurements

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung Goo; Yi, Won Jin; Hwang, Soon Jung; Choi, Soon Chul; Lee, Sam Sun; Heo, Min Suk; Huh, Kyung Hoe; Kim, Tae Il [School of Dentistry, Seoul National University, Seoul (Korea, Republic of); Hong, Helen; Yoo, Ji Hyun [Division of Multimedia Engineering, Seoul Women' s University, Seoul (Korea, Republic of)

    2012-09-15

    The aim of this study was to provide sex-matched three-dimensional (3D) statistical shape models of the mandible, which would provide cephalometric parameters for 3D treatment planning and cephalometric measurements in orthognathic surgery. The subjects used to create the 3D shape models of the mandible included 23 males and 23 females. The mandibles were segmented semi-automatically from 3D facial CT images. Each individual mandible shape was reconstructed as a 3D surface model, which was parameterized to establish correspondence between different individual surfaces. The principal component analysis (PCA) applied to all mandible shapes produced a mean model and characteristic models of variation. The cephalometric parameters were measured directly from the mean models to evaluate the 3D shape models. The means of the measured parameters were compared with those from other conventional studies. The male and female 3D statistical mean models were developed from 23 individual mandibles, respectively. The male and female characteristic shapes of variation produced by PCA showed a large variability included in the individual mandibles. The cephalometric measurements from the developed models were very close to those from some conventional studies. We described the construction of 3D mandibular shape models and presented the application of the 3D mandibular template in cephalometric measurements. Optimal reference models determined from variations produced by PCA could be used for craniofacial patients with various types of skeletal shape.

  15. Nonlinear chaotic model for predicting storm surges

    Directory of Open Access Journals (Sweden)

    M. Siek

    2010-09-01

    Full Text Available This paper addresses the use of the methods of nonlinear dynamics and chaos theory for building a predictive chaotic model from time series. The chaotic model predictions are made by the adaptive local models based on the dynamical neighbors found in the reconstructed phase space of the observables. We implemented the univariate and multivariate chaotic models with direct and multi-steps prediction techniques and optimized these models using an exhaustive search method. The built models were tested for predicting storm surge dynamics for different stormy conditions in the North Sea, and are compared to neural network models. The results show that the chaotic models can generally provide reliable and accurate short-term storm surge predictions.

  16. Staying Power of Churn Prediction Models

    NARCIS (Netherlands)

    Risselada, Hans; Verhoef, Peter C.; Bijmolt, Tammo H. A.

    In this paper, we study the staying power of various churn prediction models. Staying power is defined as the predictive performance of a model in a number of periods after the estimation period. We examine two methods, logit models and classification trees, both with and without applying a bagging

  17. Predictive user modeling with actionable attributes

    NARCIS (Netherlands)

    Zliobaite, I.; Pechenizkiy, M.

    2013-01-01

    Different machine learning techniques have been proposed and used for modeling individual and group user needs, interests and preferences. In the traditional predictive modeling instances are described by observable variables, called attributes. The goal is to learn a model for predicting the target

  18. Statistical sampling and modelling for cork oak and eucalyptus stands

    NARCIS (Netherlands)

    Paulo, M.J.

    2002-01-01

    This thesis focuses on the use of modern statistical methods to solve problems on sampling, optimal cutting time and agricultural modelling in Portuguese cork oak and eucalyptus stands. The results are contained in five chapters that have been submitted for publication

  19. Two-dimensional models in statistical mechanics and field theory

    International Nuclear Information System (INIS)

    Koberle, R.

    1980-01-01

    Several features of two-dimensional models in statistical mechanics and Field theory, such as, lattice quantum chromodynamics, Z(N), Gross-Neveu and CP N-1 are discussed. The problems of confinement and dynamical mass generation are also analyzed. (L.C.) [pt

  20. Model selection for contingency tables with algebraic statistics

    NARCIS (Netherlands)

    Krampe, A.; Kuhnt, S.; Gibilisco, P.; Riccimagno, E.; Rogantin, M.P.; Wynn, H.P.

    2009-01-01

    Goodness-of-fit tests based on chi-square approximations are commonly used in the analysis of contingency tables. Results from algebraic statistics combined with MCMC methods provide alternatives to the chi-square approximation. However, within a model selection procedure usually a large number of